Open wandgitlabbot opened 3 years ago
In GitLab, by Daniel Oosterwijk on 2020-08-23
There are a couple of ways to do streams with multiple inputs in Flink:
.union()
is currently used in the YamlDagRunner to join the Event streams into a single instance of a sink. It just slaps all the elements of an arbitrary number of streams into a single output stream. The streams must share a common parent, which is easy for us under Measurement, but no additional information is retained and it makes no effort to unify data rate or anything..connect()
produces a ConnectedStreams
, which retains the separation and type information of two (and only two) input streams. These can be processed using CoProcessFunction
and KeyedCoProcessFunction
, but does not appear to support windowing..join()
aims at pairing up elements of two separate streams in a one-to-one fashion. There are a few subtypes of joining, which allow us to specify certain behaviours around timing and how long to wait for a matching element.Since there are several ways to implement multiple-input streams which each have their own benefits, I won't implement any of them in the YamlDagRunner until we have a detector which actually requires the functionality.
In GitLab, by Daniel Oosterwijk on 2020-07-29
The YamlDagRunner's file format allows multiple inputs to be specified for a single source. Flink supports this as well, but the runner itself does not yet.
At commit e61a3187, YamlDagRunner:125 and DetectorInstance:40 refer to the limitation of having a single input type. These should be expanded on when we make detectors with multiple inputs.