wandnz / streamevmon

Framework and pipeline for time series anomaly detection
GNU General Public License v3.0

Out of order data behaviour investigation #12

Closed wandgitlabbot closed 3 years ago

wandgitlabbot commented 3 years ago

In GitLab, by Daniel Oosterwijk on 2019-12-08

Flink allows data to arrive out of order. Some detectors may have stronger order requirements than others, which might need us to adjust parameters like flink.maxLateness. If there is a detector with a strong order requirement, our sink functions (like InfluxSink) might need to have some way to be signalled that a previously submitted event is now invalid. We don't really know a lot about what will happen if data does arrive out of order, but chances are it won't be an issue at all considering the usual magnitude of intervals between AMP measurements.

wandgitlabbot commented 3 years ago

In GitLab, by Daniel Oosterwijk on 2020-01-28

If a windowed operator is not used, out of order data is merely processed as though it was new, despite its older timestamp.

One valid approach is a class which wraps any KeyedProcessFunction in a ProcessWindowFunction, so that any elements which arrive during that window are sorted into the correct order before being processed. Combined with maxLateness, this could allow us to process all elements in the correct order at the cost of some latency.
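As a rough illustration of the wrapping idea, here is a framework-free Python model rather than Flink code. The names `WindowedWrapper`, `on_element` and `flush` are made up for this sketch, and the watermark is assumed to be simply the largest timestamp seen minus the allowed lateness. Elements are buffered per tumbling window, and a window is sorted and handed to the wrapped per-element function only once the watermark has passed its end.

```python
from collections import namedtuple

# Hypothetical measurement type for this sketch; timestamps are in milliseconds.
Measurement = namedtuple("Measurement", ["timestamp", "value"])


class WindowedWrapper:
    """Buffers elements per tumbling window and replays them in timestamp order."""

    def __init__(self, process, window_ms, max_lateness_ms):
        self.process = process            # the wrapped per-element function
        self.window_ms = window_ms
        self.max_lateness_ms = max_lateness_ms
        self.buffers = {}                 # window start -> buffered elements
        self.max_seen = None              # largest timestamp seen so far
        self.dropped = []                 # elements that arrived after their window fired

    def on_element(self, m):
        if self.max_seen is None or m.timestamp > self.max_seen:
            self.max_seen = m.timestamp
        # Assumed watermark model: largest timestamp minus allowed lateness.
        watermark = self.max_seen - self.max_lateness_ms
        start = m.timestamp - m.timestamp % self.window_ms
        if start + self.window_ms <= watermark:
            # The window this element belongs to has already been released.
            self.dropped.append(m)
        else:
            self.buffers.setdefault(start, []).append(m)
        self._fire(watermark)

    def _fire(self, watermark):
        ready = sorted(s for s in self.buffers if s + self.window_ms <= watermark)
        for start in ready:
            for m in sorted(self.buffers.pop(start), key=lambda x: x.timestamp):
                self.process(m)

    def flush(self):
        """Release everything still buffered, e.g. at end of input."""
        self._fire(float("inf"))
```

The latency cost is visible here: nothing in a window reaches the wrapped function until a later element pushes the watermark past that window's end, so a larger window or lateness value trades delay for tolerance of disorder.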

Some detectors, such as the Mode Detector, don't keep additional state on top of a list of recent measurements. These could be implemented natively as window functions, meaning they can also take advantage of the sliding window approach so that no events are missed. This would require some event de-duplication further down the stream, but that's probably not a bad idea for all detectors since it would let us stop reusing code which prevents events being emitted too often.
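To show why sliding windows imply de-duplication, here is a small Python sketch. It is not the project's code: the `Event` shape, the choice of `(stream, timestamp)` as the identity of an event, and both function names are assumptions for illustration. A measurement falls into several overlapping windows, so a detector run once per window can report the same anomaly more than once.

```python
from collections import namedtuple

# Hypothetical event type for this sketch.
Event = namedtuple("Event", ["stream", "timestamp", "description"])


def window_starts(timestamp, size, slide):
    """Start times of every sliding window [start, start + size) containing timestamp."""
    last_start = timestamp - timestamp % slide
    return list(range(last_start, timestamp - size, -slide))


def deduplicate(events, key=lambda e: (e.stream, e.timestamp)):
    """Yield each event once, keeping the first occurrence of each key."""
    seen = set()
    for event in events:
        k = key(event)
        if k not in seen:
            seen.add(k)
            yield event
```

With a window size of 30 and a slide of 10, `window_starts(25, 30, 10)` gives three windows, so an anomaly at timestamp 25 could be emitted three times before `deduplicate` collapses it. The `seen` set grows without bound in this sketch; a long-running job would need some way to expire old keys.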

It could be feasible to determine or infer the frequency of each test stream, and create a custom windower or buffering process function which will wait until events have been received in order. This would likely want a timeout in case an event is never found, since we don't want to stall forever. If measurements are lost in transit from the collector to us, we could resort to using an InfluxHistoryConnection to try to retrieve them.
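The buffering idea might look something like the following Python sketch. It assumes, unrealistically, that measurements land on exact multiples of a known interval, and the class and method names (`InOrderBuffer`, `on_element`, `on_timer`) are invented for this example. Elements are held back until the next expected timestamp arrives; if it has not arrived within the timeout, the gap is recorded and the buffer moves on.

```python
class InOrderBuffer:
    """Releases elements in timestamp order, waiting a bounded time for gaps."""

    def __init__(self, interval, timeout, emit):
        self.interval = interval      # assumed fixed spacing between measurements
        self.timeout = timeout        # how long to wait for a missing measurement
        self.emit = emit              # callback taking (timestamp, value)
        self.pending = {}             # timestamp -> value, not yet released
        self.next_expected = None
        self.gap_started = None       # time at which the current gap was noticed
        self.skipped = []             # timestamps we gave up waiting for

    def on_element(self, timestamp, value, now):
        if self.next_expected is None:
            self.next_expected = timestamp
        if timestamp < self.next_expected:
            return False              # already emitted or given up on
        self.pending[timestamp] = value
        self.on_timer(now)
        return True

    def on_timer(self, now):
        self._drain()
        if not self.pending:
            return
        if self.gap_started is None:
            self.gap_started = now
        elif now - self.gap_started >= self.timeout:
            # Give up on the missing measurements. This is the point where a
            # lookup against stored history could be attempted instead.
            oldest = min(self.pending)
            self.skipped.extend(range(self.next_expected, oldest, self.interval))
            self.next_expected = oldest
            self._drain()
            self.gap_started = now if self.pending else None

    def _drain(self):
        while self.next_expected in self.pending:
            self.emit(self.next_expected, self.pending.pop(self.next_expected))
            self.next_expected += self.interval
            self.gap_started = None
```

Real measurement timestamps jitter, so a practical version would need a tolerance around each expected timestamp rather than an exact match.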

wandgitlabbot commented 3 years ago

In GitLab, by Daniel Oosterwijk on 2020-01-28

c53c192d implements the WindowedFunctionWrapper, which appears to function correctly for the first idea.

95f940e3 implements a windowed form of the DistDiffDetector, which functions identically but uses a keyed windowed stream. This is unfortunately not practical for all detectors, but it does simplify the implementation of those that can be implemented in this fashion.

wandgitlabbot commented 3 years ago

In GitLab, by Daniel Oosterwijk on 2020-01-29

bc68ec0c uses the WindowedFunctionWrapper for all non-windowable detectors in the UnifiedRunner, optionally via configuration. Currently this is not configurable by detector unless that detector has a windowed form. This should solve most of the out-of-order issues at the cost of some latency given the correct values of detector.default.windowDuration and flink.maxLateness. I'll leave this issue open in case further developments arise.

wandgitlabbot commented 3 years ago

In GitLab, by Daniel Oosterwijk on 2020-01-29

We should test if these new windowed operators serialise correctly! The wrapper especially does not include its contained process function as a ValueState, so it's uncertain whether state is restored correctly.

wandgitlabbot commented 3 years ago

In GitLab, by Daniel Oosterwijk on 2020-02-05

625e9c7b adds tests to show that the operators do serialise properly. Closing the issue until further notice.