khundman / telemanom

A framework for using LSTMs to detect anomalies in multivariate time series data. Includes spacecraft anomaly data and experiments from the Mars Science Laboratory and SMAP missions.
https://arxiv.org/abs/1802.04431

start and end of anomalies #58

Closed: loubnabnl closed this issue 3 years ago

loubnabnl commented 3 years ago

Hello, thank you for sharing this project. I have some questions about the indices of the outliers. A point anomaly should occupy a single index, so why does each point anomaly in anomaly_sequence also have both a start and an end index?

For collective anomalies, you seem to treat only the start and end indices as outliers. Why not treat every index between the start and the end as an outlier, especially for short anomalies that represent a temporary change? Thank you in advance.

khundman commented 3 years ago

For your first question, please see "Errors and Smoothing" in section 3.2 in the paper.

For your second question, all indices between the start and end indices are in fact considered outliers, see implementation here.
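To illustrate the second point, here is a minimal sketch (not the repository's code) of how a list of (start, end) pairs can be expanded so that every index in between is flagged. The function name `expand_sequences`, the example values, and the assumption that the end index is inclusive are hypothetical choices made for this illustration.

```python
import numpy as np

def expand_sequences(sequences, n_points):
    """Return a boolean mask that is True for every index covered by
    any (start, end) pair. Ends are assumed inclusive in this sketch."""
    mask = np.zeros(n_points, dtype=bool)
    for start, end in sequences:
        mask[start:end + 1] = True
    return mask

# Hypothetical output of the detector: one short collective anomaly
# and one sequence that covers a single point.
anomaly_sequences = [(3, 5), (9, 9)]
mask = expand_sequences(anomaly_sequences, n_points=12)
flagged = np.flatnonzero(mask).tolist()
print(flagged)
```

Under these assumptions, a point anomaly is simply a sequence whose start and end coincide or sit close together, and a collective anomaly is a longer run of flagged indices.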

Dezicon commented 3 years ago

Hello @khundman, thanks for directing us to section 3.2 for point anomalies.

I understand how you calculate the thresholds.

I am using time series data that contains single-value anomalies (point and contextual) rather than collective anomalies. In errors.py, you merge sequences that fall in different windows but are adjacent to each other into a single sequence, even though each merged sequence has its own maximum score, calculated with a different value of epsilon. Can I treat the point with the maximum score in a sequence as a single-value anomaly (point or contextual)?

bappctl commented 3 years ago

I have a similar question to the one above. Any pointers? Thanks.

khundman commented 3 years ago

If I understand your question, you can tune the error padding (and subsequent grouping) behavior using the error_buffer parameter here. Shrinking this number will shrink the error sequences around the max score and discourage sequence merging.
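A rough sketch of why the buffer size matters, assuming each anomalous index is padded by `error_buffer` points on both sides and that consecutive padded indices are then grouped into one sequence. This is a simplified stand-in written for illustration, not the code in errors.py. The function name `pad_and_group` and the example indices are hypothetical.

```python
import numpy as np

def pad_and_group(anomalous_idx, error_buffer, n_points):
    """Pad each anomalous index by error_buffer on both sides, then
    group runs of consecutive indices into inclusive (start, end) pairs."""
    idx = np.asarray(anomalous_idx, dtype=int)
    offsets = np.arange(-error_buffer, error_buffer + 1)
    # Every index within error_buffer of an anomalous point.
    padded = (idx[:, None] + offsets[None, :]).ravel()
    # Drop out-of-range indices, remove duplicates, and sort.
    padded = np.unique(padded[(padded >= 0) & (padded < n_points)])
    if padded.size == 0:
        return []
    # A gap larger than 1 between neighbours starts a new group.
    breaks = np.flatnonzero(np.diff(padded) > 1) + 1
    return [(int(g[0]), int(g[-1])) for g in np.split(padded, breaks)]

# Two hypothetical point anomalies that are 15 steps apart.
wide = pad_and_group([40, 55], error_buffer=10, n_points=200)
narrow = pad_and_group([40, 55], error_buffer=2, n_points=200)
print(wide)    # the padded regions overlap and merge
print(narrow)  # the padded regions stay separate
```

In this sketch, the larger buffer merges the two points into one sequence, while the smaller buffer keeps two short sequences, each centered on its own point.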