takuti / datadog-anomaly-detector

:dog: Anomaly detection system for Datadog multiple metrics
MIT License
20 stars 5 forks source link

Alternative change-point detection algorithms #15

Open takuti opened 8 years ago

takuti commented 8 years ago

Current detection logic ChangeFinder is highly sensitive to the hyperparameters and estimation techniques as discussed on #11 and #14 . In addition, it is unclear if ChangeFinder is theoretically appropriate to our problem.

Thus, additional survey and considering different change-point detection algorithms is also necessary in practice.

takuti commented 8 years ago

One important requirement is that an algorithm has to work well on high-dimensional data points as:

[metric 1, metric 2, metric 3, ...., metric N]

This enable us to detect anomalies from a whole system status. Finding change-points only from some specific metrics are not enough in practice; humans are actually monitoring more comprehensive status of lots more system metrics.

In addition, even if an algorithm can handle high-dimensional data, we'd love to figure out "why is the current status detected as anomaly?". Such information must be easily accessible from users.

Moreover, an algorithm has to be able to run in an on-line (i.e. incremental) scheme.

takuti commented 8 years ago

Implemented Singular Spectrum Transformation (SST) based change-point detector: sst.py

Some experimental results:

Synthetic data

synthetic_sst

Twitter data

twitter_sst_1

^ larger r

twitter_sst_2

^ smaller r

w is a window size, and this has to be smaller enough than intervals of expected change-points. Larger w makes SST computational heavy because it decides size of matrices applied SVD.

r is rank for the orthonormal bases obtained from SVD (i.e. how many principal components are used to consider "past"/"current" subspaces). Larger value means the scores become more sensitive to noisy values, and smaller value shows a kind of smoothing effect on the result.

Importantly, this method requires to do look-ahead windowing; that is, when you want to compute a change-point score at time t, you have to pass some future points e.g. t + 1, t + 2, ...

takuti commented 8 years ago

New outcomes came from Lanczos-based efficient implementation:

Synthetic

Original SST (directly compute SVD of w * w matrix)

[CPU elapsed time] 3.93 sec

sst-synthetic-svd

Lanczos-based impl.

[CPU elapsed time] 2.62 sec

sst-synthetic-lanczos

Twitter

Original SST

[CPU elapsed time] 20.8 sec

sst-twitter-svd

Lanczos-based impl.

[CPU elapsed time] 14.9 sec

sst-twitter-lanczos

Bigger window-size takes advantage of Lanczos-based implementation. If w is small enough, directly computing SVD is efficient enough.