Open takuti opened 8 years ago
One important requirement is that the algorithm has to work well on high-dimensional data points such as:
[metric 1, metric 2, metric 3, ...., metric N]
This enables us to detect anomalies from the status of the whole system. Finding change-points from only a few specific metrics is not enough in practice; humans actually monitor a more comprehensive status that spans many more system metrics.
In addition, even if an algorithm can handle high-dimensional data, we'd love to figure out "why is the current status detected as an anomaly?". Such information must be easily accessible to users.
Moreover, the algorithm has to be able to run in an online (i.e., incremental) fashion.
Implemented a Singular Spectrum Transformation (SST) based change-point detector: sst.py
Some experimental results:
[Figure: result with larger r]
[Figure: result with smaller r]
w is the window size, and it has to be sufficiently smaller than the intervals between expected change-points. A larger w makes SST computationally heavy because it determines the size of the matrices to which SVD is applied.
r is the rank of the orthonormal bases obtained from SVD (i.e., how many principal components are used to represent the "past"/"current" subspaces). A larger value makes the scores more sensitive to noisy values, while a smaller value has a kind of smoothing effect on the result.
Importantly, this method requires look-ahead windowing; that is, to compute a change-point score at time t, you have to pass some future points, e.g. t + 1, t + 2, ...
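To make the roles of w, r, and the look-ahead concrete, here is a minimal NumPy sketch of an SST-style score. It is not the code in sst.py: the names `trajectory_matrix` and `sst_score`, the parameters `n` (number of windows per matrix) and `lag` (how far into the future the "current" windows reach), and their defaults are all assumptions for illustration. The score used here, one minus the largest singular value of the product of the two bases, is one possible formulation among several.

```python
import numpy as np


def trajectory_matrix(x, end, w, n):
    """Stack n sliding windows of length w as columns.

    The last column is x[end - w:end]; each earlier column is shifted
    one step further into the past. Result has shape (w, n).
    """
    start = end - (n - 1) - w
    if start < 0 or end > len(x):
        raise ValueError("not enough data around this time index")
    return np.column_stack(
        [x[end - (n - 1 - i) - w: end - (n - 1 - i)] for i in range(n)]
    )


def sst_score(x, t, w, r, n=None, lag=None):
    """Change-point score at time t (hypothetical helper, not sst.py).

    n and lag are illustrative parameters; their defaults are assumptions.
    """
    x = np.asarray(x, dtype=float)
    n = w if n is None else n
    lag = w // 2 if lag is None else lag

    # "Past" subspace: windows that end at time t.
    past = trajectory_matrix(x, t, w, n)
    # "Current" subspace: windows that end at t + lag, i.e. future points.
    current = trajectory_matrix(x, t + lag, w, n)

    # Top-r left singular vectors span each subspace (w x w cost grows with w).
    u_past = np.linalg.svd(past, full_matrices=False)[0][:, :r]
    u_curr = np.linalg.svd(current, full_matrices=False)[0][:, :r]

    # Singular values of U_past^T U_curr are cosines of the principal angles.
    cosines = np.linalg.svd(u_past.T @ u_curr, compute_uv=False)
    return max(0.0, 1.0 - cosines[0])
```

Because the "current" matrix ends at `t + lag`, a score for time t can only be emitted once `lag` further points have arrived, which is the look-ahead described above.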
New results from a more efficient Lanczos-based implementation:
Original SST (directly computes the SVD of a w * w matrix)
[CPU elapsed time] 3.93 sec
Lanczos-based impl.
[CPU elapsed time] 2.62 sec
Original SST
[CPU elapsed time] 20.8 sec
Lanczos-based impl.
[CPU elapsed time] 14.9 sec
A larger window size benefits more from the Lanczos-based implementation. If w is small enough, directly computing the SVD is efficient enough.
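For reference, the general idea behind a Lanczos-type speed-up can be sketched as follows. This is not the implementation measured above. It is a plain Lanczos iteration with full reorthogonalization that approximates the top-r eigenvectors of a symmetric matrix C (for SST, C could be H @ H.T for a trajectory matrix H). The function name `lanczos_top_eigvecs`, the step count `k`, its default `2 * r + 10`, and the random start vector are assumptions chosen for illustration.

```python
import numpy as np


def lanczos_top_eigvecs(C, r, k=None, seed=0):
    """Approximate the top-r eigenpairs of a symmetric matrix C.

    Runs k Lanczos steps (k << w is where the savings come from), then
    solves the small k x k tridiagonal eigenproblem instead of the full one.
    Returns (eigenvectors of shape (w, r), eigenvalues of shape (r,)).
    """
    w = C.shape[0]
    # Default number of steps is a hypothetical heuristic, not a tuned value.
    k = min(w, 2 * r + 10) if k is None else min(k, w)

    rng = np.random.default_rng(seed)
    q = rng.standard_normal(w)
    q /= np.linalg.norm(q)

    Q = np.zeros((w, k))           # Lanczos basis vectors
    alpha = np.zeros(k)            # diagonal of the tridiagonal matrix
    beta = np.zeros(max(k - 1, 0)) # off-diagonal of the tridiagonal matrix
    m = k                          # number of steps actually completed

    for j in range(k):
        Q[:, j] = q
        v = C @ q                  # the only O(w^2) operation per step
        alpha[j] = q @ v
        if j == k - 1:
            break
        # Full reorthogonalization (two passes) against all previous vectors.
        basis = Q[:, : j + 1]
        for _ in range(2):
            v = v - basis @ (basis.T @ v)
        b = np.linalg.norm(v)
        if b <= 1e-12 * max(1.0, abs(alpha[j])):
            m = j + 1              # Krylov subspace exhausted early
            break
        beta[j] = b
        q = v / b

    T = (np.diag(alpha[:m])
         + np.diag(beta[: m - 1], 1)
         + np.diag(beta[: m - 1], -1))
    evals, evecs = np.linalg.eigh(T)
    top = np.argsort(evals)[::-1][:r]
    return Q[:, :m] @ evecs[:, top], evals[top]
```

Each step costs one matrix-vector product, so k steps cost roughly O(k w^2) rather than the O(w^3) of a full decomposition. That is consistent with the observation that the benefit shows up mainly for larger w.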
The current detection logic, ChangeFinder, is highly sensitive to the hyperparameters and estimation techniques, as discussed in #11 and #14. In addition, it is unclear whether ChangeFinder is theoretically appropriate for our problem.
Thus, an additional survey and consideration of different change-point detection algorithms are also necessary in practice.