htcml closed this issue 2 years ago.
Hi @htcml, methods like distribution_change in gcm are for explaining the root causes of an observed effect. They are not meant for detecting the effect itself.
If your goal is to detect an outlier (or anomaly), popular ML libraries like scikit-learn already provide various algorithms. Once an outlier is detected, if you would like to identify its root causes, we have a method called attribute_anomalies in gcm. Is this what you have in mind?
To add to @kailashbuki's response: attribute_anomalies was in fact just added to the rca_microservice_architecture notebook on the latest master. There, check out the section "Scenario 1: Observing a single outlier", specifically "Attributing an outlier latency at a target service to other services".
Question regarding the rca_microservice_architecture notebook.
Detecting network latency anomalies is a good use case for gcm.distribution_change because we can easily collect many data points as a batch and compare them against the baseline distribution.
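The batch-versus-baseline comparison described here can be illustrated with a generic two-sample Kolmogorov-Smirnov test, written in plain Python so it has no dependencies. This is a swapped-in detection technique, not what gcm.distribution_change does (per the first reply, that method explains a change rather than detecting it). The function names, the toy latency samples, and the 5% significance level are illustrative assumptions.

```python
import bisect
import math


def ks_statistic(baseline, batch):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of the baseline sample and the new batch."""
    a, b = sorted(baseline), sorted(batch)
    d = 0.0
    # Both empirical CDFs are right-continuous step functions, so the largest
    # gap is attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value: a statistic above it suggests the batch
    does not come from the baseline distribution."""
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return c_alpha * math.sqrt((n + m) / (n * m))


# Hypothetical latency samples (arbitrary units).
baseline = [i / 100 for i in range(100)]
same_batch = baseline[::2]                        # values taken from the baseline
shifted_batch = [x + 0.5 for x in baseline[::2]]  # every latency 0.5 higher

threshold = ks_critical_value(len(baseline), len(same_batch))
print(ks_statistic(baseline, same_batch) > threshold)     # False: no shift detected
print(ks_statistic(baseline, shifted_batch) > threshold)  # True: the batch has shifted
```

The test needs a whole batch before it can say anything, which is exactly the limitation raised next for single data points. For real use, scipy.stats.ks_2samp computes the same statistic together with a p-value.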
Obviously, gcm.distribution_change can't be used to detect an anomaly in a single data point. For example, in an ML model refresh pipeline, each node is a data operation (rollup, transform, create model, ...) and one measurement is recorded for each node per refresh. We want to detect an anomaly in an individual refresh rather than wait until we have, say, 100 refresh failures. For this use case, can you recommend an appropriate detection algorithm?
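For the single-refresh case, one simple option is to score each node's new measurement against that node's own history from past refreshes, using a robust (modified) z-score based on the median and the median absolute deviation (MAD), so a refresh can be flagged as soon as it arrives. This is a generic outlier heuristic, not a gcm method and not a recommendation from the maintainers. The node names, durations, and the 3.5 cutoff (a commonly used threshold for the modified z-score) are illustrative assumptions.

```python
import math
import statistics


def robust_z(history, value):
    """Modified z-score of one new value against past values (median/MAD)."""
    med = statistics.median(history)
    mad = statistics.median(abs(x - med) for x in history)
    if mad == 0:
        # Constant history: any different value is infinitely far away.
        return 0.0 if value == med else math.copysign(math.inf, value - med)
    # 1.4826 * MAD estimates the standard deviation for normally distributed data.
    return (value - med) / (1.4826 * mad)


def anomalous_nodes(history_by_node, refresh, threshold=3.5):
    """Score each node's measurement from one refresh against that node's
    history; return the nodes whose |score| exceeds the threshold."""
    flagged = {}
    for node, value in refresh.items():
        z = robust_z(history_by_node[node], value)
        if abs(z) > threshold:
            flagged[node] = z
    return flagged


# Hypothetical per-node durations (seconds) from ten past normal refreshes.
history = {
    "rollup": [10, 11, 9, 10, 12, 10, 11, 9, 10, 11],
    "transform": [30, 31, 29, 30, 32, 30, 28, 31, 30, 29],
    "create_model": [120, 118, 125, 122, 119, 121, 123, 120, 124, 122],
}

# A single new refresh in which model creation took much longer than usual.
latest_refresh = {"rollup": 10.5, "transform": 31, "create_model": 200}

print(sorted(anomalous_nodes(history, latest_refresh)))  # ['create_model']
```

Because pipeline nodes feed into each other, a slow upstream step can push several downstream nodes over the threshold at once. A refresh flagged this way is the kind of single outlier that attribute_anomalies, mentioned in the first reply, is meant to trace back to its root cause.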