Use case for gcm.distribution_change

py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

https://www.pywhy.org/dowhy

MIT License


Use case for gcm.distribution_change #495

Closed: htcml closed this issue 2 years ago

htcml commented 2 years ago

Question regarding the rca_microservice_architecture notebook.

Detecting network latency anomalies is a good use case for gcm.distribution_change, because we can easily collect many data points as a batch and compare them against the baseline distribution.

Obviously, gcm.distribution_change can't be used to detect an anomaly from a single data point. For example, in an ML model refresh pipeline, each node is a data operation (rollup, transform, create model, ...) and one measurement is recorded per node per refresh. We want to detect an anomaly in a single refresh rather than waiting until we have, say, 100 failed refreshes. For this use case, can you recommend an appropriate detection algorithm?

kailashbuki commented 2 years ago

Hi @htcml, methods like distribution_change in gcm are meant for explaining the root causes of an observed effect. They are not meant for detecting the effect in the first place.

If your goal is to detect an outlier (or anomaly), popular ML libraries like scikit-learn already provide various algorithms. Once an outlier is detected, you can use the gcm method attribute_anomalies to identify its root causes. Is this what you have in mind?
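
To make the single-refresh detection step concrete, here is a dependency-free stand-in for the scikit-learn detectors mentioned in the reply: a per-node z-score of the new refresh against past refreshes. This is a swapped-in technique for illustration, not something recommended in the thread; the node names, runtimes and the threshold of 3 are hypothetical.

```python
import statistics

def zscore_outlier_nodes(history, new_refresh, threshold=3.0):
    """Return {node: z} for every node whose value in new_refresh lies more than
    `threshold` sample standard deviations from that node's historical mean."""
    flagged = {}
    for node, past_values in history.items():
        mean = statistics.mean(past_values)
        stdev = statistics.stdev(past_values)  # needs at least two past refreshes
        z = (new_refresh[node] - mean) / stdev
        if abs(z) > threshold:
            flagged[node] = z
    return flagged

# Hypothetical per-node runtimes (seconds) from past refreshes of the pipeline.
history = {
    "rollup": [60, 62, 59, 61, 63, 60, 58, 61],
    "transform": [120, 118, 121, 119, 122, 120, 117, 121],
    "create_model": [300, 305, 298, 302, 301, 299, 303, 300],
}
# One new refresh, scored on its own without waiting for a batch of failures.
new_refresh = {"rollup": 61, "transform": 119, "create_model": 420}

flagged = zscore_outlier_nodes(history, new_refresh)
print(flagged)
```

Here only `create_model` is flagged. A multivariate detector such as scikit-learn's `IsolationForest` could replace the per-node check. In a pipeline, a flagged downstream node may simply be inheriting an upstream problem, which is the gap that root-cause attribution is meant to fill.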

petergtz commented 2 years ago

To add to @kailashbuki's response: attribute_anomalies has just been added to the rca_microservice_architecture notebook on the latest master. There, check out the section "Scenario 1: Observing a single outlier", specifically the part "Attributing an outlier latency at a target service to other services".