Proper time-series / causal inference on fields

ocdevel / gnothi

Gnothi is an open-source AI journal and toolkit for self-discovery. If you're interested in getting involved, we'd love to hear from you.

https://gnothiai.com

GNU Affero General Public License v3.0


Proper time-series / causal inference on fields #23

Open lefnire opened 4 years ago

lefnire commented 4 years ago

Currently using a so-so correlation setup via XGBRegressor on time-lagged field entries, with 5-day windows. It's pretty bad theory, but works decently in practice. We'll want to move to something more solid for time-series analysis.

Darts (which includes Prophet) (try this one first)

XGBoost (current setup)

Facebook Prophet
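For reference, a minimal sketch of what "time-lagged field entries, 5-day windows" could look like as a feature matrix. This is not the actual Gnothi code: the function name `lagged_rows`, the `series` layout (field name to one value per day), and the choice to include the target's own history as features are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def lagged_rows(
    series: Dict[str, List[float]],
    target: str,
    window: int = 5,
) -> Tuple[List[str], List[List[float]], List[float]]:
    """Build (column names, X rows, y values) from daily field values.

    Each row holds the previous `window` days of every field (assumption:
    the target's own history is included too); y is the target's value on
    the current day.
    """
    fields = sorted(series)
    n_days = len(series[target])
    if any(len(values) != n_days for values in series.values()):
        raise ValueError("all fields must cover the same days")

    # Column "mood_lag2" means: value of field "mood" two days earlier.
    columns = [f"{f}_lag{k}" for f in fields for k in range(1, window + 1)]
    X: List[List[float]] = []
    y: List[float] = []
    for day in range(window, n_days):
        X.append([series[f][day - k] for f in fields for k in range(1, window + 1)])
        y.append(series[target][day])
    return columns, X, y


# Hypothetical demo data: two fields tracked over seven days.
demo = {
    "mood": [3, 4, 2, 5, 4, 3, 4],
    "sleep": [7, 6, 8, 5, 7, 6, 8],
}
columns, X, y = lagged_rows(demo, target="mood", window=5)
```

The resulting `X` and `y` (converted to arrays) are the kind of input a regressor such as `XGBRegressor` would be fit on, with the per-column importances read back against `columns`. Fitting is left out here to keep the sketch dependency-free.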

Cause/effect models

Survey of cause-effect models, project list.
- uber/causalml (need to figure out "treatment" input variable)
- microsoft/dowhy
- CausalDiscoveryToolbox
To investigate: akelleh/causality, JakeColtman/bartpy, awarebayes/RecNN, M-Nauta/TCDF, flow-forecast

Other

~~lime~~ (this is for general model-explaining)

lefnire commented 3 years ago

Did some digging around the links above. facebook/Prophet's multivariate + feature_importances support is fairly experimental/limited, but worth a shot. There's no aspect of cause/effect in Prophet; it's just a (powerful) time-series forecasting model.

But there are real cause-effect models! Looks like Judea Pearl really made a splash. I need to read the paper under "cause/effect models" above and continue investigating those projects. They produce DAGs (directed acyclic graphs), which I could walk from each Field up through its parents to find its original cause - awesome. What's unclear is which of these are time-series aware; they all seem stationary (comparing fields<->fields within each day). Also, uber/causalml, which seems the most active/powerful, requires "treatments" as an input variable, which I understood to be an output, whereas microsoft/dowhy is just wildly complex/confusing. So lots of reading to do.
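A rough sketch of the "walk the DAG to find the original cause" idea, assuming the causal library's output has already been converted to a plain dict mapping each field to its direct parents. The `parents` structure, the field names, and `root_causes` are hypothetical; none of the libraries above is claimed to return this format.

```python
from typing import Dict, List, Set


def root_causes(parents: Dict[str, List[str]], field: str) -> Set[str]:
    """Return the ancestors of `field` that have no parents of their own."""
    roots: Set[str] = set()
    seen: Set[str] = set()
    stack = list(parents.get(field, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited via another path through the DAG
        seen.add(node)
        node_parents = parents.get(node, [])
        if node_parents:
            stack.extend(node_parents)  # keep climbing toward the roots
        else:
            roots.add(node)  # nothing upstream: treat as an original cause
    return roots


# Hypothetical graph: caffeine -> sleep -> mood, exercise -> mood.
example_parents = {
    "mood": ["sleep", "exercise"],
    "sleep": ["caffeine"],
    "exercise": [],
}
mood_roots = root_causes(example_parents, "mood")
```

Here `mood_roots` contains `caffeine` and `exercise`. A field with no parents returns an empty set, since it is its own starting point.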

lefnire commented 11 months ago

DoWhy or CausalNex. This ChatGPT convo shows how to go about it (lots of how-to code in there; it could be a simple weekend project!)

Turns out the current implementation (XGBoost's feature_importances) isn't too far from the mark, given how I'm rolling the time series, so I'm gonna punt on this since it's halfway decent. But the next step should be a move towards DoWhy or CausalNex. It would be great to save not only the influencer score (importances, like we see now) but also an exported graph.png of the Bayesian graph, showing flow from A->B->C (uploaded to S3 for that user).
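One possible shape for the graph export, sketched without committing to DoWhy or CausalNex: turn a list of scored edges into Graphviz DOT text, which can then be rendered to a PNG. The `(cause, effect, score)` edge format and the `to_dot` helper are assumptions for illustration; the S3 upload is not shown.

```python
from typing import Iterable, Tuple


def to_dot(edges: Iterable[Tuple[str, str, float]]) -> str:
    """Render (cause, effect, score) edges as Graphviz DOT source."""
    lines = ["digraph influencers {"]
    for cause, effect, score in edges:
        # Label each arrow with its influencer score, rounded to 2 places.
        lines.append(f'  "{cause}" -> "{effect}" [label="{score:.2f}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical edges matching the A->B->C flow mentioned above.
dot_source = to_dot([("A", "B", 0.8), ("B", "C", 0.35)])
```

Writing `dot_source` to a file and running Graphviz (`dot -Tpng graph.dot -o graph.png`) would produce the image to upload per user.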