py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License
7.1k stars, 934 forks

Seasonality and Causal effects in GCM #858

Closed: nsankar closed this issue 1 year ago

nsankar commented 1 year ago

@kailashbuki @bloebp Hi, I would like to know whether the causal influences that can be discovered from the data (intrinsic causal influences, arrow strengths in GCM) depend on seasonality patterns in the data, or whether they are independent of seasonal trends. Also, is this a relevant assumption to consider?

For instance, in securities trading, the trend and seasonality in the data vary with the time of day and the trading session (morning, intraday, delivery, etc.). We are looking at estimating the causal influences in a cloud/microservices environment running trading applications, whose data will reflect these seasonal changes. My question is: should we train the GCM on a larger dataset (e.g., the last 3 months) that would have captured some of the seasonal patterns? And if the causal influences do not depend on seasonal patterns, can I use a smaller dataset (e.g., the last 15 days)?

Thanks in advance.

bloebp commented 1 year ago

Hi, that's a great question. Generally, these algorithms assume IID data (i.e., no time dependencies) and no hidden confounders. However, if the seasonal effects are relatively weak, these algorithms can still provide some useful results, though they should be treated with caution. Some ideas to mitigate the issue:

If there are no seasonal patterns, smaller datasets may still be useful, depending on the time granularity and the size of the graph. However, if the time granularity is one data point per day, then 15 days yields only 15 samples, which is likely too few.
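To get a rough sense of how strongly a metric violates the "no time dependencies" assumption, one option is to look at its autocorrelation at the suspected seasonal lag, before and after subtracting per-slot averages. The following is only a minimal sketch under stated assumptions: the helper names `lag_autocorrelation` and `remove_seasonal_means`, the synthetic hourly series, and the 24-step period are all hypothetical, and seasonal-mean removal is a generic preprocessing step rather than anything DoWhy provides or prescribes.

```python
import math
from collections import defaultdict
from statistics import mean


def lag_autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given lag (hypothetical helper)."""
    n = len(values)
    if lag >= n:
        return 0.0
    mu = mean(values)
    denom = sum((v - mu) ** 2 for v in values)
    if denom == 0:
        return 0.0
    num = sum((values[i] - mu) * (values[i + lag] - mu) for i in range(n - lag))
    return num / denom


def remove_seasonal_means(values, period):
    """Subtract the average of each slot within the period (e.g., hour of day)."""
    buckets = defaultdict(list)
    for i, v in enumerate(values):
        buckets[i % period].append(v)
    slot_mean = {slot: mean(vs) for slot, vs in buckets.items()}
    return [v - slot_mean[i % period] for i, v in enumerate(values)]


# Synthetic stand-in for an hourly service metric over 15 days (assumed data):
# a daily sine pattern plus a small deterministic perturbation.
period = 24
series = [
    10 + 5 * math.sin(2 * math.pi * (i % period) / period)
    + ((i * 7919) % 13 - 6) * 0.1
    for i in range(period * 15)
]

before = lag_autocorrelation(series, period)
residuals = remove_seasonal_means(series, period)
after = lag_autocorrelation(residuals, period)

print(f"lag-{period} autocorrelation before: {before:.2f}, after: {after:.2f}")
```

A large autocorrelation at the seasonal lag suggests the raw samples are far from IID. Whether the deseasonalized residuals are a suitable input for fitting a GCM depends on the application and should be judged with the same caution noted above.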

nsankar commented 1 year ago

@bloebp Thanks for the inputs