Seasonality and Causal effects in GCM

py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

MIT License


@kailashbuki @bloebp Hi, I would like to know whether the causal influences that can be discovered from the data (intrinsic causal influences, arrow strengths in GCM) depend on the seasonality patterns in the data, or whether they are independent of seasonal trends. Also, is this a relevant assumption to consider?

For instance, in securities trading, the trend and seasonality in the data vary with the time of day (morning, intraday, delivery, etc.). We are looking at estimating the causal influences in a cloud/microservices environment running trading applications, whose data will reflect these seasonal changes. My question is: should we train the GCM on a larger dataset (such as the last 3 months) that would capture some of the seasonal patterns? And if the causal influences do not depend on seasonal patterns, can I use a smaller dataset (the last 15 days)?

Thanks in advance.

Hi, that's a great question. Generally, these algorithms assume IID data (i.e., no time dependencies) and no hidden confounders. However, if the seasonal effects are relatively weak, these algorithms can still provide some useful results, though they should be treated with caution. Some ideas to mitigate the issue:

Introduce an additional variable representing time, for instance a sine wave with a period of one week, one month, etc.
Aggregate to a coarser time granularity (for example, by averaging), although some valuable information may be lost.
Remove the seasonality from the data using a technique such as seasonal decomposition.
Introduce lag variables, such as the feature value from the previous day. In this case, the graph structure can become quite complex and large.
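
Three of the ideas above (a cyclic time variable, seasonality removal, and lag variables) can be sketched as data preparation steps with pandas and NumPy. This is only a rough illustration under stated assumptions: the column name latency_ms, the helper function names, the hourly sampling, and the synthetic data are all hypothetical, and the per-hour mean subtraction is a crude stand-in for a proper seasonal decomposition, not what DoWhy does internally.

```python
import numpy as np
import pandas as pd


def add_time_features(df, period_hours=24.0):
    """Append sine/cosine encodings of the time of day (hypothetical helper)."""
    out = df.copy()
    hours = out.index.hour.to_numpy() + out.index.minute.to_numpy() / 60.0
    angle = 2.0 * np.pi * hours / period_hours
    out["time_sin"] = np.sin(angle)
    out["time_cos"] = np.cos(angle)
    return out


def remove_hourly_seasonality(df, column):
    """Subtract the mean of each hour of day: a crude seasonal adjustment."""
    out = df.copy()
    hourly_mean = out.groupby(out.index.hour.to_numpy())[column].transform("mean")
    out[column + "_deseason"] = out[column] - hourly_mean
    return out


def add_lag(df, column, steps=1):
    """Add the value from `steps` rows earlier and drop rows without a lag."""
    out = df.copy()
    lag_name = f"{column}_lag{steps}"
    out[lag_name] = out[column].shift(steps)
    return out.dropna(subset=[lag_name])


# Synthetic example (assumed, not real data): two weeks of hourly latency
# with a daily cycle plus noise.
rng = np.random.default_rng(0)
index = pd.date_range("2024-01-01", periods=24 * 14, freq=pd.Timedelta(hours=1))
daily_cycle = 10.0 * np.sin(2.0 * np.pi * index.hour.to_numpy() / 24.0)
data = pd.DataFrame(
    {"latency_ms": 100.0 + daily_cycle + rng.normal(0.0, 1.0, len(index))},
    index=index,
)

prepared = add_time_features(data)
prepared = remove_hourly_seasonality(prepared, "latency_ms")
prepared = add_lag(prepared, "latency_ms")
```

In a graph, the time columns would typically be modeled as root nodes pointing to the variables they influence, while each lag column becomes an extra parent node, which is why the graph can grow quickly.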

If there are no seasonal patterns, smaller datasets may still be useful, depending on the time granularity and size of the graph. However, if the time granularity is one data point per day, then 15 days is likely too small.


Seasonality and Causal effects in GCM #858