py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

GCM - contribution in units instead of variance. #982

Closed TimKreienkamp closed 1 year ago

TimKreienkamp commented 1 year ago

This question concerns the "Intrinsic Causal Contribution" function in the GCM module, whose results are usually expressed in terms of variance contribution, e.g. in this example: https://aws.amazon.com/de/blogs/opensource/root-cause-analysis-with-dowhy-an-open-source-python-library-for-causal-machine-learning/

In some applications, however (e.g. marketing), we are more interested in a variable's total contribution to the target in terms of units instead of variance.

As an example, instead of saying "TV contributed 25% to the variance in sales", we want to make a statement like "(in a given time period in the dataset) TV contributed 2500 units of sales (or EUR 50000 in revenue, etc.)". I know it is possible to adjust the function used for attribution, but I haven't found an easy way or an example of how to do so. Maybe it's so easy that it's obvious to everyone else, but I would still appreciate an example here.

Thanks!

Houssem1995 commented 1 year ago

Hi @TimKreienkamp,

I completely agree with your point about the "Intrinsic Causal Contribution" function in the GCM module. In certain contexts, such as marketing, understanding the total contribution of variables in units rather than just variance percentages would be incredibly valuable. It could provide more actionable insights for decision-making and better align with real-world scenarios.

I'm also eager to see an example or any guidance on how to customize the function for unit-based contributions. Having such a feature would make the GCM module even more powerful and applicable in various industries.

bloebp commented 1 year ago

Hey, you might want to look at the recently added function for analyzing the ICC of a single sample (instead of the population): https://github.com/py-why/dowhy/blob/main/dowhy/gcm/influence.py#L323 A single sample here could represent a certain timestamp (or an aggregate over a time period). By default, it uses the mean difference as the set function, so the contributions come out in the units of the target rather than as shares of its variance. Let me know if this is helpful or if you run into issues.
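
To make the "mean difference as the set function" idea concrete, here is a minimal standalone sketch. It does not call DoWhy and is not the library's implementation. The toy structural model, the coefficients, the noise values, and every function name (`sales_from_noise`, `set_function`, `shapley_contributions`) are hypothetical and chosen only for illustration. It also simplifies by fixing unknown noise terms at their mean instead of averaging over samples, which gives the same result only because the toy model is linear.

```python
from itertools import permutations

# Hypothetical toy model (not from the thread): independent noise terms drive
# TV spend, online spend, and sales.
BASELINE_NOISE = {"tv": 100.0, "online": 50.0, "sales": 0.0}   # assumed noise means
OBSERVED_NOISE = {"tv": 120.0, "online": 45.0, "sales": 10.0}  # one time period


def sales_from_noise(noise):
    """Propagate noise through the toy structural equations and return sales."""
    tv = noise["tv"]
    online = 0.5 * tv + noise["online"]
    return 3.0 * tv + 2.0 * online + noise["sales"]


def set_function(known):
    """Difference to baseline sales when only the noise terms in `known` are observed.

    Unknown noise terms are fixed at their mean, a simplification that matches
    averaging over samples only for a linear model like this one.
    """
    noise = {
        name: OBSERVED_NOISE[name] if name in known else BASELINE_NOISE[name]
        for name in BASELINE_NOISE
    }
    return sales_from_noise(noise) - sales_from_noise(BASELINE_NOISE)


def shapley_contributions():
    """Exact Shapley values of the set function, in units of sales."""
    names = list(BASELINE_NOISE)
    totals = dict.fromkeys(names, 0.0)
    orderings = list(permutations(names))
    for order in orderings:
        known = set()
        for name in order:
            before = set_function(known)
            known.add(name)
            totals[name] += set_function(known) - before
    return {name: total / len(orderings) for name, total in totals.items()}


contributions = shapley_contributions()
for name, value in contributions.items():
    print(f"{name}: {value:+.1f} units of sales")
```

The contributions add up to the gap between this period's sales and the baseline sales, so each one can be read as "this variable's own noise moved sales by so many units". With a real graphical causal model the same reading would apply to the per-sample output of the function linked above, under whatever set function is configured.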

TimKreienkamp commented 1 year ago

Thanks a lot Patrick! Will try it out and report back here!

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 14 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 7 days since being marked as stale.