py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

Algorithms for efficient adjustment sets #464

Open esmucler opened 2 years ago

esmucler commented 2 years ago

Hello DoWhy team.

Congrats on the great work on this package! I wonder if you would be interested in a contribution to the package. First, a brief intro.

In a series of papers with co-authors (1, 2, and 3, the last one currently under review in the Journal of Causal Inference), we have developed theory and algorithms to compute efficient (meaning low-variance) adjustment sets for estimating the average treatment effect of a treatment on an outcome under a non-parametric causal graphical model. Our results allow for hidden variables in the graph (as long as at least one adjustment set is composed of observable variables) and for individualised treatments (in which the value of the intervention variable depends on some other set of variables).

More precisely, suppose we are given a causal graph G specifying:

Suppose, moreover, that there exists at least one adjustment set with respect to A and Y in G that is composed of observable variables. Consider the following definitions:

Under these assumptions, we have shown that optimal minimal and optimal minimum cost adjustment sets always exist, and can be computed in polynomial time. We also provide a sufficient criterion for the existence of an optimal adjustment set and a polynomial time algorithm to compute it when it exists.
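For the simplest case covered by these results, here is a minimal sketch, assuming a DAG in which every variable is observed, a single point treatment A whose descendants include the outcome Y, and networkx as the only dependency. It computes the optimal adjustment set described by Henckel, Perković and Maathuis, O(A, Y, G) = pa(cn(A, Y, G)) \ forb(A, Y, G), where cn are the nodes other than A on directed paths from A to Y and forb is A together with the causal nodes and their descendants. It does not handle hidden variables, individualised treatments, or the minimal and minimum-cost variants, and `optimal_adjustment_set` is a hypothetical name, not the optimaladj API.

```python
import networkx as nx


def optimal_adjustment_set(g: nx.DiGraph, treatment, outcome) -> set:
    """Hypothetical helper: O(A, Y, G) = pa(cn) minus forb, for a fully observed DAG."""
    if not nx.is_directed_acyclic_graph(g):
        raise ValueError("g must be a DAG")
    descendants_a = nx.descendants(g, treatment)
    if outcome not in descendants_a:
        raise ValueError("outcome must be a descendant of treatment")
    # Causal nodes: every node other than the treatment on a directed path A -> ... -> Y.
    causal = descendants_a & (nx.ancestors(g, outcome) | {outcome})
    # Forbidden nodes: the treatment, the causal nodes and all of their descendants.
    forbidden = {treatment}
    for node in causal:
        forbidden |= {node} | nx.descendants(g, node)
    # Parents of the causal nodes, minus anything forbidden.
    parents = set()
    for node in causal:
        parents |= set(g.predecessors(node))
    return parents - forbidden


# Z confounds A and Y, M mediates, R only causes A, W only causes Y.
g = nx.DiGraph([("Z", "A"), ("Z", "Y"), ("A", "M"), ("M", "Y"), ("W", "Y"), ("R", "A")])
print(sorted(optimal_adjustment_set(g, "A", "Y")))  # ['W', 'Z']
```

In this toy graph the set keeps W, which affects only Y and so soaks up outcome noise, and drops R, which affects only A and would only remove useful treatment variation.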

These results hold not only for non-parametric graphical models and estimators but also, by virtue of the results in this paper, for linear structural equation models and OLS estimators.
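To make the linear-SEM/OLS case concrete, here is a small Monte Carlo sketch, assuming a hypothetical model with a confounder Z, a variable R that causes only A, and a variable W that causes only Y (true effect of A on Y equal to 2). Both {Z, W} and {Z, R} are valid adjustment sets, but {Z, W} is the optimal one for this graph, so the OLS coefficient on A should vary less across replicates. The coefficients, sample size and number of replicates are arbitrary illustrative choices, not taken from the papers.

```python
import numpy as np


def simulate(n: int, rng: np.random.Generator) -> dict:
    # Hypothetical linear SEM: Z -> A, Z -> Y, R -> A, W -> Y, A -> Y (true effect 2).
    r, z, w = rng.standard_normal((3, n))
    a = r + z + rng.standard_normal(n)
    y = 2.0 * a + z + 2.0 * w + rng.standard_normal(n)
    return {"A": a, "Y": y, "Z": z, "W": w, "R": r}


def ols_effect(data: dict, adjustment: list) -> float:
    # Regress Y on an intercept, A and the adjustment covariates; return A's coefficient.
    x = np.column_stack([np.ones_like(data["A"]), data["A"]] + [data[v] for v in adjustment])
    coef, *_ = np.linalg.lstsq(x, data["Y"], rcond=None)
    return float(coef[1])


def empirical_variance(adjustment: list, n: int = 200, reps: int = 500, seed: int = 0) -> float:
    # Sampling variance of the OLS effect estimate across independent replicates.
    rng = np.random.default_rng(seed)
    estimates = [ols_effect(simulate(n, rng), adjustment) for _ in range(reps)]
    return float(np.var(estimates))


var_opt = empirical_variance(["Z", "W"])  # optimal set for this graph
var_alt = empirical_variance(["Z", "R"])  # valid, but not optimal
print(var_opt, var_alt)
```

Under this model the asymptotic variance of the coefficient is Var(Y | A, S) / (n Var(A | S)) for adjustment set S, which works out to roughly 1/(2n) for {Z, W} versus 5/n for {Z, R}, so the simulated gap should be large.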

We have implemented these algorithms in the optimaladj package, with routines from networkx doing most of the algorithmic heavy lifting. We believe they would be a nice addition to the DoWhy package.

Going back to my first point: would you be interested in a PR that incorporates these algorithms into DoWhy? They could supplement the backdoor identification strategy that is already implemented.

Best, Ezequiel Smucler

amit-sharma commented 2 years ago

Hey, @esmucler thanks for reaching out. I'm familiar with your work and I think it will be a great addition to DoWhy.

Thinking in terms of the user-facing API, here's a proposal: the identify_effect method already contains options to constrain the adjustment set returned. We have "exhaustive", "minimal-adjustment", "maximal-adjustment" and a "default" that is a heuristic mix of minimal and maximal. One option is to add more choices here for the user, which may include "optimal-adjustment", "optimal-minimal-adjustment" and "optimal-minimum-cost-adjustment". Would something like this work?

In terms of code structure, it may be easiest to add a new class for your method under the causal_identifiers folder. Then you may need to modify the identify_ate_effect method in the CausalIdentifier class to add a call to your class.

In any case, feel free to raise a PR. You may also consider submitting a draft (incomplete) PR so that we can review the code structure and API before all the detailed code is added.

emrekiciman commented 2 years ago

+1, I agree that this would be a great addition, Ezequiel!

Amit, where in the dowhy API would we add the costs associated with observed variables? Would we embed them within the graph structure, or add them as an extra argument in the identify_effect method? Any thoughts about what seems more natural, Ezequiel?

esmucler commented 2 years ago

Hi folks. Thanks for the quick reply and for your suggestions. The proposal to add options to identify_effect makes perfect sense to me, as does adding an extra argument to pass the costs. We will start working on this and once we have something decent we will submit a draft PR to get feedback.
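Since the plan is to pass the costs as an extra argument, here is a sketch of what a cost-aware routine could look like, assuming a fully observed DAG, a single treatment that has the outcome as a descendant, and positive numeric costs. It finds a minimum-cost adjustment set through the standard reduction of d-separation to a vertex cut: build the proper back-door graph, moralize the ancestral subgraph of {A, Y}, split each node into an "in" and "out" half, and solve with networkx's `nx.minimum_cut`, where edges without a `capacity` attribute count as infinite so forbidden nodes can never be cut. This is not the optimal minimum cost algorithm from the proposal: it does not break ties among minimum-cost sets by efficiency and it ignores hidden variables. The function name and the `cost` node attribute are hypothetical; the `costs` dict mirrors the extra-argument option and the attribute fallback mirrors the embed-in-the-graph option.

```python
import itertools

import networkx as nx


def minimum_cost_adjustment_set(g: nx.DiGraph, treatment, outcome, costs=None) -> set:
    """Hypothetical helper: a minimum-cost adjustment set in a fully observed DAG.

    costs maps node -> positive cost; nodes it omits fall back to a "cost"
    node attribute, then to 1.
    """
    costs = costs or {}
    descendants_a = nx.descendants(g, treatment)
    if outcome not in descendants_a:
        raise ValueError("outcome must be a descendant of treatment")
    causal = descendants_a & (nx.ancestors(g, outcome) | {outcome})
    forbidden = {treatment}
    for node in causal:
        forbidden |= {node} | nx.descendants(g, node)
    # Proper back-door graph: drop the first edge of every proper causal path.
    pbd = g.copy()
    pbd.remove_edges_from([(treatment, c) for c in causal if pbd.has_edge(treatment, c)])
    # Moral graph of the ancestral set of {treatment, outcome} in that graph.
    anc = {treatment, outcome} | nx.ancestors(pbd, treatment) | nx.ancestors(pbd, outcome)
    sub = pbd.subgraph(anc)
    moral = nx.Graph()
    moral.add_nodes_from(sub)
    moral.add_edges_from(sub.edges())
    for node in sub:
        moral.add_edges_from(itertools.combinations(sub.predecessors(node), 2))
    # Split each node v into (v, "in") -> (v, "out"); only non-forbidden nodes get a
    # finite capacity (their cost), every other edge is left uncapacitated (infinite).
    flow = nx.DiGraph()
    for v in moral:
        if v in forbidden:
            flow.add_edge((v, "in"), (v, "out"))
        else:
            flow.add_edge((v, "in"), (v, "out"), capacity=costs.get(v, g.nodes[v].get("cost", 1)))
    for u, v in moral.edges():
        flow.add_edge((u, "out"), (v, "in"))
        flow.add_edge((v, "out"), (u, "in"))
    _, (reachable, _) = nx.minimum_cut(flow, (treatment, "out"), (outcome, "in"))
    # Cut nodes: those whose "in" half is on the source side and "out" half is not.
    return {v for v in moral if (v, "in") in reachable and (v, "out") not in reachable}


# Back-door path A <- V <- U -> Y: either V or U blocks it, so the cheaper one wins.
g = nx.DiGraph([("U", "V"), ("V", "A"), ("U", "Y"), ("A", "Y")])
print(minimum_cost_adjustment_set(g, "A", "Y", costs={"V": 5, "U": 1}))  # {'U'}
print(minimum_cost_adjustment_set(g, "A", "Y", costs={"V": 1, "U": 5}))  # {'V'}
```

Restricting the cut to the ancestral set loses nothing: intersecting any valid adjustment set with the ancestors of {A, Y} in the proper back-door graph still gives a valid set, and with positive costs it is never more expensive.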

emrekiciman commented 2 years ago

Wonderful, looking forward! Thank you!
