Algorithms for efficient adjustment sets

esmucler commented 2 years ago

Hello DoWhy team.

Congrats on the great work on this package! I wonder if you would be interested in a contribution to the package. First, a brief intro.

In a series of papers with co-authors (1, 2, and 3, the last one currently under review in the Journal of Causal Inference), we have developed theory and algorithms to compute efficient (that is, low-variance) adjustment sets for estimating the average treatment effect of a treatment on an outcome under a non-parametric causal graphical model. Our results allow for hidden variables in the graph (as long as at least one adjustment set consists only of observable variables) and for individualised treatments (in which the value of the intervention variable depends on some other set of variables).

More precisely, suppose we are given a causal graph G specifying:

a treatment variable A,
an outcome variable Y,
a set of observable (that is, non-latent) variables N,
a set L of observable variables that will be used to allocate treatment, and possibly
positive costs associated with each observable variable.
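
As a rough illustration of these inputs, here is a minimal sketch that bundles them into one object. The class name `AdjustmentProblem` and its field names are hypothetical and are not taken from optimaladj or DoWhy; the graph is stored as a plain set of (parent, child) edges to keep the sketch dependency-free, and the example graph is made up.

```python
from dataclasses import dataclass, field


@dataclass
class AdjustmentProblem:
    """Hypothetical container for the inputs listed above (not the optimaladj API)."""

    edges: set        # directed edges (parent, child) of the causal graph G
    treatment: str    # A
    outcome: str      # Y
    observable: set   # N: the observable (non-latent) variables
    allocation: set = field(default_factory=set)  # L: variables used to allocate treatment
    costs: dict = field(default_factory=dict)     # optional positive cost per observable variable

    def __post_init__(self):
        nodes = {v for edge in self.edges for v in edge}
        if not {self.treatment, self.outcome} <= nodes:
            raise ValueError("treatment and outcome must be nodes of the graph")
        if not self.observable <= nodes:
            raise ValueError("observable variables must be nodes of the graph")
        if not self.allocation <= self.observable:
            raise ValueError("L must consist of observable variables")
        if not set(self.costs) <= self.observable:
            raise ValueError("costs may only be given for observable variables")
        if any(cost <= 0 for cost in self.costs.values()):
            raise ValueError("costs must be positive")


# Made-up example: U is latent, W1 is observable and used to allocate treatment.
problem = AdjustmentProblem(
    edges={("U", "W1"), ("U", "Y"), ("W1", "A"), ("A", "Y")},
    treatment="A",
    outcome="Y",
    observable={"A", "Y", "W1"},
    allocation={"W1"},
    costs={"W1": 2.0},
)
```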

Suppose moreover that there exists at least one adjustment set with respect to A and Y in G that consists only of observable variables. Consider the following definitions:

An optimal adjustment set is an observable adjustment set that yields non-parametric estimators of the interventional mean with the smallest asymptotic variance among those that are based on observable adjustment sets.
An optimal minimal adjustment set is an observable adjustment set that yields non-parametric estimators of the interventional mean with the smallest asymptotic variance among those that are based on observable minimal adjustment sets. An observable minimal adjustment set is a valid adjustment set such that all its variables are observable and the removal of any variable from it destroys validity.
An optimal minimum cost adjustment set is defined similarly, being optimal in the class of observable adjustment sets that have minimum possible cost.
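
To make the classes of sets in these definitions concrete, the following brute-force sketch enumerates observable adjustment sets on a small made-up graph and filters the minimal and minimum-cost ones. It rests on several assumptions: validity is checked with the backdoor criterion, which is sufficient but not necessary for a valid adjustment set; the enumeration is exponential and is not the polynomial-time algorithm from the papers; and the sketch does not rank sets by asymptotic variance. All function names are hypothetical.

```python
from itertools import combinations


def reach(edges, start, forward=True):
    """Nodes reachable from `start` (inclusive), following edges forward or backward."""
    seen, stack = set(start), list(start)
    while stack:
        v = stack.pop()
        for p, c in edges:
            src, dst = (p, c) if forward else (c, p)
            if src == v and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def d_separated(edges, x, y, z):
    """d-separation test via the moralized ancestral graph of {x, y} and z."""
    keep = reach(edges, {x, y} | z, forward=False)
    sub = [(p, c) for p, c in edges if p in keep and c in keep]
    nbrs = {v: set() for v in keep}
    for p, c in sub:  # drop edge directions
        nbrs[p].add(c)
        nbrs[c].add(p)
    for v in keep:  # connect parents that share a child
        for a, b in combinations([p for p, c in sub if c == v], 2):
            nbrs[a].add(b)
            nbrs[b].add(a)
    seen, stack = {x}, [x]
    while stack:  # search for a path from x to y that avoids z
        v = stack.pop()
        for w in nbrs[v] - z - seen:
            seen.add(w)
            stack.append(w)
    return y not in seen


def is_backdoor_set(edges, a, y, z):
    """Backdoor criterion: no descendant of a in z, and z blocks all backdoor paths."""
    z = set(z)
    if z & reach(edges, {a}):
        return False
    without_out_edges = {(p, c) for p, c in edges if p != a}
    return d_separated(without_out_edges, a, y, z)


def observable_backdoor_sets(edges, a, y, observable):
    candidates = sorted(set(observable) - {a, y})
    return [frozenset(s) for r in range(len(candidates) + 1)
            for s in combinations(candidates, r) if is_backdoor_set(edges, a, y, s)]


def minimal_sets(valid):
    """Keep sets where removing any single variable destroys validity."""
    valid = set(valid)
    return [s for s in valid if all(s - {v} not in valid for v in s)]


def min_cost_sets(valid, costs):
    total = lambda s: sum(costs[v] for v in s)
    best = min(total(s) for s in valid)
    return [s for s in valid if total(s) == best]


# Made-up example in which every variable is observable.
edges = {("W1", "A"), ("W1", "W2"), ("W2", "Y"), ("A", "Y"), ("P", "Y")}
valid = observable_backdoor_sets(edges, "A", "Y", {"A", "Y", "W1", "W2", "P"})
minimal = minimal_sets(valid)
cheapest = min_cost_sets(valid, {"W1": 3, "W2": 1, "P": 1})
```

In this example the minimal sets are {W1} and {W2}, and {W2} is the only minimum-cost set under the assumed costs. Choosing the optimal member of each class additionally requires the variance comparisons developed in the papers, which this sketch does not attempt.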

Under these assumptions, we have shown that optimal minimal and optimal minimum cost adjustment sets always exist, and can be computed in polynomial time. We also provide a sufficient criterion for the existence of an optimal adjustment set and a polynomial time algorithm to compute it when it exists.
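
For intuition about why polynomial time is plausible, here is a sketch of the simplest case only. It assumes a DAG with no hidden variables and a static (non-individualised) treatment, and it uses the characterisation from the efficient-adjustment literature, as I understand it, that the optimal set is the parents of the causal nodes minus the forbidden nodes. It is not the optimaladj implementation and does not cover latent variables, costs, or minimality; the function names and the example graph are hypothetical.

```python
def reach(edges, start, forward=True):
    """Nodes reachable from `start` (inclusive), following edges forward or backward."""
    seen, stack = set(start), list(start)
    while stack:
        v = stack.pop()
        for p, c in edges:
            src, dst = (p, c) if forward else (c, p)
            if src == v and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def optimal_adjustment_set(edges, a, y):
    """O = pa(cn) minus forb, assuming a fully observed DAG (see the caveats above)."""
    # Causal nodes: nodes other than a that lie on a directed path from a to y.
    causal = (reach(edges, {a}) - {a}) & reach(edges, {y}, forward=False)
    # Forbidden nodes: the causal nodes, their descendants, and a itself.
    forbidden = reach(edges, causal) | {a}
    parents_of_causal = {p for p, c in edges if c in causal}
    return parents_of_causal - forbidden


# Made-up example: M mediates the effect of A on Y; W1 and W2 confound; P only affects Y.
edges = {("W1", "A"), ("W1", "W2"), ("W2", "Y"), ("A", "M"), ("M", "Y"), ("P", "Y")}
optimal = optimal_adjustment_set(edges, "A", "Y")
```

Each step is a graph traversal, so the cost is polynomial in the size of the graph. In the example the result contains W2 and P, the non-forbidden parents of Y, and leaves out W1.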

These results are valid not only for non-parametric graphs and estimators but also, by virtue of results in this paper, for linear structural equation models and OLS estimators.

We have implemented these algorithms in the optimaladj package, with routines from networkx doing most of the algorithmic heavy lifting. We believe they would be a nice addition to the DoWhy package.

Going back to my first point: would you be interested in a PR that incorporates these algorithms into DoWhy? They could supplement the already implemented backdoor identification strategy.

Best, Ezequiel Smucler

amit-sharma commented 2 years ago

Hey @esmucler, thanks for reaching out. I'm familiar with your work and I think it will be a great addition to DoWhy.

Thinking in terms of the user-facing API, here's a proposal: the identify_effect method already contains options to constrain the adjustment set returned. We have "exhaustive", "minimal-adjustment", "maximal-adjustment" and a "default" that is a heuristic mix of minimal and maximal. One option is to add more options here for the user, which may include "optimal-adjustment", "optimal-minimal-adjustment" and "optimal-minimum-cost-adjustment". Would something like this work?

In terms of code structure, it may be easiest to add a new class for your method under the causal_identifiers folder. Then you may need to modify the identify_ate_effect method in the CausalIdentifier class to add a call to your class.
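
A minimal sketch of how such routing could look, assuming the option strings quoted above. The function `route_identification` and the two handler labels are hypothetical; they only illustrate the idea of sending the new options to a separate class and do not reproduce DoWhy's actual code.

```python
# Option strings as quoted in this comment; everything else here is hypothetical.
EXISTING_OPTIONS = ("default", "exhaustive", "minimal-adjustment", "maximal-adjustment")
PROPOSED_OPTIONS = (
    "optimal-adjustment",
    "optimal-minimal-adjustment",
    "optimal-minimum-cost-adjustment",
)


def route_identification(method_name):
    """Return a label for the code path that would handle `method_name`."""
    if method_name in PROPOSED_OPTIONS:
        # Would call the new class added under the causal_identifiers folder.
        return "efficient-adjustment-class"
    if method_name in EXISTING_OPTIONS:
        # Would fall through to the existing backdoor identification logic.
        return "existing-backdoor-logic"
    raise ValueError(f"unknown method_name: {method_name!r}")
```

Keeping the check in one place means an unrecognised option fails early with a clear error instead of silently using the default.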

In any case, feel free to raise a PR. You may also consider submitting a draft (incomplete) PR so that we can review the code structure and API before all the detailed code is added.

emrekiciman commented 2 years ago

+1, I agree that this would be a great addition, Ezequiel!

Amit, where in the dowhy API would we add the costs associated with observed variables? Would we embed them within the graph structure, or add them as an extra argument in the identify_effect method? Any thoughts about what seems more natural, Ezequiel?
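
The two alternatives can be compared with a small sketch. It assumes node attributes are available as a mapping from node name to an attribute dict, and that an explicit `costs` argument takes precedence when both are supplied. The helper `resolve_costs` and the attribute key `"cost"` are hypothetical and are not part of DoWhy.

```python
def resolve_costs(node_attrs, observable, costs=None):
    """Return a cost per observable variable from an argument or from node attributes."""
    if costs is None:
        # Option 1: costs embedded in the graph structure as a node attribute.
        costs = {v: attrs["cost"] for v, attrs in node_attrs.items() if "cost" in attrs}
    # Option 2 (costs passed as an extra argument) needs no lookup.
    missing = sorted(set(observable) - set(costs))
    if missing:
        raise ValueError(f"no cost given for: {missing}")
    if any(costs[v] <= 0 for v in observable):
        raise ValueError("costs must be positive")
    return {v: costs[v] for v in observable}


# Made-up example: the same costs supplied in both ways.
node_attrs = {"W1": {"cost": 3}, "W2": {"cost": 1}, "U": {}}
from_graph = resolve_costs(node_attrs, {"W1", "W2"})
from_argument = resolve_costs(node_attrs, {"W1", "W2"}, costs={"W1": 3, "W2": 1})
```

Embedding costs in the graph keeps one source of truth per variable, while a separate argument leaves the graph untouched and lets the same graph be reused with different cost schemes.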

esmucler commented 2 years ago

Hi folks. Thanks for the quick reply and for your suggestions. The proposal to add options to identify_effect makes perfect sense to me, as does adding an extra argument to pass the costs. We will start working on this and once we have something decent we will submit a draft PR to get feedback.

emrekiciman commented 2 years ago

Wonderful, looking forward! Thank you!

py-why / dowhy

Algorithms for efficient adjustment sets #464