py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

Is the translation from causal graph to EconML Double Machine Learning notation incorrect? #349

Open EgorKraevTransferwise opened 2 years ago

EgorKraevTransferwise commented 2 years ago

First of all, let me say how awesome I find the integration between econml and dowhy - I very much prefer expressing my problem in the graph language, and then choosing the right EconML machinery to solve it.

But it seems to me that conflating variables that the causal graph connects to the outcome only, rather than to both treatment and outcome (let's call them A), with the X variable in double machine learning is actually incorrect. It is fine to think of them as 'effect modifiers', that is, to condition the effect of the treatment on the outcome on them.

However, having the t_model consume A as one of its inputs, X, along with W, contradicts a causal graph that connects A only to the outcome. One could say that if there is no actual dependence, the fitted t_model will have zero coefficients on X anyway, but apart from the theoretical incorrectness, would it not create unneeded variance/noise?

Would it not be cleaner to supply A as one of the inputs to the y_model but not to the t_model, and use another way to distinguish between common causes that are and are not used as effect modifiers in DML?

To make it more precise: let's suppose we have a causal graph with common causes C, treatment T, outcome O, and some factors A that affect only the outcome and have no effect on the treatment. The graph is then C -> T, C -> O, T -> O, A -> O.

The DML equations imply that both X and W have an effect on both T and O, so in the DML equations as presented in EconML C is the union of X and W, and conversely X is a subset of C.
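For reference, the partially linear model behind DML is usually written as below. The notation is recalled from the EconML documentation rather than quoted from this thread, so treat the exact symbols as an assumption; Y here plays the role of the outcome O.

```latex
\begin{aligned}
Y &= \theta(X)\, T + g(X, W) + \varepsilon, & \mathbb{E}[\varepsilon \mid X, W] &= 0, \\
T &= f(X, W) + \eta, & \mathbb{E}[\eta \mid X, W] &= 0.
\end{aligned}
```

Both nuisance functions g and f take (X, W) as input, which is why the union of X and W corresponds to the common causes C.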

If we want to add additional factors A that only affect the outcome, then X would be a subset of C + A instead of subset of C; so the clean way of translating a causal graph into a DML model would be to separately specify X (list of variable names) as one of the model_params, check that it's a subset of C+A, and then have t_model consume C and y_model consume C+A, with the treatment effect still being f(X)*T?
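A minimal sketch of that translation, assuming variables are identified by name. The helper `split_dml_features` and its return keys are hypothetical and exist in neither DoWhy nor EconML; the sketch only illustrates the subset check and the two feature sets.

```python
from typing import Dict, List, Sequence


def split_dml_features(
    common_causes: Sequence[str],
    outcome_only_causes: Sequence[str],
    effect_modifiers: Sequence[str],
) -> Dict[str, List[str]]:
    """Return the feature names each DML component would consume.

    Hypothetical helper for illustration; not a DoWhy or EconML API.
    """
    # C + A, keeping order and skipping names already listed in C.
    allowed = list(common_causes) + [
        a for a in outcome_only_causes if a not in common_causes
    ]
    # X must be a subset of C + A.
    unknown = [x for x in effect_modifiers if x not in allowed]
    if unknown:
        raise ValueError(f"Effect modifiers not in C + A: {unknown}")
    return {
        "t_model": list(common_causes),     # treatment model consumes C only
        "y_model": allowed,                 # outcome model consumes C + A
        "effect": list(effect_modifiers),   # theta(X) depends on X only
    }


features = split_dml_features(
    common_causes=["c1", "c2"],
    outcome_only_causes=["a1"],
    effect_modifiers=["c1", "a1"],
)
print(features)
```

The point of the design is that X is validated against the graph-derived sets rather than being fed wholesale into both nuisance models.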

The integration of causal graphs with EconML is awesome, but would it not be even better if it'd be theoretically correct? :) Or am I missing something?

Happy to try to spell out in code what that might look like, if that PR has a chance of being accepted :)

EgorKraevTransferwise commented 2 years ago

To elaborate, the set A above could be represented via an outcome_modifiers init arg and class member in the CausalModel, with the DML classes adjusted to take an additional optional argument that allows using different feature sets for y_model and t_model.

Outcome modifiers could be processed in `CausalModel.__init__()` like this:

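if model_from_graph_string:
    if outcome_modifiers is not None:
        logger.warning("Model supplied via graph, ignoring the outcome_modifiers argument")
    # Parents of the outcome that are not common causes. incoming_edges() is a
    # placeholder helper; the treatment itself would also need to be excluded.
    outcome_modifiers = [n for n in incoming_edges(outcome) if n not in common_causes]

if outcome_modifiers is not None:
    if effect_modifiers is None:
        # Default: every outcome modifier is also used as an effect modifier.
        effect_modifiers = outcome_modifiers
    else:
        # Explicitly supplied effect modifiers must come from C + A.
        assert all(e in common_causes + outcome_modifiers for e in effect_modifiers)
else:
    if effect_modifiers is not None:
        # Infer outcome modifiers as the effect modifiers that are not common causes.
        outcome_modifiers = [e for e in effect_modifiers if e not in common_causes]

EgorKraevTransferwise commented 2 years ago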

The power of the DoWhy approach is in the clean separation between the causal graph and the models/functional relationships chosen to fit the data. I would argue that in that hierarchy, the effect_modifiers concept belongs at functional relationships level, as it refers to interaction terms (linear or otherwise) between treatment and other variables in the functional relationship determining the outcome. This is confirmed by the fact that it's only applicable to some models/fitting methods and not others.

Introducing the distinction proposed above between outcome_modifiers and effect_modifiers would restore that hierarchy. We could still keep the effect_modifiers in CausalModel for backwards compatibility if desired, as long as it's distinct from outcome_modifiers.

amit-sharma commented 2 years ago

Wow, thanks for these thoughtful notes, @EgorKraevTransferwise. We've discussed how to translate the effect_modifiers parameters to X in CATE estimators with the EconML team, and it is still an ongoing discussion. Adding some notes below to reflect the key points.

Effect modifiers in DoWhy: Overall, I like the idea of separating model structure from functional relationships. However, effect modifiers is a tricky concept. DoWhy currently supports the simplest effect modifier (cause of outcome) but there are many variations possible.

VanderWeele and Robins, for example, describe four main kinds of effect modifiers. Given an outcome Y,

  1. Parent of Y
  2. Ancestor of Y
  3. Child of parent of Y
  4. Child of ancestor of Y

Note how the definitions are structural. A future target for dowhy is to support all these kinds of effect modifiers.
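Because the four kinds are defined purely by graph structure, they can be read off an edge list. The sketch below is an illustration under assumptions: the graph is a list of (cause, effect) pairs, the function names are hypothetical, and the four labels follow the list above rather than any existing DoWhy API.

```python
from typing import Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def parents(edges: Iterable[Edge], node: str) -> Set[str]:
    """Direct causes of `node`."""
    return {u for u, v in edges if v == node}


def ancestors(edges: Iterable[Edge], node: str) -> Set[str]:
    """All direct and indirect causes of `node`."""
    edges = list(edges)
    seen: Set[str] = set()
    stack = list(parents(edges, node))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents(edges, current))
    return seen


def modifier_kind(edges: Iterable[Edge], node: str, outcome: str) -> Optional[str]:
    """Classify `node` into one of the four structural kinds, or None.

    The treatment is also a parent of Y, so callers should exclude it.
    """
    edges = list(edges)
    if node == outcome:
        return None
    outcome_parents = parents(edges, outcome)
    outcome_ancestors = ancestors(edges, outcome)
    if node in outcome_parents:
        return "parent of Y"
    if node in outcome_ancestors:
        return "ancestor of Y"
    node_parents = parents(edges, node)
    if node_parents & outcome_parents:
        return "child of parent of Y"
    if node_parents & outcome_ancestors:
        return "child of ancestor of Y"
    return None


# Toy graph: T -> Y, A -> Y, B -> A, A -> P, B -> Q
edges = [("T", "Y"), ("A", "Y"), ("B", "A"), ("A", "P"), ("B", "Q")]
kinds = {n: modifier_kind(edges, n, "Y") for n in ["A", "B", "P", "Q"]}
print(kinds)
```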

Nomenclature: DoWhy uses the term "effect modifiers" since it is generally accepted in the research community. That said, I see the value of adding "outcome_modifiers" for clarity, interpreted as a subset of the possible effect modifiers. The API for the causal model could accept both effect_modifiers and outcome_modifiers parameters, where outcome_modifiers refers to 1 & 2 above, and effect_modifiers refers to any of 1, 2, 3, 4. In the spirit of being explicit, we could also have another parameter called "descendants_outcome_modifiers" that corresponds to 3 & 4.

This is a fine proposal, but we need to be careful because the 4 kinds above do not cover all types of effect modifiers. There is still debate on this topic; see, for example, this comment by Weinberg. So we may still need the catch-all effect_modifiers parameter for cases where the structure is hard to express in these 4 categories but the user still expects the variable to be treated as an effect modifier in the CATE estimation. Also, some variables are structurally common causes but are also effect modifiers (i.e., we want the effect of treatment conditional on these variables), so we need a way of specifying them for the downstream CATE estimator.

CATE Estimation: Beyond expressing the structure, the second question you raise is whether it is worth separating out X in DML into two parts: treatment_model_x and outcome_model_x. This is a statistical question, and the answer is often surprising. For example, Cinelli, Forney and Pearl argue that it is statistically efficient to condition on Y's causes (to reduce variance) even though they do not bias the estimate. So as a rule of thumb, backdoor estimation methods should include outcome modifiers.

Coming to the specific method employed by DML, I think it makes sense to condition on outcome modifiers in practice. Using outcome modifiers in the treatment model does not introduce any bias, and it may actually help with bias in case one of the outcome_modifiers did affect the treatment (incorrect graph specification). That said, you are right that variance may increase, but then we can always use cross-validation for the treatment model to ensure that we have a good variance tradeoff (and ML models are designed to work with high dimensional features anyways).

Summary:

  1. On DoWhy's end, I think your suggestion is an excellent one. I'd be happy to merge a PR in this direction. It makes the connection to the causal graph explicit.
  2. On the estimation/EconML end, I'm not sure about adding a distinction between X for treatment and X for the outcome. Paging @vsyrgkanis for comments. Since this is a DML/EconML specific issue, you may also consider posting it on the EconML issues page.

EgorKraevTransferwise commented 2 years ago

Thanks for the detailed answer! We seem to be aligned on the basic point: 'outcome modifiers' is purely a property of the causal graph (parents of Y that are not parents of the treatment), and in my opinion such a parameter would be useful primarily for specifying the causal graph simply. Effect modifiers, by contrast, can be seen as a set of 'good controls' in the outcome model: they are part of a particular modeling choice, at the modeler's discretion, but constrained by the causal graph structure in the ways outlined in the paper you cite.
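Read as a graph property, that definition is short to compute. A sketch assuming the graph is given as (cause, effect) pairs; `outcome_modifiers_from_edges` is a hypothetical name, not an existing DoWhy function.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def outcome_modifiers_from_edges(
    edges: Sequence[Edge], treatment: str, outcome: str
) -> List[str]:
    """Parents of the outcome that are neither the treatment nor its parents."""
    parents_of_treatment = {u for u, v in edges if v == treatment}
    result: List[str] = []
    for u, v in edges:
        if v != outcome or u == treatment:
            continue
        if u not in parents_of_treatment and u not in result:
            result.append(u)
    return result


# The graph from the opening post: C -> T, C -> O, T -> O, A -> O
graph = [("C", "T"), ("C", "O"), ("T", "O"), ("A", "O")]
modifiers = outcome_modifiers_from_edges(graph, treatment="T", outcome="O")
print(modifiers)
```

On this graph only A qualifies: C is excluded because it is also a parent of the treatment, and T is excluded because it is the treatment itself.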

I'll see when I can find time to do a proper PR on this.

On the other hand, on the topic of separating treatment_model_x and outcome_model_x, I don't really understand how the paper you quote supports your argument.

"For example, Cinelli, Forney and Pearl argue that it is statistically efficient to condition on Y's causes (to reduce variance) even though they do not bias the estimate. So as a rule of thumb, backdoor estimation methods should include outcome modifiers."

You seem to be referring to their Model 8, which is actually quite intuitive for me (adding more controls for the outcome may improve precision by abstracting outcome variability due to those factors). What I am trying to avoid (and the current code structure forces on me) is having to use all outcome controls also as inputs to the treatment propensity model, which to me seems to be a case of Model 9 - Neutral Control (possibly bad for precision) in that same paper. Or am I missing something? I can't see how adding spurious controls to the treatment propensity model can do anything but increase estimation variance.

In the extreme case of the treatment being randomly assigned, one can work around this by using DummyClassifier(strategy="prior") as propensity_model, but in the more general case surely it would be good to keep the two distinct? Or could you give other sources or arguments that argue otherwise?

amit-sharma commented 2 years ago

Really sorry for the late reply, somehow I missed the notification for this. You have a good point about not including all variables for the treatment propensity model.

Based on this, here's a strategy. Let me know what you think about this.

This way, we keep the structural parameters in CausalModel, while allowing the estimators to use them in a customized way. Would this help address the issue?

emrekiciman commented 2 years ago

I have a clarification question about this proposed API: Do you intend for treatment_modifiers to be a superset of common_causes, or are these sets disjoint? My initial intuition was that they were disjoint, but then I saw your comment that the propensity score model would use 'treatment_modifiers'. If they were disjoint, then the propensity score model would use both treatment_modifiers and common_causes, but not outcome_modifiers.


amit-sharma commented 2 years ago

That's a good point. I was thinking of treatment_modifiers as any variable that affects the treatment (and the user would like to include in their treatment model like a propensity model). But I like your idea of considering them disjoint.

So we can revise the proposal to: treatment_modifiers are any causes of treatment that are not already covered in common_causes. Then they are disjoint, and the propensity score model should use both treatment_modifiers and common_causes.

I guess it should be the same for outcome_modifiers too, so that outcome_modifiers and common_causes are disjoint. The only discrepancy, then, is the effect modifiers, which can contain some common_causes too, as discussed above.
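The revised proposal can be sketched as a small validation step. This is an illustration only: `build_feature_sets` and its return keys are hypothetical, and feeding the outcome model common_causes plus outcome_modifiers is an assumption made by symmetry with the propensity model, not something settled in this thread.

```python
from typing import Dict, List, Sequence


def build_feature_sets(
    common_causes: Sequence[str],
    treatment_modifiers: Sequence[str],
    outcome_modifiers: Sequence[str],
    effect_modifiers: Sequence[str] = (),
) -> Dict[str, List[str]]:
    """Check disjointness and assemble per-model feature lists (hypothetical)."""
    overlap_t = set(common_causes) & set(treatment_modifiers)
    if overlap_t:
        raise ValueError(f"treatment_modifiers overlap common_causes: {sorted(overlap_t)}")
    overlap_o = set(common_causes) & set(outcome_modifiers)
    if overlap_o:
        raise ValueError(f"outcome_modifiers overlap common_causes: {sorted(overlap_o)}")
    return {
        # Propensity model: common causes plus treatment-only causes.
        "propensity_model": list(common_causes) + list(treatment_modifiers),
        # Outcome model (assumed by symmetry): common causes plus outcome-only causes.
        "outcome_model": list(common_causes) + list(outcome_modifiers),
        # Effect modifiers may overlap common_causes, so no disjointness check here.
        "effect_modifiers": list(effect_modifiers),
    }


sets = build_feature_sets(
    common_causes=["c1"],
    treatment_modifiers=["z1"],
    outcome_modifiers=["a1"],
    effect_modifiers=["c1", "a1"],
)
print(sets)
```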