py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

Observed Confounder Strength for Binary Y or T #381

Open james-belmonte opened 2 years ago

james-belmonte commented 2 years ago

https://github.com/microsoft/dowhy/blob/52dd55f51096bba37160afc696cbf293de109eba/dowhy/causal_refuters/add_unobserved_common_cause.py#L75

For binary Y or T, the estimate adjusts for the other observed confounders. But when Y or T are continuous, the estimate is an unadjusted regression coefficient.

Can you explain why there is a difference? Additionally, is there any literature explaining these methods for estimating the effect of observed confounding?

Thanks!

amit-sharma commented 2 years ago

The intention here is to select a default value for the strength of unobserved confounding. For continuous Y or T, the correlation of U with Y or T denotes the strength of confounding. For binary Y or T, the number of times U can change the value of Y or T denotes the strength.

Since the true strength of unobserved confounding is unknown, the right formulation is the one for which we can establish a reasonable rule that bounds the confounding strength. The reasoning for the current choice regarding adjusted or unadjusted metric is:

  1. For continuous Y/T, it is plausible that people can imagine/know about a confounder and its correlation with Y/T from domain knowledge or other published work. It will be difficult for them to know its adjusted correlation accounting for all the observed confounders in a current study. So that's why the unadjusted correlation may be a good way to parameterize kappa.
  2. For binary Y/T, people need to think about/estimate the fraction of times that an unobserved confounder can flip the variable's value. It may be hard to think of a variable that can flip the value on its own, but it can be easier to think about flipping if you already know the contribution of available variables. E.g., in a logistic model, if a data point's score is already near 0.5, it may be easier to flip.

To summarize, in 1), we are prioritizing external knowledge about correlations, which are expected to be general quantities beyond the current study. And for 2), we are prioritizing the knowledge in the current study, given that estimates of flipping an outcome are more likely to be reported in context (adjusted).
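
For concreteness, here is a minimal sketch of the two quantities for a continuous Y (illustrative only; the data-generating process and variable names are made up, and this is not DoWhy's internal code):

```python
# Unadjusted vs. adjusted correlation of a candidate confounder u with a
# continuous outcome y (illustrative sketch; not DoWhy internals).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
w = rng.normal(size=n)             # observed confounder
u = 0.8 * w + rng.normal(size=n)   # hypothetical unobserved confounder, correlated with w
y = 1.0 * w + 0.3 * u + rng.normal(size=n)

# Unadjusted: the plain correlation of u with y, the kind of general
# quantity external studies tend to report (case 1 above).
unadjusted = np.corrcoef(u, y)[0, 1]

# Adjusted: the correlation after residualizing u and y on the observed w,
# i.e., u's association with y net of the observed confounders.
u_res = u - np.polyval(np.polyfit(w, u, 1), w)
y_res = y - np.polyval(np.polyfit(w, y, 1), w)
adjusted = np.corrcoef(u_res, y_res)[0, 1]

print(f"unadjusted corr(u, y) = {unadjusted:.2f}")  # inflated by the shared dependence on w
print(f"adjusted corr(u, y)   = {adjusted:.2f}")
```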

That said, all of the above is just conjecture. There can be settings where adjusted kappa makes sense for continuous Y/T and vice-versa. In the current code, it is also a bit awkward that kappa_t/kappa_y assume a different meaning (unadjusted/adjusted) based on the kind of variable (continuous/discrete).

Perhaps the best way is to provide the user with a choice. We can consider adding a parameter `adjusted={True, False, "default"}` that can provide the maximum flexibility. What do you think?
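
For reference, the call might look like this (a sketch: the refuter and its existing arguments follow DoWhy's add_unobserved_common_cause refuter, but the `adjusted` parameter is only the proposal above and does not exist in the current code):

```python
# Hypothetical call shape; `adjusted` is the proposed (not yet existing) option.
refutation = model.refute_estimate(
    identified_estimand,
    estimate,
    method_name="add_unobserved_common_cause",
    confounders_effect_on_treatment="binary_flip",
    confounders_effect_on_outcome="linear",
    effect_strength_on_treatment=0.05,  # kappa_t
    effect_strength_on_outcome=0.05,    # kappa_y
    adjusted="default",                 # proposed: True, False, or "default"
)
```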

The literature on this topic isn't conclusive. There are many metrics/formulations of confounding strength that have been proposed. Here's a good reference that summarizes some of the early work in this area dealing with binary treatments (see Section 3): https://www.econstor.eu/bitstream/10419/234287/1/dp2029.pdf

To give some examples (by no means exhaustive), these are two recent papers on the topic:

  1. E-value that uses adjusted effect
  2. Robustness value for linear regression.

james-belmonte commented 2 years ago

Hello @amit-sharma, thanks so much for the thoughtful response. I appreciate it and will review the links you sent.

Regarding adding a parameter, I think that could be helpful in terms of interpretability and consistency, especially when discussing the plausibility of specific levels of unobserved confounding with domain experts.

Additionally, I am curious whether the default flip threshold of .5 is warranted. Would we be overestimating the confounding effect on a binary variable if we set the threshold to the prevalence? For example, if we have a binary T that takes the value 1 only 15% of the time in our observed data, the propensity score distribution may sit well below .5, likely centered around .15. A confounder would therefore have to be extremely strong to push even one observation over .5.

My question is, are we biasing the confounding effect by choosing .5 as the threshold? Or would we be adding bias by using the observed prevalence as the threshold?
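
To make the concern concrete, here is a small illustrative simulation (all numbers are assumed; this is not DoWhy's code):

```python
# With P(T=1) around 0.15, propensity scores cluster near 0.15, so a
# 0.5 flip threshold leaves almost nothing flippable, while a
# prevalence-based threshold does not. (Illustrative sketch only.)
import numpy as np

rng = np.random.default_rng(0)
ps = np.clip(rng.normal(loc=0.15, scale=0.05, size=10_000), 0, 1)  # hypothetical PS distribution

kappa = 0.1  # assumed push of the simulated confounder on the score
for threshold in (0.5, 0.15):  # default vs. observed prevalence
    flippable = np.mean((ps < threshold) & (ps + kappa >= threshold))
    print(f"threshold={threshold}: fraction of 0->1 flips = {flippable:.3f}")
```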

amit-sharma commented 2 years ago

Okay, I will work on adding that parameter.

Great question on the default flip threshold. The code assumes an additive error model for the generation of T. For example, consider t = 1[0.1*w0 + 0.1*w1 + 0.29*u + e >= 0.5], where w0, w1, and u are binary and e is random error bounded by 0.1. Then, even though u's effect on T is stronger than that of w0 or w1, for data points where w0 or w1 is zero, ignoring u still predicts the correct value of T. So for those inputs, u does not affect the value of T, and its confounding strength on T should be considered zero. The prevalence of T (P(T=1)) is not required for estimating confounding strength, but the question remains what the correct threshold should be (without knowing the true threshold).
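
A quick simulation of that example (a sketch of the same toy DGP, not DoWhy's code) makes the point visible: u changes T only in the w0 = w1 = 1 cell, where the score is near the 0.5 threshold:

```python
# t = 1[0.1*w0 + 0.1*w1 + 0.29*u + e >= 0.5], with |e| <= 0.1.
import itertools
import numpy as np

rng = np.random.default_rng(0)
for w0, w1 in itertools.product([0, 1], repeat=2):
    e = rng.uniform(-0.1, 0.1, size=100_000)
    score = 0.1 * w0 + 0.1 * w1 + e
    t_with_u = score + 0.29 >= 0.5     # u = 1
    t_without_u = score >= 0.5         # u = 0
    changed = np.mean(t_with_u != t_without_u)
    print(f"w0={w0}, w1={w1}: fraction of T values changed by u = {changed:.2f}")
# Only the w0=1, w1=1 cell prints a nonzero fraction.
```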

In another generation model (e.g., logistic) that interprets the score as a probability, i.e., P(T=1) = f(0.1*w0 + 0.1*w1 + 0.29*u), even a small change in the probability changes the values of T in the dataset. In that case, the flip probability is |P_u(T=1) - P_u'(T=1)|.

Perhaps we should be taking the max of these two flip probabilities, to simulate the worst case (unless the user specifies which DGP to use: additive with threshold or probabilistic). In theory, most textbooks would assume a probabilistic generation, but in practice, many decisions (actions or treatments) are threshold-based.
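
A sketch of what taking the max might look like, reusing the assumed coefficients from the example above (illustrative only; not a proposed implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

w0, w1 = 1, 1
score = 0.1 * w0 + 0.1 * w1

# Probabilistic DGP: flip probability is the change in P(T=1) induced by u.
prob_flip = abs(sigmoid(score + 0.29) - sigmoid(score))

# Additive-threshold DGP: flip probability from the thresholded model
# (the w0 = w1 = 1 cell of the simulation above).
threshold_flip = 0.45

print(f"probabilistic flip probability: {prob_flip:.3f}")
print(f"worst case (max of the two):    {max(prob_flip, threshold_flip):.3f}")
```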

james-belmonte commented 2 years ago

This is very helpful. Thank you for explaining that.

One last question:

When kappa is inferred or declared by the user and used to simulate an unobserved confounder, the direction of confounding is not taken into account for a binary variable. The flip is 1 - observed value, irrespective of what that observed value may be. So in practice, if we want to hypothesize an unobserved confounder that we believe would increase the propensity towards T, the implementation does not seem to accommodate a directional assumption. Am I understanding this incorrectly? Is this by design or just a limitation of the current implementation?
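
For clarity, here is a sketch contrasting the symmetric flip described above with a hypothetical directional variant (neither block is DoWhy's API; the directional version only illustrates the question):

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.binomial(1, 0.15, size=10)  # observed binary treatment
flip_mask = rng.random(10) < 0.3    # kappa_t: fraction of rows the confounder touches

t_symmetric = np.where(flip_mask, 1 - t, t)           # current behavior: 1 - observed value
t_directional = np.where(flip_mask & (t == 0), 1, t)  # hypothetical: only 0 -> 1 flips

print(t, t_symmetric, t_directional, sep="\n")
```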