py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

ATE for multiple continuous treatments #1274

Open ankur-tutlani opened 3 weeks ago


We want to assess the impact of two continuous treatments, T1 and T2, on an outcome Y. There are three common causes, X1, X2, and X3, all continuous, and the outcome Y is also continuous. We need to calculate the ATE for custom control and treatment values.

Questions we want to address:

  1. Impact of T1 on Y
  2. Impact of T2 on Y
  3. Joint impact of T1 and T2 on Y (T1 and T2 may influence each other).

Which causal graphs can answer these questions? For questions 1 and 2, I assume the graphs below are sufficient.

[Images: two proposed causal graphs, not reproduced]

For question 3, I started with the following graph: [image not reproduced]

I am looking to try the following causal estimation methods:

  1. backdoor.linear_regression
  2. backdoor.econml.dml.DML
  3. backdoor.econml.dml.LinearDML
  4. backdoor.econml.dml.KernelDML

I got some results using backdoor.linear_regression, but the results from the double ML models (LinearDML, DML) do not make sense: their outputs are unrealistic. I also get the following warning when running the double ML models. Am I specifying control_value and treatment_value correctly?

A scalar was specified but there are multiple treatments; the same value will be used for each treatment. Consider specifyingall treatments, or using the const_marginal_effect method.
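The warning indicates that a scalar control or treatment value is reused for every treatment. A minimal numpy sketch, assuming the outcome is linear in T1 and T2 with constant per-unit effects θ (no effect modifiers), shows how much that changes the contrast: under that assumption the effect of moving from the control values to the treatment values is θ·(treatment − control). `expand_treatment_value`, `linear_contrast` and the θ values are hypothetical illustrations, not DoWhy or EconML functions.

```python
import numpy as np

def expand_treatment_value(value, n_treatments):
    """Hypothetical helper: turn a control/treatment value into one entry per treatment.

    A scalar is repeated for every treatment, mirroring the warning
    ("the same value will be used for each treatment"). A list must
    already contain one entry per treatment.
    """
    arr = np.atleast_1d(np.asarray(value, dtype=float))
    if arr.size == 1:
        return np.full(n_treatments, arr.item())
    if arr.size != n_treatments:
        raise ValueError(f"expected {n_treatments} values, got {arr.size}")
    return arr

def linear_contrast(theta, control, treatment):
    """Effect of moving all treatments from `control` to `treatment`,
    assuming the outcome is linear in the treatments with constant slopes theta."""
    theta = np.asarray(theta, dtype=float)
    t0 = expand_treatment_value(control, theta.size)
    t1 = expand_treatment_value(treatment, theta.size)
    return float(theta @ (t1 - t0))

theta = [2.0, 3.0]  # hypothetical per-unit effects of T1 and T2
print(linear_contrast(theta, [7, 9], [10, 5]))  # 2*3 + 3*(-4) = -6.0
print(linear_contrast(theta, 7, 10))            # scalar reused: 2*3 + 3*3 = 15.0
```

With the intended per-treatment values the contrast is 3θ₁ − 4θ₂; if a scalar reaches the estimator instead, both treatments are shifted by the same amount, which could be one source of estimates that look unrealistic.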

Below is the code I tried for the causal structure above to answer question 3. control_value_list and treatment_value_list contain the values for T1 and T2, in the same order in which the treatments were supplied when creating the CausalModel object, e.g. control_value_list=[7,9] and treatment_value_list=[10,5]. This means that for T1 we want the ATE with a control value of 7 and a treatment value of 10, and for T2 a control value of 9 and a treatment value of 5.

import xgboost as xgb
from dowhy import CausalModel

# `data` is a pandas DataFrame with columns T1, T2, Y, X1, X2, X3 (not shown here)
control_value_list = [7, 9]     # control values for T1, T2 (same order as `treatment`)
treatment_value_list = [10, 5]  # treatment values for T1, T2

model = CausalModel(
    data=data,
    treatment=['T1', 'T2'],
    outcome='Y',
    common_causes=['X1', 'X2', 'X3'],
)
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)

causal_estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.econml.dml.LinearDML",
    control_value=control_value_list,
    treatment_value=treatment_value_list,
    confidence_intervals=False,
    target_units="ate",
    method_params={
        "init_params": {
            'model_y': xgb.XGBRegressor(random_state=578, max_depth=3, n_estimators=100),
            'model_t': xgb.XGBRegressor(random_state=578, max_depth=3, n_estimators=100),
            'discrete_treatment': False,
            # 'categories': [0, 1],
            'random_state': 587921,
            'cv': 3,
            # 'model_final': LassoCV(),
            # 'featurizer': PolynomialFeatures(degree=1, include_bias=True),
        },
        "fit_params": {},
        'num_null_simulations': 399,
        'num_simulations': 399,
    },
)

Also, backdoor.linear_regression gives me a single ATE value, but backdoor.econml.dml.LinearDML outputs two separate values. Is double ML computing the ATE for the two treatments separately? I also noticed that the code throws an error when confidence_intervals is set to True. What could explain this?
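One way to see why a double ML estimator can return two numbers is a minimal numpy sketch of cross-fitted partialling-out, the idea behind double ML. It assumes constant (non-heterogeneous) effects and uses plain linear regressions as nuisance models in place of the XGBoost regressors; `cross_fitted_partialling_out` and the simulated data are hypothetical, not DoWhy or EconML code. The final stage regresses the outcome residuals on both treatment residuals together, so it yields one coefficient per treatment (our understanding is that this corresponds to what `const_marginal_effect` reports for LinearDML without effect modifiers); a single number for a given control/treatment contrast is then θ·(treatment − control).

```python
import numpy as np

def ols(A, b):
    """Least-squares coefficients of b on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

def cross_fitted_partialling_out(Y, T, W, n_folds=2, seed=0):
    """Estimate one constant effect per treatment column of T.

    Y: (n,) outcome; T: (n, d_t) continuous treatments; W: (n, p) common causes.
    Nuisance models are linear regressions here; LinearDML accepts any
    regressor for model_y / model_t instead.
    """
    n = len(Y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    Wc = np.column_stack([np.ones(n), W])  # add an intercept column
    y_res = np.empty(n)
    t_res = np.empty(T.shape)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Fit E[Y|W] and E[T|W] on the other folds, then residualize this fold.
        y_res[test] = Y[test] - Wc[test] @ ols(Wc[train], Y[train])
        t_res[test] = T[test] - Wc[test] @ ols(Wc[train], T[train])
    # Final stage: regress outcome residuals on all treatment residuals jointly,
    # giving one coefficient per treatment.
    return ols(t_res, y_res)

# Hypothetical simulated data: X1..X3 confound both treatments and the outcome.
rng = np.random.default_rng(1)
n = 5000
W = rng.normal(size=(n, 3))
T1 = W @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)
T2 = W @ np.array([0.0, 1.0, -0.5]) + rng.normal(size=n)
Y = 2.0 * T1 + 3.0 * T2 + W @ np.array([1.0, 1.0, 1.0]) + rng.normal(size=n)

theta = cross_fitted_partialling_out(Y, np.column_stack([T1, T2]), W)
joint = theta @ (np.array([10.0, 5.0]) - np.array([7.0, 9.0]))
print(theta)  # approximately [2, 3]
print(joint)  # approximately 2*3 + 3*(-4) = -6
```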

Would the following causal structures answer question 3 better, i.e. using one treatment as a common cause alongside the other factors?

[Images: two alternative causal graphs, not reproduced]

If we get ATEs from the two graphs above, can we add them and say that this addresses question 3, or are the effects not additive? Are there any other recommendations for addressing question 3?
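Whether two one-at-a-time ATEs add up to the joint effect depends on the outcome model. A toy sketch with a hypothetical, fully known outcome function (no estimation, no confounding) illustrates this: each single-treatment ATE sets one treatment and leaves the other at its observed values, while the joint effect sets both at once. `outcome`, `compare` and the observed values are made up for illustration. The sketch does not model T1 influencing T2; if T1 actually causes T2, adjusting for T2 as a common cause would estimate T1's effect with T2 held fixed rather than its total effect.

```python
import numpy as np

def outcome(t1, t2, interaction):
    """Hypothetical known outcome: Y = 2*T1 + 3*T2 + interaction*T1*T2."""
    return 2.0 * t1 + 3.0 * t2 + interaction * t1 * t2

def compare(interaction, t1_obs, t2_obs, control=(7.0, 9.0), treatment=(10.0, 5.0)):
    """Return (sum of one-at-a-time ATEs, joint effect) for control -> treatment."""
    (c1, c2), (s1, s2) = control, treatment
    # ATE of T1 alone: set T1, keep each unit's observed T2.
    ate_t1 = np.mean(outcome(s1, t2_obs, interaction) - outcome(c1, t2_obs, interaction))
    # ATE of T2 alone: set T2, keep each unit's observed T1.
    ate_t2 = np.mean(outcome(t1_obs, s2, interaction) - outcome(t1_obs, c2, interaction))
    # Joint effect: set both treatments at once.
    joint = outcome(s1, s2, interaction) - outcome(c1, c2, interaction)
    return float(ate_t1 + ate_t2), float(joint)

t1_obs = np.array([6.0, 8.0])   # hypothetical observed T1 values
t2_obs = np.array([8.0, 10.0])  # hypothetical observed T2 values
print(compare(0.0, t1_obs, t2_obs))  # (-6.0, -6.0): additive model, sum matches
print(compare(0.5, t1_obs, t2_obs))  # (-6.5, -12.5): interaction, sum does not
```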