py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

Individualized Treatment Effect Estimation (ITE) #11

Closed MFreidank closed 6 years ago

MFreidank commented 6 years ago

Is it possible to obtain ITE estimates from dowhy? I seem to only be able to compute ATE without actually tampering with the source code.

amit-sharma commented 6 years ago

Currently, dowhy only supports ATE. We hope to support individual treatment effects (ITE) in the future.

thisisreallife commented 3 years ago

Does dowhy support estimating ITE right now? Are there any methods planned?

amit-sharma commented 3 years ago

@thisisreallife Yes, DoWhy supports many of the latest ITE and conditional average treatment effect (CATE) methods through integration with the EconML library. You can invoke all EconML CATE methods using DoWhy's estimate_effect function. Details are in the conditional effects notebook: https://github.com/microsoft/dowhy/blob/master/docs/source/example_notebooks/dowhy-conditional-treatment-effects.ipynb

Sharing a quick example below. For CATE/ITE, the key is to specify the effect_modifiers variables in CausalModel; the returned estimates are then computed separately (conditioned) for each value of the effect modifiers. Many of the CATE methods use a parameterized model to "condition" on the effect modifiers, so using that model you can also obtain ITE by asking DoWhy to estimate the effect on a specific sample of the data. Often, the effect modifiers are a subset of the common causes.

# Model the graph and identify effect
# (assumes `data` was generated with dowhy.datasets.linear_dataset, as in the linked notebook)
from dowhy import CausalModel
model = CausalModel(data=data["df"],
                    treatment=data["treatment_name"], outcome=data["outcome_name"],
                    common_causes=data["common_causes_names"],
                    effect_modifiers=data["effect_modifier_names"]   # the variables on which to compute the CATE
                    )
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)

# 1) Linear ITE/CATE estimator (in-built in DoWhy)
linear_estimate = model.estimate_effect(identified_estimand,
                                        method_name="backdoor.linear_regression",
                                        control_value=0,
                                        treatment_value=1)
print(linear_estimate)

# 2) EconML double machine learning estimator
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV
from sklearn.ensemble import GradientBoostingRegressor
dml_estimate = model.estimate_effect(identified_estimand,
                                     method_name="backdoor.econml.dml.DML",
                                     control_value=0,
                                     treatment_value=1,
                                     target_units=lambda df: df["X0"] > 1,  # condition used for CATE
                                     confidence_intervals=False,
                                     method_params={"init_params": {"model_y": GradientBoostingRegressor(),
                                                                    "model_t": GradientBoostingRegressor(),
                                                                    "model_final": LassoCV(fit_intercept=False),
                                                                    "featurizer": PolynomialFeatures(degree=1, include_bias=False)},
                                                    "fit_params": {}})
print(dml_estimate)

The output shows you the ITE estimates, along with the ATE on the given data.

*** Causal Estimate ***

## Identified estimand
Estimand type: nonparametric-ate

## Realized estimand
b: y~v0+W0+W1+W3+X1+W2+X0 | X1,X0
Target units: Data subset defined by a function

## Estimate
Mean value: 17.197494868852377
Effect estimates: [16.0910598  14.14293255 13.09601451 ...  7.20839431 14.33411557
 14.8303657 ]
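
To make the idea of "conditioning on effect modifiers" concrete, here is a minimal NumPy-only sketch. It is not DoWhy's or EconML's implementation: the data, the variable names (w, x, t, y) and the linear model with a single interaction term are all assumptions chosen for illustration. It fits an outcome model in which the treatment effect depends on an effect modifier x, then reads off one effect per row, the overall mean, and the mean over a subset (the analogue of the target_units condition above).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical synthetic data: w is a confounder, x is an effect modifier.
w = rng.normal(size=n)
x = rng.uniform(0, 2, size=n)
t = (rng.normal(size=n) + w > 0).astype(float)  # treatment assignment depends on w
true_effect = 2.0 + 3.0 * x                     # the effect varies with x
y = true_effect * t + 1.5 * w + 0.5 * x + rng.normal(scale=0.1, size=n)

# Linear outcome model with a treatment-by-modifier interaction term.
design = np.column_stack([np.ones(n), t, t * x, x, w])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
b_t, b_tx = coef[1], coef[2]

# Conditional effect per row: fitted outcome at t=1 minus fitted outcome at t=0.
unit_effects = b_t + b_tx * x

ate = unit_effects.mean()                # average over all rows
cate_subset = unit_effects[x > 1].mean() # average over rows with x > 1

print("ATE:", ate)
print("CATE for x > 1:", cate_subset)
```

Because the fitted model is a function of x only, two rows with the same x get the same estimated effect. That is why these per-row values are, strictly speaking, conditional effects evaluated at each row's effect modifiers rather than effects unique to each individual.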

The returned estimate is just like any other DoWhy estimate, so you can use all of the refutation methods in DoWhy on it. To compute the effect on unseen "test" data, you can specify a new dataset (or a single row) for which you want to estimate the effect.

# Estimate effect on a new sample
import numpy as np
import pandas as pd
test_cols = data["effect_modifier_names"]  # only need effect modifiers' values
test_arr = [np.random.uniform(0, 1, 10) for _ in range(len(test_cols))]  # each variable sampled uniformly, 10 rows
test_df = pd.DataFrame(np.array(test_arr).transpose(), columns=test_cols)
dml_estimate = model.estimate_effect(identified_estimand,
                                     method_name="backdoor.econml.dml.DML",
                                     target_units=test_df,
                                     confidence_intervals=False,
                                     method_params={"init_params": {"model_y": GradientBoostingRegressor(),
                                                                    "model_t": GradientBoostingRegressor(),
                                                                    "model_final": LassoCV(),
                                                                    "featurizer": PolynomialFeatures(degree=1, include_bias=True)},
                                                    "fit_params": {}
                                                    })

pragalbh1 commented 3 years ago

Hi Amit,

I am trying to find the ITE for multiple non-binary treatments (v0, v1, v2...). Although I was able to find the ATE for each treatment using the linear regression method, I am not able to find the ITE.

Also, is it possible to calculate an ATE weighted by treatment, rather than the ATE for just treatment == 1? I am trying to find the causal impact of different types of ad spend on sales using observational data, so it would be great if I could find the ATE weighted by treatment (spend).

Thanks in advance.

denyHell commented 2 years ago

I am following the notebook mentioned above (section on metalearners) to fit the model

from sklearn.ensemble import RandomForestRegressor
metalearner_estimate = model_experiment.estimate_effect(identified_estimand_experiment,
                                                        method_name="backdoor.econml.metalearners.TLearner",
                                                        confidence_intervals=False,
                                                        method_params={"init_params": {
                                                                           "models": RandomForestRegressor()
                                                                           },
                                                                       "fit_params": {}
                                                                       })
print(metalearner_estimate)
print("True causal estimate is", data_experiment["ate"])

which showed:

## Estimate
Mean value: 9.029560445766515
Effect estimates: [4.47306709e+00 1.43479304e+01 4.78084000e+00 ... 1.12416045e-02
 7.12154307e+00 8.59979495e+00]

Now I would like to estimate the ITE on a test dataset. Before doing that, I passed the same training dataset again, this time with fit_estimator=False. I expected the model to produce the same ITE estimates, but it did not:

## Estimate
Mean value: 6.361292555011156
Effect estimates: [ 7.92573838  4.05548979  1.29972831 ...  1.09842848  5.02572197
 13.46047503]

Could you help me understand why this is the case? Thanks

Jessiw145 commented 2 years ago

Hello, I am interested in estimating the CATE/ITE for my dataset as well. I am struggling with the difference between effect_modifiers and target_units. In my case, I have a marketing dataset whose flat file contains unique customer IDs, and I want to estimate the causal effect for each customer. Is it sufficient to mark customer_id as an effect modifier, or what would be the right code for target_units?

I would be very grateful for your support. Thank you in advance.

amit-sharma commented 2 years ago

Effect modifiers are the variables that change the causal effect. E.g., whether a customer is a loyalty program member can be an effect modifier for a marketing campaign. I wouldn't recommend making each customer ID an effect modifier, unless you have multiple rows of data for each customer ID.

Target units are simply the subset of the data on which you want to estimate the causal effect. Sometimes, you want the average treatment effect on the treated (ATT) instead of the treatment effect on everyone. Or you may be interested in treatment effect on a specific subset of customers.
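
As a rough illustration of what choosing target units means, the sketch below averages a per-row effect column over different subsets. The table, its column names (treated, loyalty_member, effect) and the numbers are hypothetical stand-ins for an estimator's output; it shows only the arithmetic, not how DoWhy computes these quantities internally.

```python
import pandas as pd

# Hypothetical per-row effect estimates, standing in for an estimator's output.
df = pd.DataFrame({
    "treated":        [1, 0, 1, 0, 1, 0],
    "loyalty_member": [1, 1, 0, 0, 1, 0],
    "effect":         [4.0, 3.0, 1.0, 2.0, 5.0, 0.0],
})

ate = df["effect"].mean()                                     # all units
att = df.loc[df["treated"] == 1, "effect"].mean()             # treated units only
atc = df.loc[df["treated"] == 0, "effect"].mean()             # control units only
members = df.loc[df["loyalty_member"] == 1, "effect"].mean()  # a custom subset

print(f"ATE={ate:.2f}  ATT={att:.2f}  ATC={atc:.2f}  members={members:.2f}")
```

The same per-row effects give a different summary number depending on which rows are averaged, which is all that the choice of target units changes.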

If you are only interested in ITE, you can ignore target units. Just set the correct value for the effect modifiers. By default, dowhy/econml will return the ITE for every unit (row) and then you can access them as an array.
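
One way to work with that array is to attach it back to an identifier column by row position. The sketch below assumes the effect was estimated on the full input data (no target_units filtering), so that the array has one entry per input row in the same order. The customer table and the effect values are hypothetical. In the DoWhy versions that print "Effect estimates", the array appears to be exposed as the cate_estimates attribute of the returned estimate, but treat that attribute name as an assumption and check it against your installed version.

```python
import numpy as np
import pandas as pd

# Hypothetical input table; in practice this is the DataFrame given to CausalModel.
customers = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3", "c4"],
    "loyalty_member": [1, 0, 1, 0],
})

# Stand-in for the per-row effects array (one value per row, same row order).
effects = np.array([[5.2], [1.1], [4.8], [0.9]])

# Flatten in case the array has shape (n, 1), then align by row position.
result = customers[["customer_id"]].copy()
result["ite"] = np.asarray(effects).reshape(len(customers))

# Compare with an external experiment keyed by the same IDs (hypothetical values).
experiment = pd.DataFrame({
    "customer_id": ["c1", "c3"],
    "experiment_effect": [5.0, 5.0],
})
comparison = result.merge(experiment, on="customer_id", how="inner")
print(comparison)
```

The reshape raises an error if the number of effects does not match the number of rows, which is a useful guard against silently misaligned IDs.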

Jessiw145 commented 2 years ago

Thank you very much! This was helpful for me. So when I specify the confounders in the causal graph, I don't need to specify them as effect_modifiers again, right?

After estimating the ITEs, I get an array with the values. Is there a way to combine it with the corresponding customer IDs, so that I can compare my DoWhy results on observational data with external results from a randomized experiment?

My last question is: how can I use my model in DoWhy to predict the causal effects for a new dataset (e.g. today's customers), for example to support decision-making? I saw this in the roadmap file as predict_outcome...

All the best and thanks for your support with using DoWhy.