Closed MFreidank closed 6 years ago
Currently, dowhy only supports ATE. We hope to support individual treatment effects (ITE) in the future.
Does dowhy support estimating ITE right now?
Are there any methods on the schedule?
@thisisreallife Yes, DoWhy supports many of the latest ITE and conditional average treatment effect (CATE) methods through integration with the EconML library. You can invoke all EconML CATE methods using DoWhy's estimate_effect function. Details are in the conditional effects notebook: https://github.com/microsoft/dowhy/blob/master/docs/source/example_notebooks/dowhy-conditional-treatment-effects.ipynb
Sharing a quick example below. For CATE/ITE, the key is to specify the effect_modifiers variables in CausalModel; the returned estimates are then computed separately (conditioned) for each value of the effect modifiers. Many of the CATE methods use a parameterized model to "condition" on the effect modifiers, so using that model you can also obtain the ITE by asking DoWhy to estimate the effect on a specific sample of the data. Often, the effect modifiers are a subset of the common causes.
from dowhy import CausalModel

# Model the graph and identify effect
model = CausalModel(data=data["df"],
                    treatment=data["treatment_name"], outcome=data["outcome_name"],
                    common_causes=data["common_causes_names"],
                    effect_modifiers=data["effect_modifier_names"])  # the variables on which to compute the CATE
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
# 1) Linear ITE/CATE estimator (built into DoWhy)
linear_estimate = model.estimate_effect(identified_estimand,
                                        method_name="backdoor.linear_regression",
                                        control_value=0,
                                        treatment_value=1)
print(linear_estimate)
# 2) EconML double machine learning estimator
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV
from sklearn.ensemble import GradientBoostingRegressor
dml_estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.econml.dml.DML",
    control_value=0,
    treatment_value=1,
    target_units=lambda df: df["X0"] > 1,  # condition used for CATE
    confidence_intervals=False,
    method_params={"init_params": {"model_y": GradientBoostingRegressor(),
                                   "model_t": GradientBoostingRegressor(),
                                   "model_final": LassoCV(fit_intercept=False),
                                   "featurizer": PolynomialFeatures(degree=1, include_bias=False)},
                   "fit_params": {}})
print(dml_estimate)
The output shows you the ITE estimates, along with the ATE on the given data.
*** Causal Estimate ***
## Identified estimand
Estimand type: nonparametric-ate
## Realized estimand
b: y~v0+W0+W1+W3+X1+W2+X0 | X1,X0
Target units: Data subset defined by a function
## Estimate
Mean value: 17.197494868852377
Effect estimates: [16.0910598 14.14293255 13.09601451 ... 7.20839431 14.33411557
14.8303657 ]
The returned estimate is just like any other DoWhy estimate, so you can use all of the refutation methods in DoWhy on it. To compute the effect on unseen "test" data, you can specify a new dataset (or a single row) for which you want to estimate the effect.
# Estimate effect on a new sample
import numpy as np
import pandas as pd

test_cols = data["effect_modifier_names"]  # only need effect modifiers' values
# all variables are sampled uniformly, sample of 10
test_arr = [np.random.uniform(0, 1, 10) for _ in range(len(test_cols))]
test_df = pd.DataFrame(np.array(test_arr).transpose(), columns=test_cols)
dml_estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.econml.dml.DML",
    target_units=test_df,
    confidence_intervals=False,
    method_params={"init_params": {"model_y": GradientBoostingRegressor(),
                                   "model_t": GradientBoostingRegressor(),
                                   "model_final": LassoCV(),
                                   "featurizer": PolynomialFeatures(degree=1, include_bias=True)},
                   "fit_params": {}})
Hi Amit,
I am trying to find the ITE for multiple non-binary treatments (v0, v1, v2, ...). Although I was able to find the ATE for each treatment using the linear regression method, I am not able to find the ITE.
Also, is it possible to calculate an ATE weighted by treatment level, rather than the ATE for treatment == 1 only? I am trying to find the causal impact of different types of ad spend on sales using observational data, so it would be great if I could find the ATE weighted by treatment (spend).
Thanks in advance.
I am following the notebook mentioned above (section on metalearners) to fit the model:
from sklearn.ensemble import RandomForestRegressor
metalearner_estimate = model_experiment.estimate_effect(
    identified_estimand_experiment,
    method_name="backdoor.econml.metalearners.TLearner",
    confidence_intervals=False,
    method_params={"init_params": {"models": RandomForestRegressor()},
                   "fit_params": {}})
print(metalearner_estimate)
print("True causal estimate is", data_experiment["ate"])
which shows:
## Estimate
Mean value: 9.029560445766515
Effect estimates: [4.47306709e+00 1.43479304e+01 4.78084000e+00 ... 1.12416045e-02
7.12154307e+00 8.59979495e+00]
Now I would like to estimate the ITE on a test dataset. But before doing that, I passed the same training dataset with fit_estimator=False. I expected the model to produce the same ITE, but it did not:
## Estimate
Mean value: 6.361292555011156
Effect estimates: [ 7.92573838 4.05548979 1.29972831 ... 1.09842848 5.02572197
13.46047503]
Could you help me understand why this is the case? Thanks
Hello, I am interested in estimating the CATE/ITE for my dataset as well. I struggle with the difference between effect_modifiers and target_units.
In my case, I have a marketing dataset and my flat file consists of unique customer IDs. I want to estimate the causal effect for each customer. Is it sufficient to mark customer_id as an effect_modifier, or what would be the right code for target_units?
I would be very grateful for your support. Thank you in advance.
Effect modifiers are the variables that change the causal effect. E.g., whether a customer is a loyalty program member can be an effect modifier for a marketing campaign. I wouldn't recommend making each customer ID an effect modifier unless you have multiple rows of data for each customer ID.
Target units are simply the subset of the data on which you want to estimate the causal effect. Sometimes, you want the average treatment effect on the treated (ATT) instead of the treatment effect on everyone. Or you may be interested in treatment effect on a specific subset of customers.
If you are only interested in ITE, you can ignore target units. Just set the correct value for the effect modifiers. By default, dowhy/econml will return the ITE for every unit (row) and then you can access them as an array.
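To make the distinction concrete, here is a minimal sketch on simulated data. It is not DoWhy's or EconML's implementation: it assumes a plain linear model with a treatment-by-modifier interaction, and all variable names (x, w, t, y) are made up for illustration. The effect modifier determines how the effect varies per row; the target units only determine which rows are averaged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0, 2, n)   # effect modifier (hypothetical)
w = rng.normal(size=n)     # common cause (hypothetical)
# Treatment assignment depends on the common cause w
t = (rng.uniform(size=n) < 1 / (1 + np.exp(-w))).astype(float)
# Simulated truth: the effect of t on y is 10 + 5 * x
y = (10 + 5 * x) * t + 3 * w + rng.normal(scale=0.1, size=n)

# Fit y ~ 1 + t + t*x + x + w by ordinary least squares
design = np.column_stack([np.ones(n), t, t * x, x, w])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
b_t, b_tx = coef[1], coef[2]

# Effect modifiers: the fitted model gives one effect estimate per row
ite = b_t + b_tx * x

# Target units: choose the rows over which to average those estimates
ate = ite.mean()                 # all units
cate_subset = ite[x > 1].mean()  # only units with x > 1

print("ATE:", ate)
print("CATE for x > 1:", cate_subset)
```

In this sketch the per-row array plays the role of the "Effect estimates" in the DoWhy output, and the averaged value plays the role of the "Mean value".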
Thank you very much! This was helpful for me. So when I specify the confounders in the causal graph, I don't need to specify them as effect_modifiers again, right?
After having estimated the ITEs, I get the array with the values. Is there a way to combine it with the corresponding customer IDs, so that I can compare my DoWhy results on observational data with external results from a randomized experiment?
And my last question: how can I use my model to predict the causal effects on a new dataset (e.g. customers from today) in DoWhy, e.g. as an aid for decision-making? I saw it in the Roadmap file as predict_outcome...
All the best and thanks for your support with using DoWhy.
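On combining the ITE array with customer IDs, one possible approach is sketched below with pandas. It assumes the effect array comes back in the same row order as the data passed to the model (if target_units selected a subset, the array would align with that subset instead). The data frames, column names, and numbers are all hypothetical stand-ins, not DoWhy output.

```python
import numpy as np
import pandas as pd

# Hypothetical training data, one row per customer, in the order given to the model
train_df = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3", "c4"],
    "loyalty_member": [1, 0, 1, 0],
})

# Stand-in for the per-row effect array returned by the estimator
ite = np.array([4.2, 1.1, 3.8, 0.9])

# Guard against silent misalignment before attaching the estimates
assert len(ite) == len(train_df)
ite_by_customer = train_df[["customer_id", "loyalty_member"]].assign(ite=ite)

# Hypothetical segment-level results from a randomized experiment
experiment = pd.DataFrame({
    "loyalty_member": [0, 1],
    "experiment_effect": [1.0, 4.0],
})

# Average the ITEs per segment and put them next to the experimental effects
comparison = (
    ite_by_customer.groupby("loyalty_member", as_index=False)["ite"].mean()
    .rename(columns={"ite": "observational_effect"})
    .merge(experiment, on="loyalty_member")
)
print(comparison)
```

Comparing at the segment level is a design choice here: a randomized experiment typically identifies average effects for groups rather than for single customers, so individual estimates are aggregated before the comparison.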
Is it possible to obtain ITE estimates from dowhy? I seem to only be able to compute the ATE without actually tampering with the source code.