py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

Clarification about DML marginal_effect #747

Open jaydeepchakraborty opened 1 year ago

jaydeepchakraborty commented 1 year ago

Thank you for such a wonderful package. I have a categorical treatment (values 0, 1, 2) and I used DML.

The effects of the treatments are shown below.

image

Then I tried the marginal effect with base treatment = 0.

image

Now, if I use another treatment value as the base treatment, such as 1 or 2, the output does not change. Whatever value I pass as T, the output stays the same, for example dml_est.marginal_effect(T=1, X=dml_test_seg) or dml_est.marginal_effect(T=2, X=dml_test_seg).

Am I missing something? Any help will be appreciated.

Thanks

kbattocchi commented 1 year ago

All of our DML estimators assume that the treatment effect is linear in T (assuming there is no treatment featurizer), so the marginal effect is constant regardless of what values of T you provide, which explains the behavior you're seeing (and you can just call const_marginal_effect instead, which doesn't take T as an argument because we know that the marginal effect is constant in T). So the marginal effect will vary with X, but then we'll just apply whatever coefficients θ(X) we compute to the treatments linearly.

To understand what a marginal effect means for a discrete treatment, we one-hot-encode the discrete treatment (and drop the control treatment's column), so you can interpret the coefficients of marginal_effect as saying what the relative change in the output is to a corresponding change in the treatment from control to T1 in the first column, and from control to T2 in the second column. Note that moving 100% from control to T1 is just what you compute in dml_est_effect_01, so that's why the first column of marginal_effect is identical to it, and likewise for dml_est_effect_02 and the second column. Moving 100% from T1 to T2 is the same as moving 100% from T1 to control and then 100% from control to T2, so dml_est_effect_12 is the same as the second column of marginal_effect minus the first column.
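As a compact restatement of the explanation above, here is a sketch of the linear model and the resulting treatment contrasts. The symbols \phi (the one-hot encoding with the control column dropped), g, W, and \varepsilon are notation introduced here for illustration and are not taken from the thread.

```latex
% Linear-in-treatment model, with \phi(T) the one-hot encoding of T
% that drops the control arm (so \phi(0) is the zero vector):
Y = \theta(X)^{\top} \phi(T) + g(X, W) + \varepsilon

% Effect of moving from arm a to arm b, with \theta_0(X) \equiv 0:
\tau_{a \to b}(X) = \theta(X)^{\top} \bigl( \phi(b) - \phi(a) \bigr)
                  = \theta_b(X) - \theta_a(X)

% Special cases for three arms (0 = control):
\tau_{0 \to 1}(X) = \theta_1(X), \qquad
\tau_{0 \to 2}(X) = \theta_2(X), \qquad
\tau_{1 \to 2}(X) = \theta_2(X) - \theta_1(X)
```

Because the model is linear in \phi(T), the coefficients \theta(X) do not depend on the treatment value at which they are evaluated, which is why passing a different T does not change the output.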

Does that help?

jaydeepchakraborty commented 1 year ago

Thank you very much, @kbattocchi.

So, if I have more than three discrete values (say T0 (value 0), T1 (value 1), T2 (value 2), and T3 (value 3)), we can apply the same rule: dml_est_effect_13, i.e. moving 100% from T1 to T3, should be the same as moving 100% from T1 to control (T0) and then 100% from control (T0) to T3.

Is that understanding correct?

kbattocchi commented 1 year ago

Yes, exactly. You could either use effect(..., T0=1, T1=3) or equivalently subtract the first column from the third column of const_marginal_effect(...) to get that effect.
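The column arithmetic described above can be checked with a small, library-free sketch. It does not call EconML; the `theta` rows stand in for the output of const_marginal_effect(...) and contain made-up numbers, and the helper names `one_hot_drop_control` and `effect` are hypothetical.

```python
def one_hot_drop_control(t, n_arms):
    """One-hot encode arm t, dropping the column for the control arm 0."""
    return [1.0 if t == k else 0.0 for k in range(1, n_arms)]


def effect(theta_row, t0, t1):
    """Effect of moving from arm t0 to arm t1 under a linear-in-treatment model.

    theta_row holds one coefficient per non-control arm for a single sample.
    """
    n_arms = len(theta_row) + 1
    e0 = one_hot_drop_control(t0, n_arms)
    e1 = one_hot_drop_control(t1, n_arms)
    return sum(th * (b - a) for th, a, b in zip(theta_row, e0, e1))


# Made-up coefficients for two samples and four arms (T0 = control, T1, T2, T3).
theta = [
    [1.5, -0.5, 4.0],
    [2.0, 0.25, 3.0],
]

# Moving from T1 to T3 ...
effect_13 = [effect(row, 1, 3) for row in theta]
# ... equals the third column minus the first column.
third_minus_first = [row[2] - row[0] for row in theta]

# Moving from control to T1 is just the first column.
effect_01 = [effect(row, 0, 1) for row in theta]
```

With arrays returned by the estimator, the same subtraction would be done by slicing columns rather than looping over rows.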

jaydeepchakraborty commented 1 year ago

@kbattocchi, thank you very much for the confirmation.

I have one more question, which I think is related. The treatment has three discrete values (0, 1, 2).

The following are the columns of the DML estimator: print(f"features: {dml_est.cate_feature_names()}, output: {dml_est.cate_output_names()}, treatment: {dml_est.cate_treatment_names()}")

features: ['Dept', 'IsHoliday', 'Temperature', 'Fuel_Price', 'CPI'], output: ['Weekly_Sales'], treatment: ['Type_1', 'Type_2']

I then computed the SHAP values as follows.

-- Shap value for the final stage models (const_marginal_effect)
dml_shap_vals = dml_est.shap_values(X=dml_test_seg, feature_names=['Dept', 'IsHoliday', 'Temperature', 'Fuel_Price', 'CPI'], treatment_names=['Type_1', 'Type_2'], output_names=['Weekly_Sales'], background_samples=100)

shap.summary_plot(dml_shap_vals['Weekly_Sales']['Type_1'])

image

shap.summary_plot(dml_shap_vals['Weekly_Sales']['Type_2'])

image

I have two questions. 1) Looking at the first image, the lower the value of CPI, the higher the output (Weekly_Sales). I cannot figure out how the treatment (Type value 1, i.e. Type_1) fits into the figure.

My assumption is that, when moving 100% from T0 to T1, the lower the value of CPI, the higher the output (Weekly_Sales).

I am not sure whether this assumption is correct.

2) If the assumption above is correct, how can I plot SHAP values for moving 100% from T1 to T2?

Thank you again for answering all the questions. I appreciate your help.

kbattocchi commented 1 year ago

I think your interpretation of the first chart is basically correct - it's saying that for the data you have, when CPI is low, the effect of moving from T0 to T1 will be high, and when CPI is high, the effect of moving from T0 to T1 will be low.

For your second question, I believe that the linearity of SHAP values should make it possible to express the SHAP values for that treatment direction as a difference of the existing SHAP values, but our API doesn't currently support doing that.
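The linearity property mentioned above can be illustrated with a brute-force sketch that uses neither EconML nor the shap package. The functions `theta_1` and `theta_2` are hypothetical stand-ins for the fitted effect models of Type_1 and Type_2, and `shapley_values` is an illustrative helper that computes exact Shapley values against a single baseline point. The sketch assumes both sets of values are computed with the same features and the same baseline; approximate explainers may only satisfy the identity approximately.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, relative to one baseline point."""
    n = len(x)

    def value(subset):
        # Features in `subset` come from x, all others from the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


# Hypothetical effect models: control -> T1 and control -> T2.
def theta_1(z):
    return 2.0 * z[0] - z[1] * z[2]


def theta_2(z):
    return 0.5 * z[0] + z[1] ** 2 + 3.0 * z[2]


# Effect of moving from T1 to T2 under the linear-in-treatment model.
def theta_12(z):
    return theta_2(z) - theta_1(z)


x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.5, 0.5]

phi_1 = shapley_values(theta_1, x, baseline)
phi_2 = shapley_values(theta_2, x, baseline)

# Shapley values of the T1 -> T2 effect, computed directly ...
phi_12_direct = shapley_values(theta_12, x, baseline)
# ... and as the difference of the two existing sets of values.
phi_12_from_difference = [b - a for a, b in zip(phi_1, phi_2)]
```

If the objects returned by shap_values expose their per-feature values as arrays, the same subtraction could in principle be applied to them by hand before plotting, but that is an assumption about the returned objects rather than a supported API.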