py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

DML: Should the treatment effect be constant at a given X, over various values of the treatment? #475

Open BrianMiner opened 3 years ago

BrianMiner commented 3 years ago

Hello.

I am running the example here: https://github.com/microsoft/EconML/blob/master/notebooks/CustomerScenarios/Case%20Study%20-%20Customer%20Segmentation%20at%20An%20Online%20Media%20Company.ipynb

I think I am confused about what is being estimated. After running this:

```python
est = CausalForestDML(
    model_y=GradientBoostingRegressor(), model_t=GradientBoostingRegressor()
)
est.fit(log_Y, log_T, X=X, W=W, inference="blb")
```

I expected that, since this is a non-parametric estimator (compared to LinearDML), the treatment effect would differ at various values of T for a fixed X.

When I run these, which I think produce the CATE as the treatment goes from T0 to T1 at the given X values for each observation, the results are all the same. Both intervals are 1 unit apart (T1 - T0 = 1):

```python
est.effect(X_test, T0=-0.22, T1=0.78)
est.effect(X_test, T0=1.22, T1=2.22)
```

Further, if I produce what I think is the slope of the CATE at a given treatment value, for a fixed X, the results are all the same:

```python
est.marginal_effect(T=-0.22, X=X_test)
est.marginal_effect(T=1, X=X_test)
```

Likewise, if I produce the ATE for a given X, which I think is simply the mean of the above (est.effect(X_test, T0=0, T1=1)), the results are the same for different values of T0 and T1. The average marginal effect is also the same regardless of the value of T at which it is evaluated:

```python
est.ate(T0=0, T1=1, X=X_test)
est.ate(T0=-1, T1=0, X=X_test)

est.marginal_ate(T=-0.22, X=X_test)
est.marginal_ate(T=0, X=X_test)
```

Is my understanding incorrect that these should differ?

kbattocchi commented 3 years ago

Yes, this is a common misconception - all DML estimators estimate a linear effect of T on Y; the difference between LinearDML and CausalForestDML is in whether the effect is linear in the (featurized) features as well. CausalForestDML can estimate a fully non-parametric effect in X, but it will still always be the case that effect(X, T0, T1) = theta(X)*(T1-T0). Thus for all DML estimators you might as well just use const_marginal_effect instead of marginal_effect.
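The identity effect(X, T0, T1) = theta(X) * (T1 - T0) is easy to check numerically. Below is a toy numpy sketch (theta here is a hypothetical stand-in for the CATE function a DML estimator would learn, not EconML's actual internals) showing why the two one-unit intervals from the question give identical effects:

```python
import numpy as np

# Hypothetical heterogeneous coefficient theta(X); any DML estimator's
# effect is linear in the treatment: effect(X, T0, T1) = theta(X) * (T1 - T0).
def theta(X):
    return 2.0 + 0.5 * X[:, 0]

def effect(X, T0, T1):
    return theta(X) * (T1 - T0)

X_test = np.array([[0.0], [1.0], [2.0]])

# Both intervals span exactly one unit of treatment, so the effects coincide
# regardless of where the interval sits -- exactly what the question observed.
e1 = effect(X_test, T0=-0.22, T1=0.78)
e2 = effect(X_test, T0=1.22, T1=2.22)
print(np.allclose(e1, e2))  # True: only T1 - T0 matters
```

Because the effect depends only on the interval width, const_marginal_effect (which returns theta(X) directly) carries all the information that effect and marginal_effect do for DML estimators.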

vsyrgkanis commented 3 years ago

Equation (1) says we fit a model Y = theta(X) * log(T) + ...

However, DML takes the log-transformed variable as input at training time, so it really fits a model Y = theta(X) * log_T + ..., where log_T is the log-transformed variable.

Then when you call effect(X, T0, T1), the T0 and T1 are expected to be on the log-transformed scale, since that is all the DML estimator saw at training time. (Currently we don't pass a "treatment transformer" to the estimators, which would let marginal_effect and effect be transformed internally, but that is in the plans.)

So most probably, what you want to do is: est.effect(X_test, T0=log(T0), T1=log(T1))

Also, if you want a marginal effect, you need to apply the chain rule to the log transform yourself: d E[Y|T, X] / dT = theta(X) * d log(T)/dT = theta(X) / T

So you can do est.const_marginal_effect(X) / T
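The chain-rule correction above can be sketched with plain numpy (theta_x here is a hypothetical array standing in for what est.const_marginal_effect(X_test) would return on the fitted model):

```python
import numpy as np

# Hypothetical constant marginal effects theta(X) for three test rows,
# as returned by est.const_marginal_effect(X_test) on a fitted DML model.
theta_x = np.array([1.5, 2.0, 0.8])

# The model was trained on log(T), so the marginal effect with respect to
# the raw treatment T follows from the chain rule:
#   d E[Y | T, X] / dT = theta(X) * d log(T)/dT = theta(X) / T
T = np.array([0.8, 1.0, 2.5])
marginal_on_raw_scale = theta_x / T
print(marginal_on_raw_scale)  # [1.875, 2.0, 0.32]
```

Note that the result now genuinely varies with T, which recovers the treatment-level dependence the original question was looking for, even though theta(X) itself is constant in T.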

vsyrgkanis commented 3 years ago

Also note that we actually fit log(Y) = theta(X) * log_T

so these are effects and marginal effects on log(Y), not Y. To get effects on Y you need to do further manipulation yourself.

Though the better way to think of this is to view theta(X) (i.e. the constant marginal effect from DML) as the elasticity, in the standard sense in which elasticity is used in economics and pricing. This is why we fit log(Y) on log(T). This assumes an elasticity that is constant in price (but heterogeneous in the features).
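Under the log-log model, converting an elasticity into an effect on Y itself is a one-line exponentiation. The sketch below uses made-up numbers (an assumed elasticity of 1.2 and a hypothetical 10% price increase), not outputs from the notebook:

```python
import numpy as np

# With log(Y) = theta(X) * log(T) + ..., theta(X) is the elasticity:
# a 1% change in T changes Y by roughly theta(X) percent.
theta_x = 1.2        # hypothetical elasticity at some X
T0, T1 = 1.0, 1.1    # a 10% price increase

# Exact multiplicative change in Y implied by the log-log model:
#   log(Y1) - log(Y0) = theta(X) * (log(T1) - log(T0))
y_ratio = np.exp(theta_x * (np.log(T1) - np.log(T0)))  # equals T1**theta_x here
pct_change_in_Y = 100 * (y_ratio - 1)
print(pct_change_in_Y)
```

For small price changes this is approximately theta(X) times the percent change in T (here about 12%), which is the usual elasticity interpretation.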

BrianMiner commented 3 years ago

@vsyrgkanis Thank you. If I pass in as the value of T the log of the value of interest, say log(0.8) (corresponding to the price point 0.8 in the data), to see the slope of the treatment effect (price elasticity) at price = 0.8, will

est.marginal_effect(T=np.log(0.8), X=X_test)

produce the expected answer, or do we still need to do the math ourselves?