py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

Predicting the treatment in DML #426

Open matheusfacure opened 3 years ago

matheusfacure commented 3 years ago

I'm trying to use the treatment models in DML to get a prediction of the treatment. I can access the models easily with est.models_t.

However, it's not trivial to go from those models to a prediction of the treatment. There are some internal transformations on the features, so just passing the original X to the treatment models won't work.

from econml.dml import LinearDML
from lightgbm import LGBMRegressor

# fits DML with 10 features, one continuous treatment
features = ["A", "B", ....]
treatment = "T" # continuous
target = "Y" # continuous

est = LinearDML(model_y=LGBMRegressor(**PARAMS), model_t=LGBMRegressor(**PARAMS), linear_first_stages=False)
est.fit(Y=train_data[target], T=train_data[treatment], X=train_data[features], W=train_data[features])

t_model = est.models_t[0].predict(train_data[features])
ValueError: Number of features of the model must match the input. Model n_features_ is 20 and input n_features is 10 

It would be awesome to have a method that makes predictions using the treatment and outcome models, for example:

def predict_t(self, X):
    # average the predictions of the fitted treatment models, one value per sample
    return np.mean([m_t.predict(X) for m_t in self.models_t], axis=0)

vsyrgkanis commented 3 years ago

Seems like a good addition indeed! We could have something like propensity_(X, W).

For now, you just need to concatenate X and W and pass the result to the models in models_t.

So that should be a solution until we get to this. Might take a while due to a big queue of things to do. If you want to contribute such a method we would be happy to review too!
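
The interim workaround described in this comment (concatenate X and W, then call the models in models_t) can be sketched roughly as follows. This is a minimal sketch, not an EconML API: the helper name predict_treatment is hypothetical, and it assumes that models_t is a flat list of fitted models, that each model was trained on the columns of X followed by the columns of W, and that no other feature transformation is applied in the first stage.

```python
import numpy as np


def predict_treatment(models_t, X, W=None):
    """Average the treatment predictions of the fitted first-stage models.

    Hypothetical helper, not part of EconML. Assumes each model in
    `models_t` was fit on the columns of X followed by the columns of W.
    """
    X = np.asarray(X)
    # Rebuild the first-stage input: X alone, or [X, W] side by side.
    XW = X if W is None else np.hstack([X, np.asarray(W)])
    # One row of predictions per model, then average across models
    # so that the result has one value per sample.
    preds = np.stack([np.asarray(m.predict(XW)) for m in models_t])
    return preds.mean(axis=0)
```

With the setup from the original post this would be called as predict_treatment(est.models_t, train_data[features], train_data[features]). Note that on the training data this average mixes in-sample and out-of-fold predictions, since each model saw part of that data during fitting.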

vsyrgkanis commented 3 years ago

In fact, in your code you most probably want to set W=None.

The API works like this: we control for both X and W, and we use only X for heterogeneity.

So in your current code, internally we would be training the propensities on duplicated features. That is also why the treatment model expects 20 features rather than 10.
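
The duplication can be illustrated with plain NumPy. This is only a sketch with made-up random data, assuming the first-stage input is X and W stacked column-wise as described above: passing the same 10 columns as both X and W yields 20 input columns, but only 10 of them carry independent information.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))  # 10 features, as in the original post
W = X                          # the same columns passed again as controls

# Assumed first-stage input: X and W side by side.
first_stage_input = np.hstack([X, W])

n_columns = first_stage_input.shape[1]                     # total columns seen by the model
n_independent = np.linalg.matrix_rank(first_stage_input)   # linearly independent columns
print(n_columns, n_independent)
```

Setting W=None in the fit call avoids the redundant columns, and the treatment models then expect only the columns of X.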