Predicting the treatment in DML

matheusfacure commented 3 years ago

I'm trying to use the treatment models in DML to get a prediction of the treatment. I can access the models easily with est.models_t.

However, it's not trivial to go from those models to a prediction of the treatment. There are some internal transformations on the features, so just passing the original X to the treatment models won't work.

# Fit DML with 10 features and one continuous treatment
from econml.dml import LinearDML
from lightgbm import LGBMRegressor

features = ["A", "B", ....]
treatment = "T"  # continuous
target = "Y"  # continuous

est = LinearDML(model_y=LGBMRegressor(**PARAMS), model_t=LGBMRegressor(**PARAMS), linear_first_stages=False)
est.fit(Y=train_data[target], T=train_data[treatment], X=train_data[features], W=train_data[features])

# Passing only the 10 original features to a treatment model raises the error below
t_model = est.models_t[0].predict(train_data[features])

ValueError: Number of features of the model must match the input. Model n_features_ is 20 and input n_features is 10

It would be awesome to have a method that makes predictions using the treatment and outcome models.

def predict_t(self, X):
    # average the per-fold treatment predictions for each sample
    return np.mean([m_t.predict(X) for m_t in self.models_t], axis=0)

vsyrgkanis commented 3 years ago

Seems like a good addition indeed! We could have something like propensity_(X, W).

For now, you just need to concatenate X and W and call the models in models_t.

So that should be a solution until we get to this. Might take a while due to a big queue of things to do. If you want to contribute such a method we would be happy to review too!
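As a rough illustration of the workaround described above, here is a minimal sketch that concatenates X and W and averages the predictions of the fitted treatment models. The helper `predict_treatment` and the stand-in class `_MeanOfColumns` are hypothetical names introduced only for this example; neither is part of EconML. The sketch assumes the first-stage models were fit on the columns of X followed by those of W, and that `models_t` is either a flat list of fitted models or a list of such lists, which may differ between library versions.

```python
import numpy as np


def predict_treatment(models_t, X, W=None):
    """Average treatment predictions across fitted models (hypothetical helper)."""
    X = np.asarray(X)
    # Assumption: the models were trained on X's columns followed by W's columns.
    XW = X if W is None else np.hstack([X, np.asarray(W)])
    # Accept a flat list of models or a list of lists; flatten one level.
    flat = []
    for m in models_t:
        flat.extend(m if isinstance(m, (list, tuple)) else [m])
    # Stack to shape (n_models, n_samples), then average over the models.
    return np.mean([np.ravel(m.predict(XW)) for m in flat], axis=0)


class _MeanOfColumns:
    """Stand-in for a fitted regressor, used only to make the sketch runnable."""

    def __init__(self, n_features, offset):
        self.n_features = n_features
        self.offset = offset

    def predict(self, A):
        if A.shape[1] != self.n_features:
            raise ValueError("feature count mismatch")
        return A.mean(axis=1) + self.offset


X_demo = np.arange(6, dtype=float).reshape(3, 2)
W_demo = np.ones((3, 2))
models_demo = [_MeanOfColumns(4, 0.0), _MeanOfColumns(4, 2.0)]
t_hat = predict_treatment(models_demo, X_demo, W_demo)
```

With a fitted estimator, the call would look something like `predict_treatment(est.models_t, train_data[features], train_data[features])`, matching whatever was passed as X and W to `fit`.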

vsyrgkanis commented 3 years ago

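In fact, in your code you most probably want to set W=None.

The API works like this: we control for both X and W, and we use only X for heterogeneity.

So with your current code, we would internally be training the propensity models on duplicated features (the 10 columns of X plus the same 10 columns passed as W, hence the 20 features in the error).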
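The duplication can be seen with plain NumPy. This is only a sketch of the shape arithmetic, assuming X and W are concatenated column-wise before the first-stage models are fit; the variable names are illustrative.

```python
import numpy as np

n_samples, n_features = 5, 10
X = np.random.default_rng(0).normal(size=(n_samples, n_features))
W = X  # the same columns passed again as controls, as in the original call

# Assumption: the first-stage input is X and W stacked side by side.
first_stage_input = np.hstack([X, W])
n_cols = first_stage_input.shape[1]

# The second half of the columns is an exact copy of the first half.
duplicated = np.array_equal(
    first_stage_input[:, :n_features], first_stage_input[:, n_features:]
)
```

Here `n_cols` is twice `n_features`, which matches the mismatch between 20 and 10 features reported in the ValueError above.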

py-why / EconML

Predicting the treatment in DML #426