py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

Multiple treatments when using grf.CausalForest #514

Open WEICHENGIT opened 3 years ago

WEICHENGIT commented 3 years ago

Hi,

This is more of a question about the grf module than an issue report. We tried to use grf.CausalForest to estimate heterogeneous causal effects with multiple treatments. In our case, the treatments are coupons of different amounts sent to customers. Should the parameter T be a one-hot encoded matrix, or just an array of coupon amounts?

Thx!

liukanglucky commented 3 years ago

Same question here. In grf.CausalForest, T should be a matrix of size (n_samples, n_treatments). If we have multiple discrete treatments, how should the parameter T be set? And what about multiple continuous treatments?

heimengqi commented 3 years ago

@WEICHENGIT If I understand your question correctly, you want to estimate the treatment effect of the coupon amount. In that case you can treat it as a continuous treatment and simply pass an array of coupon amounts.

In addition, when you want to learn heterogeneous treatment effects (HTE), we recommend using this CausalForest as the final stage of CATE estimation combined with the Double Machine Learning framework: it first residualizes the outcome and the treatment, then fits the CausalForest on the residuals. You could use CausalForestDML as below:

from econml.dml import CausalForestDML

est = CausalForestDML(cv=2,
                      criterion='mse', n_estimators=400,
                      min_var_fraction_leaf=0.1,
                      min_var_leaf_on_val=True,
                      verbose=0, discrete_treatment=False,
                      n_jobs=-1, random_state=123)

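The residualization step described above can be sketched with plain scikit-learn. This is a toy illustration of the two-stage DML idea for a single continuous treatment with a constant true effect of 2.0, not EconML's actual internals:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

# Toy data: treatment is confounded by X, true treatment effect is 2.0
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
T = X[:, 0] + rng.normal(size=n)                 # treatment depends on X
Y = 2.0 * T + X[:, 0] ** 2 + rng.normal(size=n)  # outcome depends on T and X

# Stage 1: residualize outcome and treatment with out-of-fold predictions
# (cross-fitting), so the nuisance models don't overfit the same samples
y_res = Y - cross_val_predict(RandomForestRegressor(random_state=0), X, Y, cv=2)
t_res = T - cross_val_predict(RandomForestRegressor(random_state=0), X, T, cv=2)

# Stage 2: regress outcome residuals on treatment residuals;
# the slope recovers the treatment effect despite the confounding
theta = (t_res @ y_res) / (t_res @ t_res)
print(theta)  # should land close to the true effect of 2.0
```

The final stage here is a single OLS slope for simplicity; CausalForestDML instead fits a forest on the residuals so the effect can vary with X.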
heimengqi commented 3 years ago

@liukanglucky Similar to the answer above: when you want to learn HTE, we recommend using this CausalForest as the final stage of CATE estimation combined with the Double Machine Learning framework; it will residualize the outcome and treatment first and fit the CausalForest on the residuals.

When you have multiple discrete treatments, just pass the raw T and set discrete_treatment=True when initializing the estimator; internally we will one-hot encode it for you. When you have multiple continuous treatments, pass a matrix of size (n_samples, n_treatments).

Here is the sample code:

from econml.dml import CausalForestDML

# discrete treatment
est = CausalForestDML(discrete_treatment=True)
est.fit(Y, T, X=X, W=W)  # T is the array of discrete treatments

# continuous treatment
est = CausalForestDML(discrete_treatment=False)
est.fit(Y, T, X=X, W=W)  # T is a matrix of size (n_samples, n_treatments)
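The internal one-hot encoding mentioned above can be illustrated with scikit-learn's OneHotEncoder. This is a sketch of the general idea, not EconML's actual preprocessing code:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Raw discrete treatment: e.g. coupon tiers 0, 5, and 10
T = np.array([0, 5, 10, 5, 0]).reshape(-1, 1)

# drop='first' keeps a baseline (control) category out of the encoding,
# so effects are measured relative to the lowest tier
enc = OneHotEncoder(drop='first')
T_onehot = enc.fit_transform(T).toarray()
print(T_onehot.shape)  # (5, 2): one indicator column per non-baseline tier
```

Each non-baseline tier gets its own indicator column, which is why a K-level discrete treatment yields K-1 treatment effects.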
liukanglucky commented 3 years ago


@heimengqi Thank you for your reply! I have tried this method, but it overfits severely (evaluated with Qini curves: the training set looks much better than the test set). My task is to identify users' sensitivity to coupons, using data with randomly assigned coupon amounts and real feedback. I tried adjusting parameters (min_samples_leaf, min_samples_split, max_depth, n_estimators, ...), but it didn't help. I also tried an S-Learner (just using the coupon amount as one feature), and it seems to do better than CausalForestDML.
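For reference, the Qini curve used for evaluation here can be computed with a few lines of numpy for a binary treatment and binary outcome. The function below is an illustrative sketch (not an EconML API): rank units by predicted uplift, then track treated responders minus control responders rescaled to the treated count:

```python
import numpy as np

def qini_curve(uplift_pred, y, t):
    """Cumulative Qini values after targeting the top-k units by predicted uplift."""
    order = np.argsort(-uplift_pred)   # best predicted uplift first
    y, t = y[order], t[order]
    n_t = np.cumsum(t)                 # treated units seen so far
    n_c = np.cumsum(1 - t)             # control units seen so far
    y_t = np.cumsum(y * t)             # treated responders so far
    y_c = np.cumsum(y * (1 - t))       # control responders so far
    # rescale control responders to the treated count (0 until a control appears)
    scale = np.divide(n_t, n_c, out=np.zeros(len(t)), where=n_c > 0)
    return y_t - y_c * scale
```

A large gap between the training-set and test-set curves is exactly the overfitting symptom described above.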

vsyrgkanis commented 3 years ago

Try setting a threshold with min_var_fraction_leaf=0.1 and min_var_leaf_on_val=True

liukanglucky commented 3 years ago

Try setting a threshold with min_var_fraction_leaf=0.1 and min_var_leaf_on_val=True

@vsyrgkanis I've tried that, but it doesn't work.

WEICHENGIT commented 3 years ago

Try setting a threshold with min_var_fraction_leaf=0.1 and min_var_leaf_on_val=True

Hi,

We (@liukanglucky and I) have tried tuning min_var_fraction_leaf and other parameters to avoid overfitting, but sadly the CausalForestDML model still overfits heavily on our data.

We wonder whether this is a common situation with causal models in general, or whether our data was poorly collected. We noticed that in the use case of Causal Forest and Orthogonal Random Forest Examples.ipynb, real data is used. Has anyone checked whether the model overfits on that dataset?

Thx.

jbel1026 commented 3 years ago

@WEICHENGIT I am also seeing significant overfitting when using multiple binary treatments.

superpig99 commented 2 years ago

In my case, once the max_depth of CausalForest exceeds 13, there is obvious overfitting. Maybe limiting the depth will work. 🤔

vferraz commented 1 year ago

Is there still no solution for modeling multiple discrete treatments?