Open WEICHENGIT opened 3 years ago
The same question. In grf.CausalForest, T should be a matrix of size (n_samples, n_treatments). How should the parameter T be set if we have multiple discrete treatments, and how if we have multiple continuous treatments?
@WEICHENGIT If I understand your question correctly, you want to estimate the treatment effect of the coupon amount. In that case you can treat the amount as a continuous treatment and simply pass in an array containing this variable.
In addition, we always recommend using CausalForest as the final stage for CATE estimation, combined with the Double Machine Learning framework, when you want to learn heterogeneous treatment effects (HTE). It first residualizes the outcome and the treatment, then fits CausalForest on the residuals. You can use CausalForestDML as below:
from econml.dml import CausalForestDML

est = CausalForestDML(cv=2,
                      criterion='mse', n_estimators=400,
                      min_var_fraction_leaf=0.1,
                      min_var_leaf_on_val=True,
                      verbose=0, discrete_treatment=False,
                      n_jobs=-1, random_state=123)
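To make the residualize-then-fit idea concrete, here is a minimal NumPy sketch. It is not EconML's implementation: it assumes linear nuisance models, a single continuous treatment, and a constant effect, so the final stage is a one-coefficient regression of outcome residuals on treatment residuals rather than a forest. The helper `cross_fit_residuals` and the synthetic data are hypothetical, made up for illustration.

```python
import numpy as np


def cross_fit_residuals(target, features, n_folds=2, seed=0):
    """Out-of-fold residuals of `target` after a linear fit on `features`."""
    rng = np.random.default_rng(seed)
    n = len(target)
    folds = rng.permutation(n) % n_folds              # random fold labels
    design = np.column_stack([np.ones(n), features])  # add an intercept
    residuals = np.empty(n, dtype=float)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Fit on the other folds, predict on the held-out fold.
        coef, *_ = np.linalg.lstsq(design[train], target[train], rcond=None)
        residuals[test] = target[test] - design[test] @ coef
    return residuals


# Synthetic data (assumption): X[:, 0] drives both the treatment and the outcome.
rng = np.random.default_rng(123)
n = 4000
X = rng.normal(size=(n, 3))
T = 0.8 * X[:, 0] + rng.normal(size=n)
true_effect = 2.0
Y = true_effect * T + 1.5 * X[:, 0] + rng.normal(size=n)

# Stage 1: residualize the outcome and the treatment on the controls.
Y_res = cross_fit_residuals(Y, X)
T_res = cross_fit_residuals(T, X)

# Stage 2: regress outcome residuals on treatment residuals.
effect_hat = float(T_res @ Y_res / (T_res @ T_res))

# For comparison: the slope of Y on T without residualizing is confounded.
naive_hat = float(np.cov(T, Y)[0, 1] / np.var(T, ddof=1))

print(f"residualized estimate: {effect_hat:.2f}, naive estimate: {naive_hat:.2f}")
```

In this toy setup the residualized estimate should land near the true effect while the naive slope is biased upward by the confounder. CausalForestDML replaces the last regression with a forest, so the effect can vary with X.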
@liukanglucky Similarly to the answer above, we always recommend using CausalForest as the final stage for CATE estimation, combined with the Double Machine Learning framework, when you want to learn HTE. It first residualizes the outcome and the treatment, then fits CausalForest on the residuals.
When you have multiple discrete treatments, just pass in the raw T and set discrete_treatment=True when you initialize the estimator; internally we will do the one-hot encoding for you. When you have multiple continuous treatments, pass in a matrix of size (n_samples, n_treatments).
Here is the sample code:
# discrete treatment
est = CausalForestDML(discrete_treatment=True)
est.fit(Y, T, X, W) # T is the array of discrete treatment
# continuous treatment
est = CausalForestDML(discrete_treatment=False)
est.fit(Y, T, X, W) # T is matrix with size (n_samples, n_treatments)
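For reference, here is a small NumPy sketch of what T looks like in the two cases. The coupon values and the second treatment (`discount_rate`) are hypothetical. The manual one-hot step only illustrates the kind of encoding the estimator is said to apply internally; dropping the first category as the control baseline is an assumption here, and you would not do this step yourself when discrete_treatment=True.

```python
import numpy as np

# Discrete case (hypothetical data): four coupon arms, 0 means "no coupon".
# With discrete_treatment=True you would pass this raw 1-D array as T.
T_discrete = np.array([0, 5, 10, 20, 5, 0, 20, 10])

# Illustration of a one-hot encoding with the first (smallest) category
# dropped as the baseline. Assumption: this mirrors what happens internally.
categories = np.unique(T_discrete)  # sorted: [0, 5, 10, 20]
T_onehot = (T_discrete[:, None] == categories[None, 1:]).astype(float)
print(T_onehot.shape)  # (n_samples, n_categories - 1)

# Continuous case (hypothetical data): two continuous treatments stacked
# column-wise into a matrix of shape (n_samples, n_treatments).
coupon_amount = np.array([0.0, 5.0, 10.0, 20.0, 5.0, 0.0, 20.0, 10.0])
discount_rate = np.array([0.0, 0.1, 0.2, 0.3, 0.1, 0.0, 0.3, 0.2])
T_continuous = np.column_stack([coupon_amount, discount_rate])
print(T_continuous.shape)  # (n_samples, n_treatments)
```

Each one-hot column then gets its own effect relative to the baseline arm, whereas a single continuous column yields one effect per unit of coupon amount.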
@heimengqi Thank you for your reply! I have tried this method, but it overfits severely (I used the Qini curve for evaluation, and the training set scores much better than the test set). My task is to identify users' sensitivity to coupons, using data with randomly assigned coupon amounts and real feedback. I tried adjusting parameters (min_samples_leaf, min_samples_split, max_depth, n_estimators, ...), but it didn't help. I also tried an S-Learner (just using the coupon amount as one feature), and it seems to do better than CausalForestDML.
Try setting a threshold with min_var_fraction_leaf=0.1 and min_var_leaf_on_val=True.
@vsyrgkanis I've tried that, but it doesn't work.
Hi,
We (@liukanglucky and I) have tried tuning min_var_fraction_leaf and other parameters to avoid overfitting, but sadly the CausalForestDML model still overfits heavily on our data.
We wonder whether this is a common problem with causal models in general, or whether our data was poorly collected. We noticed that the notebook Causal Forest and Orthogonal Random Forest Examples.ipynb uses real data. Has anyone checked whether the model overfits on that dataset?
Thx.
@WEICHENGIT I am also getting significant overfitting when using multiple binary treatments.
In my case, once the max_depth of CausalForest exceeds 13, there is obvious overfitting. Limiting the depth may help. 🤔
Is there still no solution for modeling multiple discrete treatments?
Hi,
This is more of a question about the grf module than an issue. We tried to use grf.CausalForest to estimate heterogeneous causal effects with multiple treatments. In our case, the treatments are coupons of different amounts sent to customers. Should the parameter T be a one-hot encoded matrix, or just an array with the coupon amount?
Thx!