Open dori21 opened 1 year ago
You should set discrete_treatment=True in this case, though it may not matter much in practice. When discrete treatment is specified, we one-hot-encode the treatment and then drop the first column, which doesn't matter in this case because your treatment values are 0 and 1 (but would matter if they were 'a' and 'b' or something). We also call the predict_proba method on the T model that you specify, rather than predict, when computing our first stage residuals. This should generally result in slightly better final stage models, because otherwise the treatment residuals will be limited to the discrete set {-1,0,1} (depending on whether T-T_pred is 0-1; 0-0 or 1-1; or 1-0), rather than using the finer-grained probabilities the classifier learned (which can result in residuals in the entire range [-1,1]).
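To make the two points above concrete, here is a minimal pure-Python sketch. It is not EconML's internal code: the helper one_hot_drop_first, the logistic propensity function, and the toy data are all hypothetical, and sorting the categories before dropping the first one is an assumption about the encoding order. It only illustrates why hard labels give residuals in {-1,0,1} while probabilities give residuals anywhere in [-1,1].

```python
import math


def one_hot_drop_first(values):
    """One-hot encode, then drop the column of the first category.

    Assumption: categories are ordered by sorting, so the smallest
    value is the dropped baseline.
    """
    categories = sorted(set(values))
    kept = categories[1:]  # drop the first (baseline) column
    return [[1 if v == c else 0 for c in kept] for v in values]


def propensity(x):
    """Hypothetical fitted classifier: P(T=1 | x) as a logistic curve."""
    return 1.0 / (1.0 + math.exp(-x))


# Toy data: one feature per unit and a binary treatment.
xs = [-2.0, -0.5, 0.3, 1.5]
ts = [0, 1, 0, 1]

probs = [propensity(x) for x in xs]             # analogue of predict_proba
labels = [1 if p >= 0.5 else 0 for p in probs]  # analogue of predict

# Residuals from hard labels can only be -1, 0 or 1.
hard_residuals = [t - label for t, label in zip(ts, labels)]

# Residuals from probabilities can fall anywhere in [-1, 1].
soft_residuals = [t - p for t, p in zip(ts, probs)]

print(hard_residuals)
print([round(r, 3) for r in soft_residuals])

# With 0/1 treatments the encoding reproduces the original column;
# with 'a'/'b' it turns the labels into a 0/1 indicator for 'b'.
print(one_hot_drop_first(ts))
print(one_hot_drop_first(['a', 'b', 'a']))
```

With a 0/1 treatment the encoded column is identical to the input, which is why the encoding step makes no difference for a binary treatment coded this way; the residual computation is where the two settings can differ.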
In the case of a binary treatment (1 for the treatment group, 0 for the control group) and a continuous outcome:
from econml.dml import CausalForestDML
from sklearn.linear_model import Lasso, LogisticRegressionCV
CASE 1: discrete_treatment=True
est = CausalForestDML(criterion='het')
# set parameters for causal forest
est = CausalForestDML(criterion='het', random_state=1, discrete_treatment=True, honest=True, inference=True, cv=2, model_t=LogisticRegressionCV(), model_y=Lasso())
CASE 2: discrete_treatment=False
est = CausalForestDML(criterion='het', random_state=1, discrete_treatment=False, honest=True, inference=True, cv=2, model_t=Lasso(), model_y=Lasso())
Do CASE 1 and CASE 2 basically do the same thing? In this situation, which one is the better fit, CASE 1 or CASE 2?
I wonder whether discrete_treatment=True applies only to multi-valued treatments and not to binary treatments.