Open gcasamat opened 3 years ago
Interesting. We had done similar comparisons and weren't seeing such big differences.
Could you provide code reproducing these results?
Oh is this for the average effect??
Make sure you use ate__inference(), which is a doubly robust ATE on the training data. That's the comparable result.
With ate_inference(Xtest), because it is on an arbitrary test set, we can for now only provide conservative intervals, as we also note via the asterisk in the summary frame. So these would be much wider.
You could also just call "summary()" on the cfdml object and you'll see these results (assuming your treatment is categorical).
Thank you for your reply. I omitted to mention that my treatment is continuous.
Here is the code:
from econml.dml import CausalForestDML
from econml.grf import RegressionForest

est_forest = CausalForestDML(model_y=RegressionForest(random_state=123),
                             model_t=RegressionForest(random_state=123),
                             random_state=123)
est_forest.fit(Y, T, X=W, W=W, inference='auto')
est_forest.marginal_ate_inference(T, X=W)
Unfortunately, you'll then need to wait a few weeks for the doubly robust correction to be implemented for continuous treatments. That correction requires yet one more ML model to be provided by the user, which is why we were hesitant to add it in the first round; but we've found other users and use cases where this is important, so we'll add it soon.
Currently you are getting the conservative intervals by using ate_inference(Xtest), which is why you see the difference on the average effects.
For now, if you want a "non-conservative" interval (one erring in the other direction), you could take the "std_point" from the second table of the ate_inference results, divide it by sqrt(number of samples), and then create a 95% confidence interval via:
lb = scipy.stats.norm.ppf(.025, loc=mean_point, scale=std_point / np.sqrt(n))
ub = scipy.stats.norm.ppf(1 - .025, loc=mean_point, scale=std_point / np.sqrt(n))
But this could potentially be too optimistic. Still, I suspect it will be much closer to what the grf package provides.
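A runnable version of that construction, with made-up numbers for mean_point, std_point, and the sample count (purely illustrative values, not outputs from any actual fit):

```python
import numpy as np
from scipy import stats

# Hypothetical values: in practice these come from the second table of
# the ate_inference results and the size of your dataset.
mean_point = 0.12   # point estimate of the ATE
std_point = 1.5     # "std_point" from the inference summary table
n = 1000            # number of samples

# Standard error of the mean, then a two-sided 95% normal interval.
scale = std_point / np.sqrt(n)
lb = stats.norm.ppf(0.025, loc=mean_point, scale=scale)
ub = stats.norm.ppf(1 - 0.025, loc=mean_point, scale=scale)
print(f"95% CI: [{lb:.3f}, {ub:.3f}]")
```

The interval is symmetric around mean_point with half-width roughly 1.96 times the standard error.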
Thanks a lot for the advice! I will try this and also the new econml version with the doubly robust correction.
I trained the same model, both with the econml CausalForestDML and the R grf causal_forest algorithms. The average marginal effects obtained are comparable, around 0.12. The confidence intervals are however quite different: [0.064,0.166] with causal_forest, and [-0.360,0.621] with CausalForestDML. Do you have an idea of why there is such a difference? Thanks