Open silulyu opened 5 days ago
One thing you can try is to use the tune method on the forest before fitting, which should help you set appropriate hyperparameters. Also, you might want to do some model selection for your first-stage models to ensure that you're getting the best possible first-stage fits.
However, in general you should expect the confidence intervals for forest-based methods to be wider than those for linear regression: the linear model is much more restrictive and therefore easier to estimate. But keep in mind that confidence intervals are only valid when the model's assumptions hold, so if the true data-generating process is not linear, those tighter bounds are not necessarily correct!
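To make the point about misspecification concrete, here is a toy illustration using only the standard library. It is not DML and not your data: it fits a straight line to data generated from a quadratic (an assumed data-generating process) and builds the usual 95% interval for the fitted mean at one point. The interval is narrow, yet it does not contain the true value.

```python
import math
import random

# Assumed toy setup: the truth is y = x^2 plus small noise, x uniform on [0, 1].
random.seed(0)
n = 2000
xs = [random.random() for _ in range(n)]
ys = [x * x + random.gauss(0.0, 0.1) for x in xs]

# Ordinary least squares for the (misspecified) model y = a + b * x.
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Standard 95% confidence interval for the fitted mean at x0 = 1.
x0 = 1.0
pred = intercept + slope * x0
resid_var = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
se = math.sqrt(resid_var * (1.0 / n + (x0 - x_bar) ** 2 / sxx))
lo, hi = pred - 1.96 * se, pred + 1.96 * se

truth = x0 ** 2  # true conditional mean at x0
print(f"linear-model CI at x0=1: [{lo:.3f}, {hi:.3f}], truth = {truth:.3f}")
```

The interval is tight because the linear model has few parameters, but it is centered on the wrong value. A more flexible estimator would give a wider interval here, and one that is more likely to cover the truth.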
Below is my code to estimate treatment effects. The Causal Forest DML model gives a much wider confidence interval for the ATT (i.e., [-200k, 900k]) than the linear DML model (i.e., [200k, 400k]). Is there any way to make the Causal Forest DML confidence interval narrower, and ideally statistically significant?