RamirezAmayaS opened this issue 1 year ago
The way I read that algorithm, they use the causal forest (CF) purely as an exploratory tool to check whether there is treatment effect heterogeneity (HTE) along “some” observable covariates. If there are many covariates, what they do is just a heuristic for narrowing down that set of “some”. What would be problematic is applying that kind of covariate pruning to the models for W.hat and Y.hat when you are not in an RCT, because those are the models that adjust for confounding. Recall that a CF can be thought of as a two-step estimator: first, adjust for confounding by estimating W.hat and Y.hat and using them to orthogonalize (residualize) the treatment and the outcome; then, with these residuals, try to detect HTE along observable covariates of your choice.
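A minimal sketch of the two-step idea, in Python with NumPy rather than grf: the nuisance estimates play the roles of W.hat and Y.hat and are computed out-of-fold (cross-fitting), then the outcome residuals are regressed on the treatment residuals. Assumptions: ordinary least squares stands in for the regression forests grf uses, the treatment effect is constant (a causal forest would instead localize this second step around each x), and the helper names and simulated data are hypothetical.

```python
import numpy as np


def ols_fit_predict(X_train, y_train, X_test):
    """Least-squares fit with intercept; a hypothetical stand-in for a regression forest."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def cross_fit(X, target, n_folds=5, seed=0):
    """Out-of-fold estimates of E[target | X]: each row is predicted by a model that never saw it."""
    rng = np.random.default_rng(seed)
    folds = rng.integers(0, n_folds, size=len(X))
    preds = np.empty(len(X))
    for k in range(n_folds):
        test = folds == k
        preds[test] = ols_fit_predict(X[~test], target[~test], X[test])
    return preds


def simulate(n=4000, tau=2.0, seed=1):
    """Hypothetical observational data with a constant treatment effect tau."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    # Confounding: X[:, 0] raises both the treatment probability and the outcome.
    p = 1.0 / (1.0 + np.exp(-1.5 * X[:, 0]))
    W = rng.binomial(1, p).astype(float)
    Y = X[:, 0] + tau * W + rng.normal(size=n)
    return X, W, Y


X, W, Y = simulate()
# Step 1: nuisance estimates, analogous to W.hat and Y.hat.
W_hat = cross_fit(X, W)
Y_hat = cross_fit(X, Y)
# Step 2: regress the outcome residuals on the treatment residuals.
W_res, Y_res = W - W_hat, Y - Y_hat
tau_hat = np.sum(W_res * Y_res) / np.sum(W_res ** 2)
naive = Y[W == 1].mean() - Y[W == 0].mean()
print(f"naive difference in means: {naive:.2f}")
print(f"residual-on-residual estimate: {tau_hat:.2f}")
```

The naive difference in means should be pushed upward by the confounder, while the residual-on-residual estimate should land near the true effect of 2. Dropping X[:, 0] from the nuisance models would remove exactly the adjustment that step 1 provides.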
I am revisiting the Athey and Wager (2019) analysis in experiments/acic18.
In Algorithm 1, why are both the pilot forest and the final causal forest trained on the full data? For the final predictions to generalize, shouldn't the final causal forest be trained on data that was not used for feature selection by the pilot forest?
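A sketch of the split being asked about, again in Python with NumPy and not the grf implementation: the pilot step sees only half A and the final fit only half B, so the covariates used in the final model are chosen on independent data. Assumptions: the data come from a simulated RCT with known propensity 0.5, and an IPW transformed outcome with a correlation ranking and a linear fit replace the pilot forest's variable importance and the final causal forest; all function names are hypothetical.

```python
import numpy as np


def simulate_rct(n=6000, seed=2):
    """Hypothetical RCT where the effect varies only along X[:, 1]."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 5))
    W = rng.binomial(1, 0.5, size=n).astype(float)  # randomized, e(x) = 0.5
    tau = 1.0 + X[:, 1]                             # true tau(x)
    Y = X[:, 0] + tau * W + rng.normal(size=n)
    return X, W, Y


def pseudo_outcome(W, Y, e=0.5):
    """IPW transformed outcome: E[Y* | X] = tau(X) when the propensity e is known."""
    return Y * (W - e) / (e * (1 - e))


def pilot_select(X, y_star, k=1):
    """Hypothetical pilot: keep the k covariates most correlated with the pseudo-outcome."""
    scores = np.abs([np.corrcoef(X[:, j], y_star)[0, 1] for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]


def final_fit(X, y_star, cols):
    """Linear model of tau(x) on the selected covariates only."""
    A = np.column_stack([np.ones(len(X)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y_star, rcond=None)
    return coef


X, W, Y = simulate_rct()
y_star = pseudo_outcome(W, Y)
# Split once: the pilot sees only half A, the final model only half B.
rng = np.random.default_rng(3)
in_a = rng.random(len(X)) < 0.5
selected = pilot_select(X[in_a], y_star[in_a])
coef = final_fit(X[~in_a], y_star[~in_a], selected)
print("selected covariates:", selected)
print("final coefficients (intercept, slope):", np.round(coef, 2))
```

The price of the split is that each stage uses only half of the sample, which is the trade-off the question is weighing against reusing the full data for both stages.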