Hello, I am a student currently working on a project that uses the `grf` package to estimate heterogeneous treatment effects. I would like to ask whether using `causal_forest()` in a setting that combines difference-in-differences (DiD) AND fixed effects is plausible.
The dataset is an unbalanced firm-level panel of about 7,500 distinct firms observed from 2009 to 2019, with at most four quarterly observations per firm per year.
The outcome Y_it is continuous, the treatment W_it is binary, and the covariates X_it include time-varying predictors. Treatment timing is the same for every treatment group (t = 2018).
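With a single treatment date, the basic DiD contrast is a 2x2 comparison. It compares the pre/post change for treated firms with the pre/post change for untreated firms.

The minimal sketch below makes this concrete, under a few simplifications and assumptions:
- It is plain Python rather than R.
- It uses years rather than quarters.
- It assumes a hypothetical row layout `(firm_id, year, treated_group, y)`, where `treated_group` is 1 for firms that are eventually treated.
- `did_2x2` and the toy numbers are made up for illustration.

It ignores covariates and fixed effects, so it is only a rough benchmark for the average effect, not a heterogeneity estimate.

```python
from statistics import mean

def did_2x2(rows, treat_year=2018):
    """Canonical 2x2 difference-in-differences from group-by-period means.

    rows: iterable of (firm_id, year, treated_group, y) tuples, where
    treated_group is 1 for firms that are eventually treated (hypothetical layout).
    Every group/period cell must contain at least one observation.
    """
    cells = {(g, p): [] for g in (0, 1) for p in (0, 1)}
    for _firm, year, treated_group, y in rows:
        post = int(year >= treat_year)  # 1 from the common treatment year onward
        cells[(treated_group, post)].append(y)
    m = {cell: mean(ys) for cell, ys in cells.items()}
    # (treated post - treated pre) - (control post - control pre)
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])

# Made-up toy panel: two treated firms (a, c) and two control firms (b, d).
rows = [
    ("a", 2017, 1, 10.0), ("a", 2018, 1, 15.0),
    ("c", 2016, 1, 11.0), ("c", 2019, 1, 17.0),
    ("b", 2017, 0, 8.0), ("b", 2018, 0, 9.0),
    ("d", 2016, 0, 7.0), ("d", 2019, 0, 9.0),
]
print(did_2x2(rows))  # 4.0
```

Here the treated firms gain 5.5 on average and the controls gain 1.5, so the DiD estimate is 4.0.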
I came across the applied paper *Causal forests with fixed effects for treatment effect heterogeneity in difference-in-differences* (Kattenberg et al., 2023) and tried to use the estimator it proposes. However, building the package from source (`build_package.R`) failed with what appears to be a "missing separator" error in the `Makevars` file.
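GNU make reports `*** missing separator` when it reaches a line it cannot parse as a rule, variable assignment, or directive. A classic trigger is a recipe line indented with spaces instead of a tab. Windows (CRLF) line endings are also worth ruling out.

The rough heuristic below, in plain Python, flags both patterns. `find_make_suspects` is a hypothetical helper, not part of grf or of the paper's code. Whether either pattern is actually the cause in this `Makevars` is an assumption.

```python
def find_make_suspects(text):
    """Heuristically flag lines that commonly cause GNU make's
    '*** missing separator' error. Returns (line_number, reason) pairs.
    This is a rough check, not a make parser, so expect false positives."""
    suspects = []
    prev_continues = False
    for num, raw in enumerate(text.split("\n"), start=1):
        line = raw
        if line.endswith("\r"):
            suspects.append((num, "CRLF line ending"))
            line = line[:-1]
        # Space-indented lines are fine as backslash continuations, but a
        # recipe line indented with spaces instead of a tab is a classic cause.
        if line.startswith(" ") and line.strip() and not prev_continues:
            suspects.append((num, "indented with spaces, not a tab"))
        prev_continues = line.endswith("\\")
    return suspects

# Made-up sample: line 2 is a space-indented recipe with a CRLF ending;
# line 4 is a legitimate continuation of the FLAGS assignment.
sample = "all: lib.so\n    echo building\r\nFLAGS = -O2 \\\n    -Wall\n"
print(find_make_suspects(sample))
```

To scan a real file, read it with `open(path, newline="")` so that Python does not silently translate CRLF to LF.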
I then found issue #310, in which your answer suggests a residualization approach for panel data (analogous to classical fixed effects). However, the results were still nonsensical, with large negative estimates for `differential.forest.prediction` in `test_calibration()`. The results were similar when I included firm-ID dummies and/or time dummies (both year and quarter). They also remained the same when I aggregated the data to yearly rather than quarterly frequency.
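The fixed-effects analogue of residualizing is the within transformation. It sweeps firm and period effects out of the outcome (and, in the same way, out of the treatment) before fitting. Because this panel is unbalanced, a single round of double demeaning is not exact. The minimal sketch below therefore alternates firm and period demeaning until the changes vanish (alternating projections).

It is plain Python for illustration. `demean_two_way` is a hypothetical helper, and I am not claiming it reproduces exactly what was suggested in issue #310 or what grf does internally.

```python
from collections import defaultdict

def demean_two_way(values, firms, periods, tol=1e-10, max_iter=1000):
    """Sweep firm and period effects out of `values` (hypothetical helper).

    Alternates between subtracting firm means and period means until the
    shifts are below `tol`. On a balanced panel a single pass already gives the
    usual two-way within transformation; on an unbalanced panel the iteration
    converges to the residuals from regressing `values` on firm and period dummies.
    """
    resid = [float(v) for v in values]
    for _ in range(max_iter):
        max_shift = 0.0
        for groups in (firms, periods):
            # Group sums and counts computed from the current residuals.
            sums, counts = defaultdict(float), defaultdict(int)
            for r, g in zip(resid, groups):
                sums[g] += r
                counts[g] += 1
            for i, g in enumerate(groups):
                shift = sums[g] / counts[g]
                resid[i] -= shift
                max_shift = max(max_shift, abs(shift))
        if max_shift < tol:
            break
    return resid

# Made-up unbalanced toy panel: firm b has three periods, firm c only one.
firms = ["a", "a", "b", "b", "b", "c"]
periods = [1, 2, 1, 2, 3, 3]
# Purely additive outcome (firm effect + period effect), so residuals should be ~0.
y = [1.0, 11.0, 5.0, 15.0, 8.0, 1.0]
print(demean_two_way(y, firms, periods))
```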
I suspect that running `causal_forest()` with a two-way fixed effects (TWFE) approach does not rest on a valid identification strategy, but I am stuck on finding alternative approaches. What should I do to obtain the desired ATE? And to assess treatment heterogeneity, should I drop the firm fixed effects in the first place?
In addition, in some cases the `test_calibration()` results vary considerably with the `num.trees` argument. What does this instability imply about the validity of my current approach?
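My reading of the grf documentation is that, for a causal forest, `test_calibration()` fits the following regression on out-of-bag quantities:

$$
Y_i - \hat{m}^{(-i)}(X_i) = \alpha\,\bar{\tau}\,\bigl(W_i - \hat{e}^{(-i)}(X_i)\bigr) + \beta\,\bigl(\hat{\tau}^{(-i)}(X_i) - \bar{\tau}\bigr)\bigl(W_i - \hat{e}^{(-i)}(X_i)\bigr) + \varepsilon_i
$$

The terms are:
- $\hat{m}$ and $\hat{e}$ are the outcome and propensity estimates (`Y.hat`, `W.hat`).
- $\hat{\tau}^{(-i)}(X_i)$ are the out-of-bag CATE predictions.
- $\bar{\tau}$ is their average.

α is reported as `mean.forest.prediction` and β as `differential.forest.prediction`. A β close to 1 would indicate well-calibrated heterogeneity. A large negative β would mean that the estimated CATE deviations move opposite to what the held-out data suggest.

Since both regressors are built from out-of-bag predictions averaged over a finite number of trees, part of the sensitivity to `num.trees` may simply be Monte Carlo noise. I am not sure that explains all of it.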
Any advice would be a great help. Thank you.