Closed maxikagan closed 5 months ago
Thank you for reporting the issue.
I was able to reproduce the issue. As we can see in the code chunk below, the CI for DID is not appropriate (LB > UB):
> set.seed(1234)
> fit_panel <- did(
+ formula = lnavgsalary_cpi ~ oncycle | teachers_avg_yrs_exper +
+ ami_pc + asian_pc + black_pc + hisp_pc,
+ data = anzia2012,
+ id_unit = "district",
+ id_time = "year",
+ design = "did",
+ is_panel = TRUE,
+ option = list(n_boot = 20, parallel = TRUE, lead = 0, se_boot = FALSE)
+ )
> summary(fit_panel)
── ATT Estimates ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
estimator lead estimate std.error statistic p_value ci.low ci.high
1 Double-DID 0 -0.0065 0.0028 -2.3 0.019 -0.01192 -0.0011
2 DID 0 -0.0062 0.0028 -2.2 0.027 -0.01169 -0.0124
3 sDID 0 -0.0044 0.0041 -1.1 0.282 -0.00071 0.0036
There are two major bugs associated with the incorrect reporting of confidence intervals.
The CI estimates for DID and sDID are organized in the column-wise format: https://github.com/naoki-egami/DIDdesign/blob/b09fa0fc5df32a4f43977fca9eb3f06705bde23d/R/did-std.R#L230-L231
However, the output is parsed row-wise: https://github.com/naoki-egami/DIDdesign/blob/b09fa0fc5df32a4f43977fca9eb3f06705bde23d/R/did-std.R#L240-L241
This results in the inccorect CI reporting. Double-DID estimates are not affected by this bug.
The bootstrap option se_boot = TRUE
was not functioning. This is due to the change introduced in dac836a where se_option_gmm
(introduced in 25c5f20) was replaced with se_option
, while the argument passed to the estimation function remained unchanged: https://github.com/naoki-egami/DIDdesign/blob/b09fa0fc5df32a4f43977fca9eb3f06705bde23d/R/did-std.R#L60.
Since the output from the bootstrap is organized row-wise, it created an inconsistency resulted in the issue 1.
Before the fix, flipping se_boot
from FALSE
to TRUE
does not affect the reported CIs. (Also CIs for DID and sDID were incorrect).
> set.seed(1234)
> fit_panel <- did(
+ formula = lnavgsalary_cpi ~ oncycle | teachers_avg_yrs_exper +
+ ami_pc + asian_pc + black_pc + hisp_pc,
+ data = anzia2012,
+ id_unit = "district",
+ id_time = "year",
+ design = "did",
+ is_panel = TRUE,
+ option = list(n_boot = 20, parallel = TRUE, lead = 0, se_boot = TRUE)
+ )
> summary(fit_panel)
── ATT Estimates ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
estimator lead estimate std.error statistic p_value ci.low ci.high
1 Double-DID 0 -0.0065 0.0028 -2.3 0.019 -0.01192 -0.0011
2 DID 0 -0.0062 0.0028 -2.2 0.027 -0.01169 -0.0124
3 sDID 0 -0.0044 0.0041 -1.1 0.282 -0.00071 0.0036
After the fix, CIs are different based on se_boot
option and the CIs are correctly reported.
> fit_panel <- did(
+ formula = lnavgsalary_cpi ~ oncycle | teachers_avg_yrs_exper +
+ ami_pc + asian_pc + black_pc + hisp_pc,
+ data = anzia2012,
+ id_unit = "district",
+ id_time = "year",
+ design = "did",
+ is_panel = TRUE,
+ option = list(n_boot = 20, parallel = TRUE, lead = 0, se_boot = FALSE)
+ )
> summary(fit_panel)
── ATT Estimates ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
estimator lead estimate std.error statistic p_value ci.low ci.high
1 Double-DID 0 -0.0065 0.0028 -2.3 0.019 -0.012 -0.00108
2 DID 0 -0.0062 0.0028 -2.2 0.027 -0.012 -0.00071
3 sDID 0 -0.0044 0.0041 -1.1 0.282 -0.012 0.00361
> set.seed(1234)
> fit_panel <- did(
+ formula = lnavgsalary_cpi ~ oncycle | teachers_avg_yrs_exper +
+ ami_pc + asian_pc + black_pc + hisp_pc,
+ data = anzia2012,
+ id_unit = "district",
+ id_time = "year",
+ design = "did",
+ is_panel = TRUE,
+ option = list(n_boot = 20, parallel = TRUE, lead = 0, se_boot = TRUE)
+ )
> summary(fit_panel)
── ATT Estimates ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
estimator lead estimate std.error statistic p_value ci.low ci.high
1 Double-DID 0 -0.0065 0.0028 -2.3 0.019 -0.010 -0.00108
2 DID 0 -0.0062 0.0028 -2.2 0.027 -0.010 -0.00055
3 sDID 0 -0.0044 0.0041 -1.1 0.282 -0.011 0.00323
Tested the updated code by building the readme file. The output is in 52c0c40.
When calling summary() on a DIDdesign (std), the confidence intervals appear flipped between DID and sDID.
The Double-DID confidence intervals appear correct, but for each lead level of DID and sDID these two appear flipped. If I manually calculate the CIs based upon the estimate and std.error generated by the summary() call then it appears that these are mismatched.