Problem with newest version with event studies

michaeltopper1 commented 1 year ago

Hey Kyle,

Great work with the package, it's one of my go-tos for robust TWFE estimation!

I did, however, find an issue with the development version of the package when estimating an event study.

Below is the example from your README that fails to run:

library(fixest)
library(did2s)

# Event Study
es <- did2s(df_het,
  yname = "dep_var", first_stage = ~ 0 | state + year,
  second_stage = ~ i(rel_year, ref = c(-1, Inf)), treatment = "treat",
  cluster_var = "state"
)
#> Running Two-stage Difference-in-Differences
#>  - first stage formula `~ 0 | state + year`
#>  - second stage formula `~ i(rel_year, ref = c(-1, Inf))`
#>  - The indicator variable that denotes when treatment is on is `treat`
#>  - Standard errors will be clustered by `state`

I have been switching back and forth between the CRAN version and the development version. In CRAN version 1.1.0, the event studies work great. In the development version, the same event studies crash.
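
For anyone trying to reproduce this, here is a rough sketch of how the two versions can be swapped and told apart. It assumes the `remotes` package is installed. Note that the development build also reports itself as v1.1.0, so `packageVersion()` alone does not distinguish the two. The `RemoteSha` field is only written for GitHub installs, so treat that check as an assumption about how the package was installed.

``` r
# CRAN release
install.packages("did2s")

# Development version from GitHub (restart R after installing)
remotes::install_github("kylebutts/did2s")

# Both builds may report the same version number...
packageVersion("did2s")

# ...so check for a GitHub commit hash instead.
# ASSUMPTION: NULL here means the package was not installed from GitHub.
packageDescription("did2s")$RemoteSha
```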

Why not just switch back to the CRAN version? I would, except the newest development version is the only version that will work with my large dataset (~4 million observations) when including many fixed effects.

Not sure if this is a quick fix or not, but hope it's something super simple!

kylebutts commented 1 year ago

Could you show me an example of the error you're getting so I can troubleshoot? To be clear, I can't reproduce an error locally.

michaeltopper1 commented 1 year ago

Certainly! I naively thought the error would be seen on your end. Here's a reprex:

library(fixest)
library(did2s)
#> did2s (v1.1.0). For more information on the methodology, visit <https://www.kylebutts.github.io/did2s>
#> 
#> To cite did2s in publications use:
#> 
#>   Butts & Gardner, "The R Journal: did2s: Two-Stage
#>   Difference-in-Differences", The R Journal, 2022
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {did2s: Two-Stage Difference-in-Differences Following Gardner (2021)},
#>     author = {Kyle Butts and John Gardner},
#>     year = {2021},
#>     url = {https://journal.r-project.org/articles/RJ-2022-048/},
#>   }

es <- did2s(df_het,
            yname = "dep_var", first_stage = ~ 0 | state + year,
            second_stage = ~ i(rel_year, ref = c(-1, Inf)), treatment = "treat",
            cluster_var = "state"
)
#> Running Two-stage Difference-in-Differences
#>  - first stage formula `~ 0 | state + year`
#>  - second stage formula `~ i(rel_year, ref = c(-1, Inf))`
#>  - The indicator variable that denotes when treatment is on is `treat`
#>  - Standard errors will be clustered by `state`
#> Error: in summary.fixest(est$second_stage, .vcov ...:
#>  Argument '.vcov' must be either: i) a matrix, or ii) a function.
#> Problem: it is of length 0, while it should have a positive length.

Created on 2023-08-18 with reprex v2.0.2

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16)
#>  os       macOS Ventura 13.4.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Los_Angeles
#>  date     2023-08-18
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  cli           3.6.1      2023-03-23 [1] CRAN (R 4.3.0)
#>  data.table    1.14.8     2023-02-17 [1] CRAN (R 4.3.0)
#>  did2s       * 1.1.0      2023-08-18 [1] Github (kylebutts/did2s@910e3ec)
#>  digest        0.6.33     2023-07-07 [1] CRAN (R 4.3.0)
#>  dreamerr      1.2.3      2020-12-05 [1] CRAN (R 4.3.0)
#>  evaluate      0.21       2023-05-05 [1] CRAN (R 4.3.0)
#>  fastmap       1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
#>  fixest      * 0.11.1     2023-01-10 [1] CRAN (R 4.3.0)
#>  Formula       1.2-5      2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3      2023-07-20 [1] CRAN (R 4.3.0)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.6      2023-08-10 [1] CRAN (R 4.3.0)
#>  knitr         1.43       2023-05-25 [1] CRAN (R 4.3.0)
#>  lattice       0.21-8     2023-04-05 [1] CRAN (R 4.3.1)
#>  lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.3.0)
#>  Matrix        1.6-1      2023-08-14 [1] CRAN (R 4.3.0)
#>  nlme          3.1-163    2023-08-09 [1] CRAN (R 4.3.0)
#>  numDeriv      2016.8-1.1 2019-06-06 [1] CRAN (R 4.3.0)
#>  Rcpp          1.0.11     2023-07-06 [1] CRAN (R 4.3.0)
#>  reprex        2.0.2      2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang         1.1.1      2023-04-28 [1] CRAN (R 4.3.0)
#>  rmarkdown     2.24       2023-08-14 [1] CRAN (R 4.3.0)
#>  rstudioapi    0.15.0     2023-07-07 [1] CRAN (R 4.3.0)
#>  sandwich      3.0-2      2022-06-15 [1] CRAN (R 4.3.0)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.3.0)
#>  xfun          0.40       2023-08-09 [1] CRAN (R 4.3.0)
#>  yaml          2.3.7      2023-01-23 [1] CRAN (R 4.3.0)
#>  zoo           1.8-12     2023-04-13 [1] CRAN (R 4.3.0)
#>
#>  [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

kylebutts commented 1 year ago

Should be fixed by 31b26cd. Let me know if it doesn't work for you. I'm in the process of adding sparse_model_matrix support to fixest so I can use it across my packages. I manually copied that code into this package but had not fixed a bug in the copy.

As an FYI, bugs will almost always be in the vcov creation part of the code. In the short term, you can use did2s:::did2s_estimate() to get point estimates (with incorrect standard errors). It will be very fast, so I tend to use that with big data when I'm trying to make sure my code/data is correct.
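
A minimal sketch of that workaround, applied to the README example. It assumes the internal function takes the same core arguments as `did2s()` (minus `cluster_var`) and returns a list with a `second_stage` fixest object, as the `est$second_stage` in the error message above suggests. It is unexported and may change between versions, so check `args(did2s:::did2s_estimate)` before relying on it.

``` r
library(fixest)
library(did2s)

# Inspect the internal function's arguments first; it is not part of the
# public API and its signature may differ in your installed version.
args(did2s:::did2s_estimate)

# ASSUMPTION: argument names mirror did2s(); there is no cluster_var here
# because the corrected vcov step is skipped entirely.
est <- did2s:::did2s_estimate(
  data = df_het,
  yname = "dep_var",
  first_stage = ~ 0 | state + year,
  second_stage = ~ i(rel_year, ref = c(-1, Inf)),
  treatment = "treat"
)

# Use the point estimates only. Any standard errors attached to this object
# ignore the first-stage estimation and should not be reported.
coef(est$second_stage)
```

This is only meant for quick sanity checks of code and data on large samples; for inference, go back to `did2s()` once the vcov step runs.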

kylebutts / did2s

Problem with newest version with event studies #26