kylebutts / didimputation

Difference-in-differences Imputation-based Estimator proposed by Borusyak, Jaravel, and Spiess (2021)

did_imputation() fails if dependent variable is "y" #8

Closed grantmcdermott closed 2 years ago

grantmcdermott commented 2 years ago
library(didimputation)
#> Loading required package: fixest
#> Loading required package: data.table
data(df_het, package = "did2s")

did_imputation(
    yname = "dep_var", 
    data = df_het, gname = "g", tname = "year", idname = "unit"
    )
#> # A tibble: 1 × 6
#>   lhs     term  estimate std.error conf.low conf.high
#>   <chr>   <chr>    <dbl>     <dbl>    <dbl>     <dbl>
#> 1 dep_var treat     2.23    0.0183     2.19      2.27

## Changing dep_var to "y" triggers an error
df_het$y = df_het$dep_var
did_imputation(
    yname = "y", 
    data = df_het, gname = "g", tname = "year", idname = "unit"
    )
#> Error in .subset2(x, i, exact = exact): no such index at level 1

Created on 2022-09-03 with reprex v2.0.2

The offending code is here: https://github.com/kylebutts/didimputation/blob/main/R/did_imputation.R#L213-L223 (And a similar issue for lines 247--255 a bit further down.)
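For context, here is a minimal sketch of what I believe is going on, using a toy data.table rather than the package internals (the table `dt` and the helpers `bad()` and `good()` are made up for illustration). Inside `[.data.table`, a bare `y` is looked up among the columns first, so once the data has a column named "y" the function argument is shadowed and `paste()` builds names from the column's values instead.

```r
library(data.table)

# Hypothetical toy data: a column literally named "y", plus the
# per-outcome adjustment column that the loop tries to look up.
dt <- data.table(y = c(1, 2, 3), zz000adj_y = c(10, 20, 30))

# The argument is named `y`, but inside dt[...] the column `y` wins.
# paste() therefore yields "zz000adj_1", "zz000adj_2", "zz000adj_3",
# and indexing .SD with that vector is expected to fail.
bad <- function(y) {
  dt[, .SD[[paste("zz000adj", y, sep = "_")]]]
}
res <- try(bad("y"), silent = TRUE)
inherits(res, "try-error")

# With an argument name that matches no column, the lookup falls
# through to the function's own scope and the right column is found.
good <- function(yy000) {
  dt[, .SD[[paste("zz000adj", yy000, sep = "_")]]]
}
good("y")
```

This assumes no column in the data happens to be called `yy000`; any name that cannot plausibly be a column would do.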

A quick-and-dirty fix is simply to give the function argument a more esoteric name than "y", so that it cannot collide with a column name in the data. Something like:

  ests <- yvars %>%
    purrr::set_names(yvars) %>%
    purrr::map(function(yy000) {  ## changed to yy000
      data[,
        zz000adj := .SD[[paste("zz000adj", yy000, sep = "_")]]
      ]...

As an aside, I'm not too sure I understand the need/value of using the leading purrr functions here. I think you'd be better off just using data.table::set() with lapply, which avoids both the dependency requirement and the NSE headache.

lapply(yvars, function (y) {
    set(data, j = "zz000adj", value = data[[paste("zz000adj", y, sep = "_")]])
})

arkokoley commented 1 year ago

@grantmcdermott you just saved my course grade!