grantmcdermott / etwfe

Extended two-way fixed effects
https://grantmcdermott.com/etwfe/

emfx() not aggregating when the data already includes a variable named 'group' #41

Closed: mariofiorini closed this issue 6 months ago

mariofiorini commented 1 year ago

Hi @grantmcdermott, I just noticed a possible bug. If the data already contain a variable named "group", then emfx() does not seem to aggregate the effects. In the example below, I first run the same example that you have on your website, where emfx() works as expected, and then modify the data by adding a variable named group. Note that this new variable is not used in the etwfe() call, and yet emfx() does not aggregate the results. Cheers, Mario

data("mpdta", package = "did")
head(mpdta)
mod =
  etwfe::etwfe(
    fml  = lemp ~ lpop, # outcome ~ controls
    tvar = year,        # time variable
    gvar = first.treat, # group variable
    data = mpdta,       # dataset
    vcov = ~countyreal  # vcov adjustment (here: clustered)
  )
etwfe::emfx(mod)

    Term                 Contrast .Dtreat Estimate Std. Error     z Pr(>|z|)    S   2.5 %  97.5 %
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE  -0.0506     0.0125 -4.05   <0.001 14.3 -0.0751 -0.0261

mpdta$group <- mpdta$countyreal
mod2 =
  etwfe::etwfe(
    fml  = lemp ~ lpop, # outcome ~ controls
    tvar = year,        # time variable
    gvar = first.treat, # group variable
    data = mpdta,       # dataset
    vcov = ~countyreal  # vcov adjustment (here: clustered)
  )
etwfe::emfx(mod2)

 Group    Term                 Contrast .Dtreat Estimate Std. Error       z Pr(>|z|)   S   2.5 %    97.5 %
  8001 .Dtreat mean(TRUE) - mean(FALSE)    TRUE -0.09431     0.0330 -2.8579  0.00426 7.9 -0.1590 -0.029634
  8019 .Dtreat mean(TRUE) - mean(FALSE)    TRUE -0.02163     0.0334 -0.6473  0.51746 1.0 -0.0871  0.043870
  8023 .Dtreat mean(TRUE) - mean(FALSE)    TRUE -0.00310     0.0474 -0.0654  0.94785 0.1 -0.0961  0.089886
  8029 .Dtreat mean(TRUE) - mean(FALSE)    TRUE -0.04333     0.0193 -2.2501  0.02444 5.4 -0.0811 -0.005587
  8041 .Dtreat mean(TRUE) - mean(FALSE)    TRUE -0.10128     0.0382 -2.6539  0.00796 7.0 -0.1761 -0.026483
--- 181 rows omitted. See ?print.marginaleffects ---
 55103 .Dtreat mean(TRUE) - mean(FALSE)    TRUE -0.05469     0.0224 -2.4459  0.01445 6.1 -0.0985 -0.010865
 55109 .Dtreat mean(TRUE) - mean(FALSE)    TRUE -0.00621     0.0205 -0.3027  0.76208 0.4 -0.0464  0.033973
 55123 .Dtreat mean(TRUE) - mean(FALSE)    TRUE -0.03744     0.0191 -1.9587  0.05015 4.3 -0.0749  0.000024
 55131 .Dtreat mean(TRUE) - mean(FALSE)    TRUE  0.01769     0.0269  0.6585  0.51024 1.0 -0.0350  0.070355
 55137 .Dtreat mean(TRUE) - mean(FALSE)    TRUE -0.04484     0.0202 -2.2177  0.02658 5.2 -0.0845 -0.005210
Columns: group, term, contrast, .Dtreat, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo

grantmcdermott commented 1 year ago

Thanks for reporting @mariofiorini.

IIRC this is an upstream limitation of marginaleffects. For example, see @vincentarelbundock's comment here about "group" being a reserved word. Vincent, is this still the case? Either way, I can at least document it better.

vincentarelbundock commented 1 year ago

Yes, upstream does have that limitation and I am afraid it will not be relaxed.
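
For anyone on a version where this limitation still applies, one possible workaround is to rename the clashing column before fitting, since the example above aggregates correctly when the data have no column named `group`. The following is only a minimal base R sketch: `rename_reserved()` is a hypothetical helper (not part of etwfe or marginaleffects), the `_orig` suffix is an arbitrary choice, and the toy data frame merely stands in for `mpdta` with made-up `lemp` values.

```r
# Hypothetical helper, not part of etwfe or marginaleffects: rename any
# column whose name clashes with a reserved word before fitting the model.
rename_reserved = function(data, reserved = "group", suffix = "_orig") {
  clash = names(data) %in% reserved
  names(data)[clash] = paste0(names(data)[clash], suffix)
  data
}

# Toy data standing in for mpdta plus the extra `group` column
# (lemp values are invented for illustration only).
dat = data.frame(
  countyreal = c(8001, 8019),
  lemp       = c(5.9, 6.1),
  group      = c(8001, 8019)
)

dat2 = rename_reserved(dat)
names(dat2)
```

The renamed data frame can then be passed to etwfe() in place of the original; the values of the column are untouched, only its name changes.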

grantmcdermott commented 6 months ago

Sorry for the long delay on this, but it should be fixed if you grab the dev version now. Specifically, the fix:

Illustration of the former case:

library(etwfe)
data("mpdta", package = "did")

mod =
  etwfe::etwfe(
    fml  = lemp ~ lpop, # outcome ~ controls
    tvar = year,        # time variable
    gvar = first.treat, # group variable
    data = mpdta,       # dataset
    vcov = ~countyreal  # vcov adjustment (here: clustered)
  )
etwfe::emfx(mod)
#> 
#>     Term                 Contrast .Dtreat Estimate Std. Error     z Pr(>|z|)
#>  .Dtreat mean(TRUE) - mean(FALSE)    TRUE  -0.0506     0.0125 -4.05   <0.001
#>     S   2.5 %  97.5 %
#>  14.3 -0.0751 -0.0261
#> 
#> Columns: term, contrast, .Dtreat, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 
#> Type:  response

mpdta$group <- mpdta$countyreal
mod2 =
  etwfe::etwfe(
    fml  = lemp ~ lpop, # outcome ~ controls
    tvar = year,        # time variable
    gvar = first.treat, # group variable
    data = mpdta,       # dataset
    vcov = ~countyreal  # vcov adjustment (here: clustered)
  )
etwfe::emfx(mod2)
#> 
#>     Term                 Contrast .Dtreat Estimate Std. Error     z Pr(>|z|)
#>  .Dtreat mean(TRUE) - mean(FALSE)    TRUE  -0.0506     0.0125 -4.05   <0.001
#>     S   2.5 %  97.5 %
#>  14.3 -0.0751 -0.0261
#> 
#> Columns: term, contrast, .Dtreat, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 
#> Type:  response

Created on 2024-02-23 with reprex v2.1.0

mariofiorini commented 6 months ago

ok thanks