grantmcdermott / etwfe

Extended two-way fixed effects
https://grantmcdermott.com/etwfe/

Throws error with more than one covariate #7

Closed fhollenbach closed 1 year ago

fhollenbach commented 1 year ago

First off, thank you for creating the package.

At the moment, specifying more than one covariate leads to an error. The problem arises when the ctrls_dm variable is created on line 109 of etwfe.R: the control string has to be split first so that each name gets the suffix, e.g., ctrls_dm = paste0(stringr::str_replace_all(stringr::str_split(ctrls, "\\+")[[1]], " ", ""), "_dm")
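The splitting step described above can be sketched in base R. This is only an illustration of the idea, not the fix that went into the package: the input string `"x1 + x2"` is a made-up example, and `strsplit()`/`trimws()` stand in for the stringr calls quoted in the comment.

```r
# Hypothetical right-hand side with two controls, as it might appear in a formula
ctrls <- "x1 + x2"

# Split on the literal "+" and strip surrounding whitespace from each piece
ctrl_names <- trimws(strsplit(ctrls, "+", fixed = TRUE)[[1]])

# Append the suffix to each control separately
ctrls_dm <- paste0(ctrl_names, "_dm")
print(ctrls_dm)

# Without the split, the suffix only lands on the end of the whole string
print(paste0(ctrls, "_dm"))
```

Using `fixed = TRUE` sidesteps the need to escape `+`, which is a regex metacharacter.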

I have not had time to check whether this can be fixed here or would lead to downstream problems. Will try to take a look tomorrow.

grantmcdermott commented 1 year ago

Super, thanks. That makes sense. I won't have time to look at this immediately, so please feel free to put in a PR if you can. An additional test for multiple control vars (see inst/tinytest) would also be much appreciated.

grantmcdermott commented 1 year ago

I had a quick crack at fixing this here: https://github.com/grantmcdermott/etwfe/tree/multi_rhs

Do you mind installing this multi_rhs branch and trying with your dataset? Even better is if you are able to compare to jwdid. I don't have access to Stata on this computer, but I'd like to verify that etwfe and jwdid are producing the same result.

fhollenbach commented 1 year ago

Thanks so much! I was also working on a fix and did the same things as you, except I hadn't figured out how to change the line for the varying slopes yet.

I ran it on my simulated data and it looks pretty good. However, the SEs differ slightly depending on the type of estimation. The covariates x1 and x2 are time-invariant. Here are the results after aggregation:

fe = 'vs'

     Term    Contrast .Dtreat Effect Std. Error z value   Pr(>|z|)  2.5 % 97.5 %
1 .Dtreat mean(dY/dX)       1  1.019    0.09118   11.17 < 2.22e-16 0.8398  1.197

fe = 'feo'

     Term    Contrast .Dtreat Effect Std. Error z value   Pr(>|z|)  2.5 % 97.5 %
1 .Dtreat mean(dY/dX)       1  1.019    0.08726   11.67 < 2.22e-16 0.8475   1.19

fe = 'none'

     Term    Contrast .Dtreat Effect Std. Error z value   Pr(>|z|)  2.5 % 97.5 %
1 .Dtreat mean(dY/dX)       1  1.019    0.08739   11.65 < 2.22e-16 -6.421 -4.296

Stata:

jwdid  y x1 x2, ivar(id) tvar(time) gvar(first_treated)
estat simple
          |   Contrast   std. err.     [95% conf. interval]
      _at |
(2 vs 1)  |   1.018502   .0872584      .8474786    1.189525

For comparison, here is the aggregated ATE from the did package:

    type estimate std.error  conf.low conf.high point.conf.low point.conf.high
1 simple 0.990916 0.1895209 0.6194619   1.36237      0.6194619         1.36237

I would like to test this some more but I have a deadline coming up. Hopefully, I can do more over the weekend/next week. Thanks again for all your work on this.
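As a quick sanity check on the pasted tables, the interval bounds can be recomputed from each estimate and standard error. The sketch below assumes the reported intervals are ordinary Wald (normal-approximation) intervals; the thread does not state this, and `wald_ci` is a helper name introduced here for illustration.

```r
# Wald (normal-approximation) interval: estimate +/- z * SE
# ASSUMPTION: the reported intervals are of this form
wald_ci <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  c(lower = est - z * se, upper = est + z * se)
}

# Stata jwdid row, values copied from the output above
stata_ci <- wald_ci(1.018502, 0.0872584)
print(round(stata_ci, 4))

# fe = 'none' row, using the printed (rounded) estimate and standard error
none_ci <- wald_ci(1.019, 0.08739)
print(round(none_ci, 4))
```

Under that assumption the Stata bounds are reproduced, while the fe = 'none' row would be expected to span roughly 0.848 to 1.190 rather than the printed -6.421 to -4.296. This suggests that the interval columns in that row may be a printing or copy-paste glitch rather than a real difference in the estimate.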

grantmcdermott commented 1 year ago

My guess is that this is a degrees of freedom issue, as raised by Jeff here: https://twitter.com/jmwooldridge/status/1582355404129914880?s=61&t=1RGrPLJUB1l_gxm1sgCfgQ

I’m not sure how to mimic the uncon option for marginaleffects TBH. It’s something I’ll probably have to follow up with Vincent.

But otherwise, good to know that the main mfx are the same. Can you confirm that the coefficients are also equivalent in the main models? If so, then I’ll merge the multi_rhs branch and look deeper into the SEs later (since it’s a separate issue).

fhollenbach commented 1 year ago

Hi,

Sorry for the delay. Comparing the coefficients from the etwfe feo model and the Stata model, all the coefficients are the same when rounded to the 4th digit. This is with time-constant controls.
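A comparison of this kind might look like the following sketch. The coefficient names and values are hypothetical placeholders, not numbers from the thread; in practice they would come from `coef()` on the R model and from the exported Stata estimates.

```r
# HYPOTHETICAL coefficient vectors; names and values are made up for illustration
coef_r     <- c(x1 = 0.51234, x2 = -0.20481)
coef_stata <- c(x2 = -0.20479, x1 = 0.51236)

# Align by name before comparing, since the two programs may order terms differently
common <- intersect(names(coef_r), names(coef_stata))
diffs  <- abs(coef_r[common] - coef_stata[common])

# "Same to the 4th digit" read here as: absolute difference below 1e-4
same_4dp <- all(diffs < 1e-4)
print(same_4dp)
```

A tolerance check is used instead of comparing `round(x, 4)` values directly, because two numbers that differ only in the fifth decimal can still round to different fourth digits.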

Though when I simulate time-varying covariates, the results vary across the three estimation methods and also differ from the Stata results. The Stata help file says only time-constant controls are allowed, but shouldn't this also work for time-varying controls (according to Wooldridge)?

grantmcdermott commented 1 year ago

Great. I'll merge the PR then.

RE: time-varying covariates. IIRC the Wooldridge result specifically and only addresses time-constant covariates. TBH it's been a while since I read the actual paper at this point. I'll double check and add a note cautioning about time-varying covariates. (UPDATE: See p. 17 of the working paper.)
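Given that caveat, it can be worth checking whether a control is actually constant within each unit before including it. The following is a minimal base R sketch; the toy panel and the helper name `is_time_constant` are assumptions introduced here and are not part of etwfe.

```r
# Toy panel: 3 units observed over 4 periods (all values are made up)
dat <- data.frame(
  id   = rep(1:3, each = 4),
  time = rep(1:4, times = 3),
  x1   = rep(c(0.2, 1.5, -0.7), each = 4),          # constant within each unit
  x2   = c(1, 2, 3, 4, 5, 5, 5, 5, 0, 1, 0, 1)      # varies over time for units 1 and 3
)

# TRUE if the variable takes a single value within every unit
is_time_constant <- function(x, id) {
  all(tapply(x, id, function(v) length(unique(v)) == 1))
}

print(is_time_constant(dat$x1, dat$id))
print(is_time_constant(dat$x2, dat$id))
```

In this toy data `x1` passes the check and `x2` does not, so `x2` is the kind of control the caution applies to.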