kylebutts / did2s_stata

Two-Stage Difference-in-Differences following Gardner (2021)

Allow absorbed fixed effects #1

Closed: korenmiklos closed this issue 2 years ago

korenmiklos commented 3 years ago

I see that this is the only place where you need the actual variables to compute the standard errors:

https://github.com/kylebutts/did2s_stata/blob/af5cd92bc73f351bfa14de22d48ca068b73cb5ef/ado/did2s.ado#L176

Let me think if there is a way to calculate inv(X'X) if the Xs are absorbed fixed effects. I am pretty sure we don't need to create, store and compute with the individual dummies.

korenmiklos commented 3 years ago

If X contains only dummies, X'X is easy to compute. The diagonal entry (X'X)_nn = T_n is the number of observations in group n. The cross entry (X'X)_nt = 1 if group n is observed at time t, and 0 otherwise. I can implement this in Mata.
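To illustrate the counting argument above, here is a minimal sketch in Python (not Mata, and not code from did2s). It assumes X stacks one dummy per unit and one dummy per time period, and that each unit appears at most once per period. The function name `dummy_crossproduct` and the toy data are hypothetical. Note that it keeps every dummy, so the full X'X would be singular; at least one dummy has to be dropped before inverting.

```python
from collections import Counter

def dummy_crossproduct(units, times):
    """Blocks of X'X for X = [unit dummies, time dummies], built from counts only."""
    unit_ids = sorted(set(units))
    time_ids = sorted(set(times))
    unit_counts = Counter(units)              # observations per unit
    time_counts = Counter(times)              # observations per period
    cell_counts = Counter(zip(units, times))  # observations per (unit, period) cell
    # Unit-by-unit and time-by-time blocks are diagonal: distinct dummies never overlap.
    unit_diag = [unit_counts[n] for n in unit_ids]
    time_diag = [time_counts[t] for t in time_ids]
    # Unit-by-time block: 1 if the unit is observed in that period, else 0.
    cross = [[cell_counts[(n, t)] for t in time_ids] for n in unit_ids]
    return unit_diag, time_diag, cross

# Toy unbalanced panel: unit "b" is not observed in period 2.
units = ["a", "a", "a", "b", "b"]
times = [1, 2, 3, 1, 3]
unit_diag, time_diag, cross = dummy_crossproduct(units, times)
print(unit_diag)  # [3, 2]
print(time_diag)  # [2, 1, 2]
print(cross)      # [[1, 1, 1], [1, 0, 1]]
```

Only the group identifiers are read, so the dummy columns themselves are never created or stored.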

kylebutts commented 3 years ago

We could add a fixed-effects-only option. Be careful, though, because X_1 can’t include the omitted dummies.

korenmiklos commented 3 years ago

X1 is also needed on L187, but this can probably be calculated as group sums of u.
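A minimal Python sketch of the group-sum idea, assuming the product needed has the form X1'u with X1 made up only of group dummies (what L187 of did2s.ado actually computes is not shown in this thread). The function name `group_sums` and the data are hypothetical.

```python
from collections import defaultdict

def group_sums(groups, u):
    """Entries of X1'u when X1 holds one dummy per group: the sum of u within each group."""
    sums = defaultdict(float)
    for g, value in zip(groups, u):
        sums[g] += value
    return dict(sums)

groups = ["a", "a", "b", "a", "b"]
u = [1.0, 2.0, -0.5, 0.5, 1.5]
xtu = group_sums(groups, u)
print(xtu)  # {'a': 3.5, 'b': 1.0}
```

This is a single pass over the data with memory proportional to the number of groups, not to observations times groups.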

kylebutts commented 3 years ago

I think there should be an option that says the “first stage” is just fixed effects. If so, all of these become low-memory operations and can be completed as you described. The problem is that when covariates are included in the first stage, the matrix algebra is more complicated.

jonathan-norris commented 2 years ago

I am currently using did2s and had a similar concern. We want to follow Callaway & Sant'Anna's suggestion not to rely on time-varying controls, since these could be affected by the treatment, but rather to use fixed controls measured in the pre-treatment base period. Wooldridge does this in a recent paper extending TWFE. This means adding, for example, i.unit i.unit#i.xi or i.unit#c.xi. Similarly, to add a linear group trend we might want i.unit#c.year. In our case, we have panel data on individuals. In some cases we would like to include individual fixed effects, so the ability to absorb the individual fixed effects and specify the first stage would be handy. Alternatively, absorbing the interactions between unit FEs and pre-treatment characteristics might be helpful too. I say this to suggest that the fixed-effects-only option doesn't seem helpful. Is it feasible to build a hybrid? I haven't thought it through, but it seems it should be.

basquith86 commented 2 years ago

I was wondering whether any progress had been made on this front. It's easy enough for me to go into the ado file and add a section that uses reghdfe in the first stage instead of reg, but I know that that's not going to give the right standard errors because it'll change the DOF.

kylebutts commented 2 years ago

Hi @basquith86 (Brian Asquith, ya?) The primary problem is that reghdfe does not allow the creation of a model matrix including the fixed effects. So I haven't figured out how to do it in Stata. There are a few options:

  1. While debugging code / exploring, manually perform the two-stage estimation to see the point estimates (inflate standard errors a little). Then, when you want correct standard errors, you can bootstrap the standard errors like this using reghdfe: https://github.com/kylebutts/did2s_stata#large-datasets-or-many-fixed-effects

  2. The R version of the code, https://github.com/kylebutts/did2s, does use fast fixed effect estimation so if you are multi-lingual, that's a solution! I will check out and try to fix the pweights problem in #10, but that also works well in the R version :-)

basquith86 commented 2 years ago

Hi Kyle,

Thanks for your response. I actually partially figured out what was going on in #2 myself, or what I think is the problem. In the Mata code, there's a set off where, if the weight string isn't missing, the code has something like first_u=sqrt(weights)*:first_u. Except sqrt() doesn't work on vector columns in Mata. I tried resolving this by creating a temp variable in the main code with the square root of the weights, passing the temp variable into Mata, and then correcting the set off code so that it just does the element-wise multiplication with the pre-treated variable. That also didn't seem to work, but I haven't had time yet today to figure out why.

Brian

kylebutts commented 2 years ago

@korenmiklos If you're still interested, I think I figured out how to use reghdfe with did2s_stata, but I'm not confident enough in my Stata coding skills to implement it. We can use the trick from the end of this issue to predict out of sample, which solves the problem I was having: https://github.com/sergiocorreia/reghdfe/issues/17

kylebutts commented 2 years ago

Hi @korenmiklos @basquith86,

John Gardner emailed me and showed me a new computational trick he figured out. You can actually FWL the 'unit' fixed effects from the first_stage variables and the outcome variable before running the did2s call, and it will actually result in identical standard errors if you cluster at the unit level (or something that nests the units, e.g. states).

I implemented it in this commit e7cf2e6, so this issue is FINALLY closed. I tested it with the data you sent me @basquith86 and it's a major speedup
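For readers unfamiliar with the FWL step, here is a minimal Python sketch of partialling out unit fixed effects by within-unit demeaning. It is an illustration only, not the code in the commit above, and it does not reproduce did2s or its standard errors. The helper names and the noiseless toy data (true slope 2, hypothetical unit effects) are assumptions.

```python
from collections import defaultdict

def demean_by_unit(values, units):
    """Within transformation: subtract each unit's mean from its observations."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, n in zip(values, units):
        totals[n] += v
        counts[n] += 1
    return [v - totals[n] / counts[n] for v, n in zip(values, units)]

def slope_no_intercept(x, y):
    """OLS slope of y on x without a constant (valid after demeaning)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

units = [1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 3.0, 2.0, 4.0, 9.0]
unit_effect = {1: 10.0, 2: -5.0}  # hypothetical fixed effects
y = [unit_effect[n] + 2.0 * xi for n, xi in zip(units, x)]

# Partial the unit dummies out of both the regressor and the outcome.
x_tilde = demean_by_unit(x, units)
y_tilde = demean_by_unit(y, units)
beta = slope_no_intercept(x_tilde, y_tilde)
print(beta)  # 2.0
```

The demeaned regression recovers the slope without ever forming the unit dummies, which is where the memory saving comes from.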

basquith86 commented 2 years ago

Excellent, thanks for your work on this, Kyle!
