Open olidess opened 8 years ago
Hi Oli,
Both syntax A and B should give the same answer for the betas (e.g. if you run "reghdfe y x, a(..)", then the estimate for x should be unchanged). However, there is no guarantee that the individual intercepts and slopes (the alphas) will be the same between syntaxes. The reason is that they often can't actually be recovered, as the parameters are not identified.
The working assumption that reghdfe makes for the intercepts is that it returns a variable hdfe1 with mean zero. There is some discussion here: http://scorreia.com/software/reghdfe/faq.html#where-is-the-constant.
For a simple example of why we do this, think of a regression like "regress y x, a(fe1 fe2)". In this case, the two sets of fixed effects are collinear. The usual solution is to drop some of the dummies, but we can't do that because we are demeaning. The areg solution is to add back the constant, but since I can have more than one set of FEs, the question would be which of the two FEs receives the constant.
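The identification problem can be written out explicitly. As a sketch, with \alpha_i and \gamma_j denoting the two sets of fixed effects (symbols introduced here for illustration only, they are not taken from the reghdfe documentation):

```latex
\begin{align*}
y_{ij} &= x_{ij}\beta + \alpha_i + \gamma_j + \varepsilon_{ij} \\
       &= x_{ij}\beta + (\alpha_i + c) + (\gamma_j - c) + \varepsilon_{ij}
       \qquad \text{for any constant } c
\end{align*}
```

Every choice of c gives the same fit and the same beta, so the data cannot say which set of fixed effects should absorb the constant. Some normalization has to be imposed, and different normalizations give different alphas.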
All in all, I am including the option to save the FEs because many people seem to use it, but there are a lot of nuances in how to use and interpret them (although your case is quite straightforward).
Perhaps I should allow an option to give the same alphas as regress in a case like yours?
Hi Sergio,
Thank you for your answer (And thank you for the reghdfe command as well ☺, it is great … I love it !!).
I see the problem. But I still find it a bit confusing that the two syntaxes yield exactly the same betas and the same individual slopes (the individual slopes are really identical) but very different individual intercepts. Note also that the two syntaxes lead to the same residuals when using the option residuals as an alternative to savefe. If the residuals, the betas, and the individual slopes are the same, then by construction I would expect the individual intercepts to be the same as well.
Let me elaborate more on this issue. Suppose you want to run an event study with many events and that expected stock returns are generated by the following model:
R = Alpha + Beta1 x Var1 + Beta2 x Var2 + Beta3 x Var3
The traditional approach to compute abnormal returns is to estimate the parameters alpha, beta1, beta2, beta3 by running separate regressions for every event. In Stata, this can be done with the following command:
statsby _b, by(event): reg R Var1 Var2 Var3
But if you have many events (more than 10000 as is the case for me), this takes forever!
With your command, we can directly estimate all parameters (Alpha, Beta1, Beta2, Beta3) for every event by running
reghdfe R , a( i.event##c.( Var1 Var2 Var3), savefe)
Beta1 for every event is recorded in _hdfe1_slop1, Beta2 is in _hdfe1_slop2, and Beta3 is in _hdfe1_slop3. I have compared the two approaches and the Betas are indeed the same. The only problem is that the values recorded in __hdfe1 (i.e. the individual intercepts for every event) are not the Alpha estimates obtained with the traditional approach.
Note that this is solved by using the alternative syntax
reghdfe R , a( i.event i.event##c.Var1 i.event##c.Var2 i.event##c.Var3, savefe)
In this case, the individual intercepts for every event recorded in __hdfe1 correspond to the Alpha estimates using the traditional approach.
The problem is that this alternative syntax is not as fast as the first one. The first one is much, much faster (it takes 10 seconds).
One way to get the correct Alpha using the first syntax is to do
reghdfe R , a( i.event##c.( Var1 Var2 Var3), savefe)
reghdfe R , a( i.event##c.( Var1 Var2 Var3)) residuals(resid)
The correct Alpha is then obtained as: R - Resid - Beta1*Var1 - Beta2*Var2 - Beta3*Var3
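The residual-based recovery described above could be sketched in Stata roughly as follows. This is only a sketch under assumptions: resid is taken to be the variable created by the residuals(resid) option, and b1, b2, b3 are hypothetical names for copies of the saved per-event slopes (the actual names of the saved variables depend on the reghdfe version, so rename or copy them first).

```stata
* Sketch only: b1, b2, b3 are hypothetical copies of the saved per-event slopes,
* and resid is assumed to come from the residuals(resid) option.
generate double alpha = R - resid - b1*Var1 - b2*Var2 - b3*Var3

* The recovered alpha should be constant within each event.
* The within-event standard deviation should therefore be zero up to rounding.
bysort event: egen double alpha_sd = sd(alpha)
summarize alpha_sd
```

If alpha_sd is not close to zero for some events, the slope variables probably do not line up with the regressors in the way assumed here.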
I see the problem you have when there are multiple FEs, but is there a way to get the same individual intercept estimates across all syntaxes when there is only one set of FEs?
Thank you so much !
Olivier
PS: Again, your command is great. I really like it.
From: Sergio Correia [mailto:notifications@github.com] Sent: May-26-16 2:07 AM To: sergiocorreia/reghdfe reghdfe@noreply.github.com Cc: Olivier Dessaint Olivier.Dessaint@Rotman.Utoronto.Ca; Author author@noreply.github.com Subject: Re: [sergiocorreia/reghdfe] Bug? (#52)
You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: https://github.com/sergiocorreia/reghdfe/issues/52#issuecomment-221785620
I agree that it is confusing. All in all, in this case the entire difference lies in the constant, but adding it back by default is usually quite messy so I chose not to (although in previous versions of reghdfe there was a reported _cons coefficient).
As a solution, perhaps we could have a suboption that adds back the constant to the first set of intercepts?
Something like reghdfe R , a(i.event##c.( Var1 Var2 Var3), savefe keepconstant)
It would basically do what you are currently doing with the residuals() option but behind the scenes...
Let me know if your current workflow works with the residuals() workaround, and if so I'll try to add it for the next version of reghdfe (~ 1 month or so).
Ok ! Thanks!
Olivier
Dear Sergio,
I have been using the two following syntaxes to estimate and save the FE coefficients
A) reghdfe r, a(i.id i.id#c.rmrf id#c.smb id#c.hml, savefe)
B) reghdfe r, a(i.id##c.(rmrf smb hml),savefe)
Where id is a categorical variable, and all the others are continuous variables.
If my understanding is correct, the two syntaxes should be equivalent, i.e. the FE estimates should be the same. But I find different alphas. The slope estimates are the same ( _hdfe2_slop1 using syntax A = _hdfe1_slop1 using syntax B), but not the alpha estimates ( hdfe1 using syntax A <> hdfe1 using syntax B).
Note that the correct alpha seems to be obtained with syntax A only. When I do
reghdfe r if id==1, a(i.id i.id#c.rmrf id#c.smb id#c.hml, savefe)
and
reg r rmrf smb hml if id==1
I find that hdfe1 = _cons, which is what I expected. However, when I do
reghdfe r if id==1, a(i.id##c.(rmrf smb hml),savefe)
I find that hdfe1 is different from _cons.
Is it a bug in the command, or am I missing something?
Thank you so much for your help !
Best,
OD