Closed joachim-gassen closed 3 years ago
Hi Joachim,
Thanks a lot for your PR! Also, your blog post was extremely useful in understanding the problem.
Why it does is beyond me, given that this constant cannot be interpreted in a meaningful way without diving into the internals of the fixed effect structure.
For several years I resisted adding _cons
as I thought it was a completely useless piece of information, and just complicated my code. But the margins
command didn't work without it, so eventually I added it. According to readme.md:
- version 5.0 29jun2018:
- Added support for margins postestimation command.
- Added _cons row to output table, so the intercept is reported (as in regress/xtreg/areg). The noconstant option disables this, but doing so might make the output of margin incorrect.
As I mention in the help file, this should be a purely aesthetic option, but it turns out it might not be under multi-way clustering if the variance matrices end up not being positive-semidefinite (which is more likely in small sample sizes).
Assuming that my hunch is correct: I do not see why reporting a constant for fixed effect models has become the default, in particular as it is having this side effect. In any case it would have helped me to see a warning or something similar in the documentation.
Agree. To fix this, I removed the side effect and now it shouldn't matter to the results if you run with noconstant
or not.
In particular, the mata reghdfe_fix_psd()
function will only work on the submatrix that excludes _cons
. This means that the entire matrix might not be positive-semidefinite due to the _cons
row/col. However, we shouldn't care about this, as this should only matter if you are running a test that involves a regressor together with cons. For instance:
reghdfe price weight length
test weight = length // this is correct
test weight = _cons // this might be problematic
One last thing: I'm in the middle of a major update to reghdfe, so the fix was pushed into the reghdfe6
branch instead of the master
branch. Could you check it out to see if things now work on your side? To install you need to uninstall+reinstall ftools and reghdfe as follows:
* Install ftools (remove program if it existed previously)
cap ado uninstall ftools
net install ftools, from("https://raw.githubusercontent.com/sergiocorreia/ftools/groupreg/src/")
* Install reghdfe 6.x
cap ado uninstall reghdfe
net install reghdfe, from("https://raw.githubusercontent.com/sergiocorreia/reghdfe/reghdfe6/src/")
Many thanks again! Sergio
Also, as reference, this statalist thread goes into a bit more detail into why margins
needs _b[_cons]:
Hi Sergio:
Unbelievable! Thank you for coming back to me so quickly. I just verified that your fix is perfectly addressing the problem. See:
These are differences between the R fixest package and reghdfe for the new version version 6.0.5 01Mar2021. All within numerical precision range. Super! Thanks for pointing out that the margins command needs the constant to be there. This makes sense.
One minor thing (not sure whether it matters): The table that Stata returns after running reghdfe now does not report a standard error for the constant anymore. This makes sense to me as the standard error for the constant term cannot be calculated without running into the problems outlined above but it might confuse people. Is the SE needed for the margin command or does this simply operate on _b[]? Also, I tested test x = _cons
and this returns an F-test although I am not entirely sure how it can do that given that there is no V.
Thank you again for responding so quickly. Let me know when there is something else I can do to help.
Cheers!
Joachim
Hi Joachim,
Glad it works on your side!
These are differences between the R fixest package and reghdfe for the new version version 6.0.5 01Mar2021. All within numerical precision range. Super! Thanks for pointing out that the margins command needs the constant to be there. This makes sense.
I wonder if these differences would go away if you increase the tolerance, with e.g. "tol(1e-14)" instead of the default of 1e-8 (and same for fixest). Otherwise, it might help to pinpoint other discrepancies
One minor thing (not sure whether it matters): The table that Stata returns after running reghdfe now does not report a standard error for the constant anymore. This makes sense to me as the standard error for the constant term cannot be calculated without running into the problems outlined above but it might confuse people. Is the SE needed for the margin command or does this simply operate on _b[]? Also, I tested test x = _cons and this returns an F-test although I am not entirely sure how it can do that given that there is no V.
Great catch. I just added an update that does the update in two steps:
V = fix_psd(V)
)V[1..K, 1..K] = fix_psd(backup_V[1..K, 1..K])
)This means that the results "where it matters" (regressors other than _cons) are the same, but we are still able to see SEs for _cons. This seems to work and not confuse readers, but I'm still a bit on the fence on how "correct" this approach is
Hi Sergio:
Thank you and apologies for the delay in coming back to you. I checked the results for tol(2.220446e-12) which is the minimum amount that the fixed.tol
parameter allows for my machine. The results look very similar (differences within +-5e-9 with some slightly larger values for small samples). For me this seems consistent with R using 64bit floats and the handshake between R and Stata causing additional noise. In other words: Good enough for me ;-)
For what it is worth, I like the approach that you took. While it generates two standard errors for one model that depend on different Vs, it avoids the unnecessary psd fix for the main coefficients and makes margin()
happy. Of course an alternative would be not to report a constant by default and instead to require a constant
option for situations wehere users want to use margins
with a consistent V but this is a mere design choice.
Thank you again for providing feedback so quickly and for maintaining this super important package so well!
Joachim
No problem at all, and Im very happy that the program is being validated externally, and that so many options now exist in R/Py/Julia.
Hi there:
First of all: Thank you for this amazing package! As you are pretty well aware it has become the workhorse for panel model estimation for many people. This is also why I have been working on a blog post about how its standard errors can be reconciled with some of the prominent R packages in this field (fixest, lfe, sandwich, plm).
Doing so I noticed that the twoway clustered standard errors of reghdfe for fixed effect panel models differ at times from the ones of fixest and lfe, even if samples are completely balanced and the same degree of freedom adjustments are applied. Most of the time (like in the example below) the differences are very minor but they can also become sizable on small samples (I encountered more than 10 %).
Here is my guess why it happens: You are by default estimating a constant intercept with standard error even when fixed effects are present in the model. At times the resulting VCE becomes non-positive semi-definite, causing the application of the Cameron, Gelbach & Miller (2011, p. 241 f.) fix while it would not been necessary to apply this fix if all the fixed effects would have been projected out. This causes changes also in the SE of the independent variable of interest. When I estimate your model with the undocumented
noconstant
option, then the SE are identical with the ones of {fixest} and {lfe} for balanced panels and regular fixed effects (with the range of numerical precision). Am I making sense here? Thank you for pointing out anything that I might have missed.Assuming that my hunch is correct: I do not see why reporting a constant for fixed effect models has become the default, in particular as it is having this side effect. In any case it would have helped me to see a warning or something similar in the documentation.
See below for an example based on the Petersen data set. You can see a somewhat more complete discussion in my blog post in case you are interested.
Thank you again for your great work!
Joachim