Closed brendaprallon closed 4 months ago
To rule out one potential cause: using the same parameters as `reghdfe` doesn't change the number of observations:
```r
reg <- feols(
  lnhourlyc ~ merit | estid^jobn + year^jobn,
  data       = data,
  fixef.rm   = "singleton",
  fixef.tol  = 1e-08,
  fixef.iter = 16000
)
reg$nobs
#> [1] 838364
```
Hi Brenda: no need to bang your head any further, the algorithms are simply different. My algorithm is less sophisticated than `reghdfe`'s: it applies just one pass over the data.
To be clear, imagine the following data:

| obs | fe1 | fe2 |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 2 | 2 |
| 4 | 3 | 2 |
| 5 | 3 | 2 |
| 6 | 3 | 2 |
Here the first FE has one singleton (the first observation), while the second FE has no singleton. My algorithm does just one pass and removes the first observation, leading to:

| obs | fe1 | fe2 |
|---|---|---|
| 2 | 2 | 1 |
| 3 | 2 | 2 |
| 4 | 3 | 2 |
| 5 | 3 | 2 |
| 6 | 3 | 2 |
And I stop there. Note that after the removal of the first observation, the second FE now has a singleton (observation 2). `reghdfe`'s algorithm recursively removes singletons until none are left, so it applies a second pass, leading to:

| obs | fe1 | fe2 |
|---|---|---|
| 3 | 2 | 2 |
| 4 | 3 | 2 |
| 5 | 3 | 2 |
| 6 | 3 | 2 |
And finally a third pass, ending with:

| obs | fe1 | fe2 |
|---|---|---|
| 4 | 3 | 2 |
| 5 | 3 | 2 |
| 6 | 3 | 2 |
The algorithms are different. I will make it clear in the docs that the algorithm does not apply recursion -- otherwise it creates confusion. In the future I will implement the recursive algorithm, but currently the implementation is really low-level (and fast) and updating it is non-trivial.
Thanks for the effort in creating the issue, that's really appreciated!
(And thanks for the words :-))
For full disclosure, I have planned to completely overhaul the code used to prepare the fixed effects, since I now have a better algorithm. At that point I'll rewrite the singleton removal. But it's a lot of C++ work, so it won't be soon, sorry.
Currently, to replicate `reghdfe`, you unfortunately have to remove the singletons "by hand" before the estimation. Sorry.
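A minimal sketch of that by-hand removal, using base R's `ave()` on toy data (the column names `fe1` and `fe2` are illustrative; in the regression above they would correspond to the `estid^jobn` and `year^jobn` interactions):

```r
# Toy data with two fixed-effect columns (illustrative names).
df <- data.frame(y   = c(1.2, 0.5, 2.1, 0.3, 1.8, 0.9),
                 fe1 = c(1, 2, 2, 3, 3, 3),
                 fe2 = c(1, 1, 2, 2, 2, 2))

# Repeat until no fixed effect has a singleton group left.
repeat {
  keep <- ave(rep(1L, nrow(df)), df$fe1, FUN = length) > 1 &
          ave(rep(1L, nrow(df)), df$fe2, FUN = length) > 1
  if (all(keep)) break
  df <- df[keep, , drop = FALSE]
}
nrow(df)
#> [1] 3
```

The filtered `df` can then be passed to `feols()` as usual, and the observation count should match `reghdfe`'s.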
@lrberge thank you so much for your very didactic answer! Makes perfect sense. I will make a note of it and keep using `fixest`, because I eventually need to bootstrap the code and the small differences don't justify the computational time that would take in Stata. Which just makes the point again: awesome package :)
Hi @lrberge, not sure if helpful or not, but @styfenschaer has implemented the iterative singleton procedure for pyfixest (in numba) and it's fairly fast =)
@s3alfisc: thanks! Actually writing one out from scratch is easy, the problem is that I do many things at once (not just the singletons), and the tricky thing is fitting everything together.
Hello! I have been banging my head against a wall with this for the past couple of days and can't figure out what is going on, so here I am. My apologies in advance because my reproducible example is not minimal; I really don't know what is driving this discrepancy. @lrberge I will DM you the data; for anyone else, it can be downloaded here.
I have been trying to replicate this paper in R using `fixest`. The authors use Stata's `reghdfe`. I use `fixef.rm = TRUE` to drop singletons, as `reghdfe` does by default. However, `reghdfe` reports deleting a higher number of singleton observations than `feols`.

Here is my code in R:
Here is the code in Stata:
As you can see, the difference is just 13 observations, but it is there. Some final information:
Stata:
Thank you very much, both for the attention to the annoying issue and for the fantastic package!