lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/

Limited Mobility Bias Correction Implementation #446

Open hsantanna88 opened 11 months ago

hsantanna88 commented 11 months ago

I have recently become interested in variance decomposition and wage gaps, and I am working with a large dataset that has a large number of firm and individual fixed effects. The problem is that the estimated variances and covariance of these fixed effects are severely biased because of limited worker mobility. I was wondering if a correction for this could be implemented in fixest. It is by far the fastest package for estimating fixed-effects models; however, lfe already has an implementation of the correction proposed by Gaure (2014).

Any plans to include that?

Thank you

kylebutts commented 11 months ago

@hsantanna88 Have you seen this paper? It also proposes a bias correction method, https://eml.berkeley.edu/~pkline/papers/KSS2020.pdf.
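For context, a rough sketch of the leave-out correction in that paper, as I understand it (the notation is mine and may differ from the paper's): the target is a quadratic form $\theta = \beta' A \beta$ in the regression coefficients, for example the variance of the firm effects, and the plug-in estimate $\hat\beta' A \hat\beta$ is biased upward by $\operatorname{tr}(A \, \mathbb{V}[\hat\beta])$. With $S_{xx} = \sum_i x_i x_i'$, the corrected estimator is

$$
\hat\theta = \hat\beta' A \hat\beta - \sum_{i=1}^{n} B_{ii}\, \hat\sigma_i^2,
\qquad
B_{ii} = x_i' S_{xx}^{-1} A\, S_{xx}^{-1} x_i,
\qquad
\hat\sigma_i^2 = \frac{y_i \,(y_i - x_i'\hat\beta)}{1 - P_{ii}},
$$

where $P_{ii} = x_i' S_{xx}^{-1} x_i$ is the leverage of observation $i$. The formula requires $P_{ii} < 1$ for every observation, which is why the sample has to be restricted to a suitably connected set first.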

With PR #418, the hatvalues function produces $P_{ii}$, which would let you do a bias correction à la the reference above. The other thing you need is to identify the "largest connected component" and estimate feols on that subsample. You can do that with lfe::compfactor, e.g.:

## create two factors
f1 <- factor(sample(1:300, 400, replace = TRUE))
f2 <- factor(sample(1:300, 400, replace = TRUE))
fr <- data.frame(f1, f2)

## find the components
cf <- lfe::compfactor(list(f1 = f1, f2 = f2))

## Largest connected component
head(fr[cf == 1, ])
#>    f1  f2
#> 2 132 167
#> 3 167 142
#> 4 299 269
#> 6 162 106
#> 8 221  23
#> 9 288   7

Created on 2023-10-28 with reprex v2.0.2
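To connect the component step to the estimation step, here is a minimal sketch on a simulated worker-firm panel. The data and the column names worker, firm, x and y are made up for illustration. It picks the largest component by tabulating the compfactor output, so it does not depend on how the components are numbered, and then runs feols on that subsample.

```r
library(fixest)

set.seed(42)

# Hypothetical panel with two disjoint mobility groups:
# workers 1-150 only move among firms 1-30, workers 151-200 among firms 31-40
big <- data.frame(
  worker = sample(1:150, 1500, replace = TRUE),
  firm   = sample(1:30, 1500, replace = TRUE)
)
small <- data.frame(
  worker = sample(151:200, 200, replace = TRUE),
  firm   = sample(31:40, 200, replace = TRUE)
)
df <- rbind(big, small)
df$x <- rnorm(nrow(df))
df$y <- 1 + 0.5 * df$x + rnorm(nrow(df))

# Connected components of the worker-firm graph (one label per observation)
cf <- lfe::compfactor(list(f1 = factor(df$worker), f2 = factor(df$firm)))

# Keep the component with the most observations
largest_id <- names(which.max(table(cf)))
largest <- df[cf == largest_id, ]

# Two-way fixed-effects regression on the largest connected set only
fit <- feols(y ~ x | worker + firm, data = largest)
coef(fit)
```

From there, fixef(fit) returns the estimated worker and firm effects for the connected set, which are the inputs to any plug-in or corrected variance decomposition.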

Also, the Rademacher procedure in equation (17) of Gaure (2014) can be implemented with exact = FALSE.
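For intuition on why random sign vectors help here, a base-R sketch (not the fixest internals, which I have not checked): for a projection matrix $P$ and a vector $r$ of independent $\pm 1$ draws, $E[(Pr)_i^2] = \sum_j P_{ij}^2 = P_{ii}$, so averaging $(Pr)_i^2$ over many draws approximates the leverages without forming $P$. The small design matrix and the number of draws below are arbitrary choices for illustration.

```r
set.seed(1)

n <- 200
X <- cbind(1, rnorm(n), rnorm(n))   # toy design matrix with 3 columns

# Exact leverages: with X = QR, P = QQ' and P_ii is the squared norm of row i of Q
Q <- qr.Q(qr(X))
P_exact <- rowSums(Q^2)

# Rademacher approximation: average (P r)_i^2 over n_draws random sign vectors
n_draws <- 2000
R <- matrix(sample(c(-1, 1), n * n_draws, replace = TRUE), n, n_draws)
PR <- Q %*% crossprod(Q, R)          # applies P to every column of R
P_approx <- rowMeans(PR^2)

# Worst-case gap between approximate and exact leverages
max(abs(P_approx - P_exact))
```

In a large two-way fixed-effects problem, applying $P$ to a vector amounts to taking the fitted values from regressing that vector on the fixed effects, so each draw costs one fast demeaning-type solve and the full $n \times n$ projection is never built.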

hsantanna88 commented 11 months ago

Hey @kylebutts, I came across this paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4322300

It is a straightforward and elegant approach, and perhaps computationally less expensive. I made a small package, hsantanna88/lmbias, that implements this approach and leverages fixest's speed. It is still slow, but there is room for improvement. What is cool is that it seems to converge pretty fast to the desired values.