Closed amichuda closed 3 weeks ago
Hi @amichuda, this is indeed a little bit confusing, but I think this is correct, i.e. not a bug 😅
In both implementations, we populate `beta_jack` with leave-one-cluster-out regression estimates.
In the second code snippet, we loop over `feols()`, leaving one cluster out of each estimation. In other words, we run a classical jackknife.
The first implementation goes back to work by MacKinnon, Nielsen et al. and is optimized, but it only works for OLS. Basically, they use the fact that the OLS cross-products can be accumulated cluster by cluster, so you can estimate regressions "in batches".
You can compute the leave-one-cluster-out regression estimator as
$$ \hat{\beta}_{-g} = (X'X - X_g'X_g)^{-1} (X'Y - X_g'Y_g) $$
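which simply goes back to the fact that the cross-products are sums over clusters, $X'X = \sum_{g=1}^{G} X_g'X_g$ and $X'Y = \sum_{g=1}^{G} X_g'Y_g$, so that
$$ \hat{\beta} = \left( \sum_{g=1}^{G} X_g'X_g \right)^{-1} \sum_{g=1}^{G} X_g'Y_g $$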
This has the advantage that you have to compute $X'X$ and $X'Y$ only once, and $X_g'X_g$ is faster to compute than $X_{-g}'X_{-g}$ because a single cluster is much smaller than the rest of the data set.
See eq. (9) in MNW: link
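To make the equivalence concrete, here is a minimal NumPy sketch of both approaches. It is not the pyfixest code: the simulated data and the names `beta_jack_loop` and `beta_jack_batch` are assumptions for illustration only. The first loop refits on the data without cluster g (selecting with `!=`), while the second selects only cluster g (with `==`) and subtracts its cross-products from the full-sample ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical): N observations, k regressors, G clusters.
N, k, G = 200, 3, 10
X = rng.normal(size=(N, k))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=N)
clusters = rng.integers(0, G, size=N)
cluster_ids = np.unique(clusters)

# Approach 1: classical jackknife. Refit OLS on the sample WITHOUT cluster g.
beta_jack_loop = np.zeros((len(cluster_ids), k))
for i, g in enumerate(cluster_ids):
    keep = clusters != g  # filter OUT cluster g
    beta_jack_loop[i] = np.linalg.lstsq(X[keep], Y[keep], rcond=None)[0]

# Approach 2: batched version. Compute X'X and X'Y once, then subtract
# the cross-products of cluster g alone.
tXX = X.T @ X
tXY = X.T @ Y
beta_jack_batch = np.zeros((len(cluster_ids), k))
for i, g in enumerate(cluster_ids):
    in_g = clusters == g  # filter TO cluster g
    X_g, Y_g = X[in_g], Y[in_g]
    beta_jack_batch[i] = np.linalg.solve(tXX - X_g.T @ X_g, tXY - X_g.T @ Y_g)

# Both approaches should give the same leave-one-cluster-out estimates.
print(np.allclose(beta_jack_loop, beta_jack_batch))
```

So the `==` filter in the fast path and the `!=` filter in the generic loop target the same estimates. They just reach them from opposite directions.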
Ah okay perfect, thanks for the clarification. Will close!
Looking at the calculation of LOO "clusterjacks" in `feols`, I see this code:
https://github.com/py-econometrics/pyfixest/blob/37bfe0f95268983ee12b13459626826cce0a808e/pyfixest/estimation/feols_.py#L573-L582
in which `X_g` and `Y_g` are filtered to where `g==cluster_col`, whereas in the associated loop for all other models, it is:
https://github.com/py-econometrics/pyfixest/blob/37bfe0f95268983ee12b13459626826cce0a808e/pyfixest/estimation/feols_.py#L592-L602
so you are filtering OUT each cluster. Is this by design, or a bug?