py-econometrics / pyfixest

Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax
https://py-econometrics.github.io/pyfixest/pyfixest.html
MIT License

CRV3 jackknife with feOLS class #476

Closed: amichuda closed this issue 3 weeks ago

amichuda commented 3 weeks ago

Looking at the calculation of LOO "clusterjacks" in feols, I see this code:

https://github.com/py-econometrics/pyfixest/blob/37bfe0f95268983ee12b13459626826cce0a808e/pyfixest/estimation/feols_.py#L573-L582

in which X_g and Y_g are filtered to the rows where cluster_col == g, whereas in the corresponding loop for all other models, it is:

https://github.com/py-econometrics/pyfixest/blob/37bfe0f95268983ee12b13459626826cce0a808e/pyfixest/estimation/feols_.py#L592-L602

so you are filtering OUT each cluster. Is this by design, or a bug?

s3alfisc commented 3 weeks ago

Hi @amichuda, this is indeed a little bit confusing, but I think this is correct, i.e. not a bug 😅

In both implementations, we populate beta_jack with leave-one-cluster-out regression estimates.

In the second code snippet, we loop over feols(), leaving one cluster out of each estimation; that is, we run a classical jackknife.
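For illustration, here is a minimal NumPy sketch of that classical jackknife, assuming a plain OLS model without fixed effects (pyfixest's actual loop calls feols() and also handles fixed effects). The function name `jackknife_naive` and the simulated data are hypothetical. Note that the mask keeps every row *except* cluster g, which is the "filtering OUT" behaviour from the second snippet.

```python
import numpy as np

def jackknife_naive(X, Y, cluster):
    """Leave-one-cluster-out OLS: refit on all rows except those in cluster g."""
    groups = np.unique(cluster)
    beta_jack = np.empty((len(groups), X.shape[1]))
    for i, g in enumerate(groups):
        keep = cluster != g  # filter OUT cluster g
        beta_jack[i] = np.linalg.lstsq(X[keep], Y[keep], rcond=None)[0]
    return beta_jack

# Simulated example: 10 clusters of 20 observations, intercept + 2 regressors
rng = np.random.default_rng(0)
N, G = 200, 10
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=N)
cluster = np.repeat(np.arange(G), N // G)

beta_jack = jackknife_naive(X, Y, cluster)
print(beta_jack.shape)  # (10, 3)
```

Each iteration builds and factorizes a regression on N - N_g rows, which is what makes this version expensive when there are many clusters.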

The first implementation goes back to work by MacKinnon, Nielsen et al. and is optimized, but only works for OLS. Basically, they use the fact that you can estimate regressions "in batches".

You can compute the leave-one-cluster-out regression estimator as

$$ \hat{\beta}_{-g} = ( X'X - X_g'X_g )^{-1} ( X'Y - X_g'Y_g ) $$

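which follows simply from the fact that the cross-products are additive over clusters, $X'X = \sum_{g} X_g'X_g$ and $X'Y = \sum_{g} X_g'Y_g$, so that

$$ \hat{\beta} = \Big( \sum_{g=1}^{G} X_g'X_g \Big)^{-1} \sum_{g=1}^{G} X_g'Y_g $$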
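A quick numerical check of this additivity, using simulated data (all names and values below are illustrative assumptions): accumulating the per-cluster cross-products and solving once reproduces the full-sample OLS estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
N, G = 120, 6
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
Y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=N)
cluster = np.repeat(np.arange(G), N // G)  # 6 clusters of 20 observations

# Accumulate X_g'X_g and X_g'Y_g cluster by cluster
XtX = np.zeros((3, 3))
XtY = np.zeros(3)
for g in range(G):
    X_g, Y_g = X[cluster == g], Y[cluster == g]
    XtX += X_g.T @ X_g
    XtY += X_g.T @ Y_g

beta_from_sums = np.linalg.solve(XtX, XtY)
beta_full = np.linalg.lstsq(X, Y, rcond=None)[0]
print(np.allclose(beta_from_sums, beta_full))  # True
```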

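Computing the leave-one-cluster-out estimates this way has the advantage that $X'X$ and $X'Y$ need to be computed only once, and each $X_g'X_g$ is cheaper to compute than $X_{-g}'X_{-g}$ because it only involves the (much smaller) subset of observations in cluster $g$.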

See eq. (9) in MNW.
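To make the batched formula concrete, here is a minimal NumPy sketch, assuming plain OLS without fixed effects; `jackknife_fast` is a hypothetical name, not pyfixest's function. The mask selects *only* the rows of cluster g, which is why the first snippet filters to `cluster == g`, and the result is checked against explicitly refitting OLS with cluster g dropped.

```python
import numpy as np

def jackknife_fast(X, Y, cluster):
    """beta_{-g} = (X'X - X_g'X_g)^{-1} (X'Y - X_g'Y_g) for each cluster g."""
    XtX = X.T @ X  # full-sample cross-products, computed once
    XtY = X.T @ Y
    groups = np.unique(cluster)
    beta_jack = np.empty((len(groups), X.shape[1]))
    for i, g in enumerate(groups):
        in_g = cluster == g  # keep ONLY cluster g (a small subset)
        X_g, Y_g = X[in_g], Y[in_g]
        beta_jack[i] = np.linalg.solve(XtX - X_g.T @ X_g, XtY - X_g.T @ Y_g)
    return beta_jack

rng = np.random.default_rng(2)
N, G = 300, 15
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
Y = X @ np.array([1.0, 0.3, -1.2]) + rng.normal(size=N)
cluster = np.repeat(np.arange(G), N // G)

beta_fast = jackknife_fast(X, Y, cluster)
# Reference: refit OLS on the data with cluster g removed
beta_naive = np.array([
    np.linalg.lstsq(X[cluster != g], Y[cluster != g], rcond=None)[0]
    for g in range(G)
])
print(np.allclose(beta_fast, beta_naive))  # True
```

Both approaches fill the same beta_jack; the batched one only touches N_g rows per cluster after the one-off $X'X$ and $X'Y$.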

amichuda commented 3 weeks ago

Ah okay perfect, thanks for the clarification. Will close!