py-econometrics / pyfixest

Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax
https://py-econometrics.github.io/pyfixest/pyfixest.html
MIT License
117 stars 27 forks source link

Wildboottest: Incorrect Initiation of R vector -> incorrect results #505

Closed s3alfisc closed 2 weeks ago

s3alfisc commented 2 weeks ago

Update

The error in rwolf() described below goes back to a pretty bad bug in wildboottest. In a nutshell, the R vector was incorrectly initiated. Unfortunately, my tests did not catch this, as I used np.assert_allclose, which does not throw errors with pytest 🤕. Sorry! A bug fix is on its way.

Example

The rwolf() method seems to produce incorrect results.

import numpy as np
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr

from pyfixest.estimation.estimation import feols
from pyfixest.estimation.multcomp import _get_rwolf_pval, bonferroni, rwolf
from pyfixest.utils.utils import get_data

pandas2ri.activate()

fixest = importr("fixest")
wildrwolf = importr("wildrwolf")
stats = importr("stats")
broom = importr("broom")

fit1 = feols("Y ~ X1 | f1 + f2", data=data)
fit2 = feols("Y2 ~ X1 | f1 + f2", data=data)
fit3 = feols("Y3 ~ X1 | f1 + f2", data=data)

rwolf_py = rwolf([fit1, fit2, fit3], "X1", reps=9999, seed=12345)

# R
fit_r = fixest.feols(ro.Formula("c(Y, Y2, Y3) ~ X1 | f1 + f2"), data=data)
rwolf_r = wildrwolf.rwolf(fit_r, param="X1", B=9999, seed=12345)

The python implementation produces

rwolf_py

image

while R's outputs

pd.DataFrame(rwolf_r)

image

Check the last rows in both data frames - the first Romano Wolf pvalue is drastically different.

Likely, the issue is in the rwolf function and not in get_rwolf_pvalue(), as the ladder is directly tested against the r package and produces identical results.

Why did the tests did not catch the error? Incorrect use of the rtol and atol function arguments in np.testing.assert_allclose.