py-econometrics / pyfixest

Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax
https://py-econometrics.github.io/pyfixest/
MIT License

Randomization inference, "ri" sampling_method in rwolf, gives too tight a sample of null t-statistics #717

Open marcandre259 opened 2 days ago

marcandre259 commented 2 days ago

Possible issue I noticed while working on #698.

The behavior was initially noticed when comparing the "wild-bootstrap" and "ri" sampling_method p-values in a setting where the parameter of interest has no association with the outcome.

Because the null t-distribution is too tight, the resulting p-value is too small.
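To make the mechanism concrete, here is a minimal sketch in plain numpy (the names are illustrative, not pyfixest API) of how an overly tight null distribution deflates a two-sided randomization p-value:

import numpy as np

def ri_pvalue(t_obs, t_null):
    # Two-sided randomization p-value: fraction of null t-statistics
    # at least as extreme as the observed one.
    return np.mean(np.abs(t_null) >= np.abs(t_obs))

rng = np.random.default_rng(0)
t_obs = 1.5
wide = rng.normal(0.0, 1.0, 9_999)   # null draws with the correct spread
tight = rng.normal(0.0, 0.5, 9_999)  # artificially tight null draws

print(ri_pvalue(t_obs, wide))   # ~0.13, close to the analytic two-sided p-value
print(ri_pvalue(t_obs, tight))  # ~0.003: spuriously significant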

To reproduce:

import pyfixest as pf
import numpy as np
import matplotlib.pyplot as plt

# Get data and permute X1 so that it has no true association with Y
data = pf.get_data()

rng = np.random.default_rng(232)
data["X1"] = rng.choice(data["X1"], size=data.shape[0], replace=False)

fit = pf.feols("Y ~ X1", data=data)

fit.summary()

Estimation: OLS, Dep. var.: Y, Fixed effects: 0, Inference: iid, Observations: 998

| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) |   2.5% |  97.5% |
|-------------|---------:|-----------:|--------:|---------:|-------:|-------:|
| Intercept   |   -0.160 |      0.119 |  -1.344 |    0.179 | -0.394 |  0.074 |
| X1          |    0.033 |      0.090 |   0.367 |    0.714 | -0.144 |  0.211 |

RMSE: 2.304  R2: 0.0

seed = 111

# Wild bootstrap, returning the full sample of bootstrapped t-statistics
df_wild, df_t_wild = fit.wildboottest(param="X1", reps=9999, return_bootstrapped_t_stats=True, seed=seed)

# Randomization inference with the randomization-t statistic, storing the null t-statistics
rng = np.random.default_rng(232)
fit.ritest(resampvar="X1", reps=9999, type="randomization-t", store_ritest_statistics=True, rng=rng)

# Plot the two empirical null t-distributions side by side
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(12, 4))
ax[0].hist(fit._ritest_statistics, label="RI t stats", alpha=0.4)
ax[0].axvline(x=fit._ritest_sample_stat, linestyle="--", label="Observed RI t stat", color="black")
ax[0].legend()
ax[1].hist(df_t_wild, label="Wild t stats", alpha=0.4, color="orange")
ax[1].axvline(df_wild["t value"], label="Observed Wild t stat", color="black", linestyle="--")
ax[1].legend()

[Figure: comparing the empirical null t-distributions - RI t-statistics (left) vs. wild bootstrap t-statistics (right)]
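For reference, a hand-rolled randomization-t loop, continuing from the snippet above (a sketch only - this is not pyfixest's internal implementation, which may differ e.g. in vectorization and degrees-of-freedom handling):

t_null = np.empty(999)
rng2 = np.random.default_rng(232)
perm_data = data.copy()
for b in range(t_null.size):
    # Re-randomize X1 and re-estimate, collecting the t-statistic under the null
    perm_data["X1"] = rng2.permutation(data["X1"].to_numpy())
    t_null[b] = pf.feols("Y ~ X1", data=perm_data).tstat()["X1"]

# Under the null, the spread of these t-statistics should be close to 1
print(np.std(t_null))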

s3alfisc commented 2 days ago

Yes, this looks wrong! I'll take a look later. Thanks for reporting!

s3alfisc commented 1 day ago

On second thought, this might not necessarily be a bug. I'll have to think about this more - I took a look at the code and it looked mostly fine, though I'll have to check again. The difference in the widths of the two sampling distributions does indeed look suspicious.
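One quick diagnostic, reusing the fit and df_t_wild objects from the reproduction above (a sketch, assuming both empirical null t-distributions should have a standard deviation near 1 under the null):

import numpy as np

print(np.std(fit._ritest_statistics))  # spread of the RI null t-statistics
print(np.std(df_t_wild))               # spread of the wild bootstrap null t-statistics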

marcandre259 commented 1 day ago

Hi @s3alfisc,

Based on testing the sharp hypothesis with randomization inference, I would expect the bootstrap approach to be the less conservative one <- edit: actually I'd expect the opposite, since it should be easier to reject for at least one i than for the average. Nevertheless, the paper below reports the counterintuitive result that the randomization (sharp) approach is less powerful (a paradox).

I'm quickly peeking into this paper, which confirms this with simulations in Table 1.

As for progress on #698, I'll get back to including RI for Westfall-Young now that this issue is open.