py-econometrics / pyfixest

Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax
https://py-econometrics.github.io/pyfixest/
MIT License
152 stars 28 forks source link

Feiv: Implement Method for Anderson-Rubin Test #382

Open s3alfisc opened 5 months ago

s3alfisc commented 5 months ago

Context

After moving the IV Estimator to a "proper" 2 Stage estimation procedure (https://github.com/s3alfisc/pyfixest/issues/381), and after finalizing the Wald test implementation (https://github.com/s3alfisc/pyfixest/issues/280), it would be great to report first-stage F-statistics for IV estimation.

For other methods to test for weak instruments, see section 2.2 in @apoorvalal's paper on IVs. @apoorvalal, maybe I can make use of your expertise on this topic - which method for "IV testing" is the one you'd consider to be most relevant, and which is the one you'd like to see used more often? =)

To do

apoorvalal commented 5 months ago

For a single instrument and single endogenous variable, I think Anderson Rubin is a good choice. I have an implementation here, which amounts to picking a fine grid of candidate effect sizes, and creating a synthetic outcome Y - t * D and regressing that on the instrument (and covariates if present), and inverting this test to construct a confidence interval.

# easiest; probably best to use multiprocessing or other parallelization techniques used for wildboottest?
from joblib import Parallel, delayed 
import statsmodels.api as sm 

# wrap the following in an ARCI method for feols that checks for just-identified IV 
CIrange = np.arange(-0.1, 0.4, 0.001)
def pvfun(t):
    Y_t = Y - t * D
    mod = sm.OLS(Y_t, np.c_[sm.add_constant(Z), X]).fit(cov_type="HC2")
    Tstat = mod.params[1] / mod.bse[1]
    return (1 - sp.stats.norm.cdf(np.abs(Tstat))) * 2

Pvalue = Parallel(n_jobs=-1)(delayed(pvfun)(t) for t in CIrange)
Pvalue = np.stack(Pvalue)
lb, ub = CIrange[Pvalue >= 0.05].min(), CIrange[Pvalue >= 0.05].max()

# wrap this in a `arci_plot()` method? 
f, ax = plt.subplots(figsize=(6, 5))
ax.plot(CIrange, Pvalue)
ax.hlines(0.05, -0.1, 0.4, color="k", linestyle="--")
ax.vlines([lb, iv_est2, ub], 0, 1, color="grey", linestyle="--")
ax.set_title("Fieller-Anderson-Rubin Interval \n Card's IV data")
plt.show()
apoorvalal commented 5 months ago

I can take a crack at it later this week and send a PR; feel free to use this if you get to it sooner.

s3alfisc commented 5 months ago

That would be super cool! If you don't have too much time, you could just start with a very "basic" PR and then I could pick it up & implement unit tests etc =) Chances are I won't work on it before the weekend (still 100 other small things to do), but you never know, I'm usually irrationally excited about computing CIs by inverting tests😄

Jayhyung commented 3 months ago

I will take this issue! Will PR this one soon.