Differences in IV estimation relative to linearmodels

aeturrell commented 1 year ago

I'm working on an IV example that runs in linearmodels but raises an error in pyfixest. It's possible that I've just misunderstood the pyfixest formula syntax though!

reprex:

In this case, the model will be

$$ \text{Price}_i = \hat{\pi_0} + \hat{\pi_1} \text{SalesTax}_i + v_i $$

in the first stage regression and

$$ \text{Packs}_i = \hat{\beta_0} + \hat{\beta_2}\widehat{\text{Price}_i} + \hat{\beta_1} \text{RealIncome}_i + u_i $$

in the second stage.
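
To make the two stages concrete, here is a minimal sketch of 2SLS done by hand with NumPy. Everything in it is an assumption for illustration: the data are simulated (not the cigarette data set), the variable names and coefficients are made up, and the first stage also includes the exogenous regressor (real income), as standard 2SLS does, even though the first-stage equation above omits it for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated data (hypothetical): the instrument shifts price, and an
# unobserved confounder affects both price and packs.
sales_tax = rng.normal(size=n)
real_income = rng.normal(size=n)
confounder = rng.normal(size=n)
price = 1.0 + 0.8 * sales_tax + 0.5 * real_income + confounder + rng.normal(size=n)
packs = 2.0 - 1.2 * price + 0.4 * real_income - confounder + rng.normal(size=n)


def ols(X, y):
    """Return least-squares coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(n)

# First stage: regress the endogenous variable on the instrument
# and the exogenous regressor, then form fitted values.
Z = np.column_stack([const, sales_tax, real_income])
pi_hat = ols(Z, price)
price_hat = Z @ pi_hat

# Second stage: replace price with its first-stage fitted values.
X_2sls = np.column_stack([const, price_hat, real_income])
beta_2sls = ols(X_2sls, packs)

# Naive OLS for comparison; it is biased by the confounder.
X_ols = np.column_stack([const, price, real_income])
beta_ols = ols(X_ols, packs)

print("2SLS price coefficient:", beta_2sls[1])
print("OLS price coefficient: ", beta_ols[1])
```

This only recovers point estimates. Standard errors computed from a manual second-stage regression are not valid, which is one reason to use a dedicated IV routine for inference.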

Data:

import pandas as pd
from linearmodels.iv import IV2SLS

dfiv = pd.read_csv(
    "https://vincentarelbundock.github.io/Rdatasets/csv/AER/CigarettesSW.csv",
    dtype={"state": "category", "year": "category"},
).assign(
    rprice=lambda x: x["price"] / x["cpi"],
    rincome=lambda x: x["income"] / x["population"] / x["cpi"],
)
dfiv.head()

linearmodels runs okay:

results_iv2sls = IV2SLS.from_formula(
    "np.log(packs) ~ 1 + np.log(rincome) + C(year) + C(state) + [np.log(rprice) ~ taxs]",
    dfiv,
).fit(cov_type="clustered", clusters=dfiv["year"])
print(results_iv2sls.summary)

                          IV-2SLS Estimation Summary                          
==============================================================================
Dep. Variable:          np.log(packs)   R-squared:                      0.9659
Estimator:                    IV-2SLS   Adj. R-squared:                 0.9279
No. Observations:                  96   F-statistic:                -1.296e+17
Date:                Thu, Oct 26 2023   P-value (F-stat)                1.0000
Time:                        09:31:50   Distribution:                 chi2(50)
Cov. Estimator:             clustered                                         

                                Parameter Estimates                                
===================================================================================
                 Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
-----------------------------------------------------------------------------------
Intercept           9.4924     0.0263     360.24     0.0000      9.4407      9.5440
np.log(rincome)     0.4434                                                         
C(year)[T.1995]    -0.0328                                                         
C(state)[T.AR]      0.1770     0.0531     3.3338     0.0009      0.0729      0.2810
C(state)[T.AZ]     -0.0899     0.0132    -6.8132     0.0000     -0.1158     -0.0640
C(state)[T.CA]     -0.2781     0.0214    -12.996     0.0000     -0.3200     -0.2361
C(state)[T.CO]     -0.2479     0.0090    -27.625     0.0000     -0.2655     -0.2303
C(state)[T.CT]     -0.0171     0.0196    -0.8720     0.3832     -0.0556      0.0213
C(state)[T.DE]      0.1110     0.0291     3.8105     0.0001      0.0539      0.1682
C(state)[T.FL]      0.0762     0.0142     5.3596     0.0000      0.0483      0.1041
C(state)[T.GA]     -0.0695     0.0251    -2.7706     0.0056     -0.1186     -0.0203
C(state)[T.IA]      0.0120     0.0739     0.1629     0.8706     -0.1328      0.1569
C(state)[T.ID]     -0.1272     0.0077    -16.597     0.0000     -0.1423     -0.1122
C(state)[T.IL]     -0.0339     0.0081    -4.1912     0.0000     -0.0497     -0.0180
C(state)[T.IN]      0.1198     0.0611     1.9609     0.0499   5.573e-05      0.2395
C(state)[T.KS]     -0.0910     0.0305    -2.9884     0.0028     -0.1507     -0.0313
C(state)[T.KY]      0.3525     0.0631     5.5906     0.0000      0.2289      0.4761
C(state)[T.LA]      0.1315     0.0104     12.664     0.0000      0.1112      0.1519
C(state)[T.MA]     -0.0403     0.0069    -5.8826     0.0000     -0.0538     -0.0269
C(state)[T.MD]     -0.2322     0.0239    -9.7376     0.0000     -0.2790     -0.1855
C(state)[T.ME]      0.2008     0.0574     3.5011     0.0005      0.0884      0.3133
C(state)[T.MI]      0.1268     0.0745     1.7009     0.0890     -0.0193      0.2728
C(state)[T.MN]      0.0568     0.0490     1.1595     0.2463     -0.0392      0.1529
C(state)[T.MO]      0.0640     0.0476     1.3454     0.1785     -0.0292      0.1572
C(state)[T.MS]      0.1501     0.0272     5.5267     0.0000      0.0969      0.2034
C(state)[T.MT]     -0.1522     0.0054    -28.250     0.0000     -0.1627     -0.1416
C(state)[T.NC]      0.0396     0.0191     2.0655     0.0389      0.0020      0.0771
C(state)[T.ND]     -0.0311     0.0399    -0.7787     0.4361     -0.1092      0.0471
C(state)[T.NE]     -0.0741     0.0375    -1.9765     0.0481     -0.1476     -0.0006
C(state)[T.NH]      0.3504     0.0315     11.114     0.0000      0.2886      0.4122
C(state)[T.NJ]     -0.0873  6.107e-05    -1429.3     0.0000     -0.0874     -0.0872
C(state)[T.NM]     -0.2858     0.0040    -71.049     0.0000     -0.2937     -0.2779
C(state)[T.NV]      0.1789     0.0259     6.9075     0.0000      0.1281      0.2296
C(state)[T.NY]     -0.0719     0.0032    -22.256     0.0000     -0.0782     -0.0655
C(state)[T.OH]      0.0325     0.0402     0.8088     0.4186     -0.0463      0.1114
C(state)[T.OK]      0.0946     0.0538     1.7572     0.0789     -0.0109      0.2000
C(state)[T.OR]     -0.0153     0.0673    -0.2269     0.8205     -0.1471      0.1166
C(state)[T.PA]     -0.0031     0.0006    -4.8401     0.0000     -0.0044     -0.0019
C(state)[T.RI]      0.1394     0.0921     1.5136     0.1301     -0.0411      0.3200
C(state)[T.SC]     -0.0212     0.0334    -0.6345     0.5257     -0.0866      0.0442
C(state)[T.SD]     -0.0675     0.0711    -0.9488     0.3427     -0.2069      0.0719
C(state)[T.TN]      0.1473     0.0470     3.1340     0.0017      0.0552      0.2394
C(state)[T.TX]     -0.0579     0.0136    -4.2560     0.0000     -0.0845     -0.0312
C(state)[T.UT]     -0.4899     0.0276    -17.776     0.0000     -0.5440     -0.4359
C(state)[T.VA]     -0.0559     0.0471    -1.1875     0.2350     -0.1482      0.0364
C(state)[T.VT]      0.2209     0.0467     4.7267     0.0000      0.1293      0.3125
C(state)[T.WA]      0.0064     0.0011     6.0151     0.0000      0.0043      0.0085
C(state)[T.WI]      0.0741     0.0590     1.2569     0.2088     -0.0415      0.1897
C(state)[T.WV]      0.1576     0.0582     2.7097     0.0067      0.0436      0.2716
C(state)[T.WY]     -0.0169     0.0590    -0.2858     0.7750     -0.1325      0.0988
np.log(rprice)     -1.2793                                                         
===================================================================================

Endogenous: np.log(rprice)
Instruments: taxs
Clustered Covariance (One-Way)
Debiased: False
Num Clusters: 2

pyfixest produces an "UnderDeterminedIVError". Code:

from pyfixest.estimation import feols

results_iv = feols("np.log(packs) ~ 1 + np.log(rincome) | C(year) + C(state) | np.log(rprice) ~ taxs ", data=dfiv, vcov={"CRV1": "year"})
results_iv.summary()

---------------------------------------------------------------------------
UnderDeterminedIVError                    Traceback (most recent call last)
/Users/aet/Documents/git_projects/coding-for-economists/econmt-regression.ipynb Cell 67 line 1
----> 1 results_iv = feols("np.log(packs) ~ 1 + np.log(rincome) + C(year) + C(state) | np.log(rprice) ~ taxs ", data=dfiv, vcov={"CRV1": "year"})
      2 results_iv.summary()

File ~/mambaforge/envs/codeforecon/lib/python3.10/site-packages/pyfixest/estimation.py:129, in feols(fml, data, vcov, ssc, fixef_rm, collin_tol)
    126 _estimation_input_checks(fml, data, vcov, ssc, fixef_rm, collin_tol)
    128 fixest = FixestMulti(data=data)
--> 129 fixest._prepare_estimation("feols", fml, vcov, ssc, fixef_rm)
    131 # demean all models: based on fixed effects x split x missing value combinations
    132 fixest._estimate_all_models(vcov, fixest._fixef_keys, collin_tol=collin_tol)

File ~/mambaforge/envs/codeforecon/lib/python3.10/site-packages/pyfixest/FixestMulti.py:85, in FixestMulti._prepare_estimation(self, estimation, fml, vcov, ssc, fixef_rm)
     82 self._fixef_keys = None
     83 self._is_multiple_estimation = None
---> 85 fxst_fml = FixestFormulaParser(fml)
     86 fxst_fml.get_fml_dict()  # fxst_fml._fml_dict might look like this: {'0': {'Y': ['Y~X1'], 'Y2': ['Y2~X1']}}. Hence {FE: {DEPVAR: [FMLS]}}
     87 if fxst_fml._is_iv:

File ~/mambaforge/envs/codeforecon/lib/python3.10/site-packages/pyfixest/FormulaParser.py:100, in FixestFormulaParser.__init__(self, fml)
     98 if endogvars is not None:
     99     if len(endogvars) > len(instruments):
--> 100         raise UnderDeterminedIVError(
    101             "The IV system is underdetermined. Only fully determined systems are allowed. Please provide as many instruments as endogenous variables."
    102         )
    103     else:
    104         pass

UnderDeterminedIVError: The IV system is underdetermined. Only fully determined systems are allowed.

Grateful for any pointers!