Closed aeturrell closed 1 year ago
This is of course a bug and likely related to having the plus 1 in the first part of the formula, if you drop that, things should run through (I believe). I'll try to fix it asap =) btw I am really looking forward to all your feedback, there's likely a lot of edge cases that I have not thought about testing - just like this one.
Thanks Alex. Had a go at dropping the plus 1 (as below) but still got the same error in this case.
results_iv = feols("np.log(packs) ~ np.log(rincome) | C(year) + C(state) | np.log(rprice) ~ taxs ", data=dfiv, vcov={"CRV1": "year"})
results_iv.summary()
Just took a look - it's a minor bug - in your example, the code accidentally compares the length of two strings vs two lists, which then triggers the error you find. I'll release a new version in a bit, as I found another small bug with the summary() function for IV objects.
Btw, it's currently not supported to wrap the fixed effects inside C(.)
as ... | C(year) + C(state) | ...
as I process the fixed effects outside of formulaic
. The error you'd get is "KeyError: "None of [Index(['C(year)', 'C(state)'], dtype='object')] are in the [columns]"". Something I should allow for convenience.
It's fixed now:
results_iv = feols("np.log(packs) ~ np.log(rincome) |year + state| np.log(rprice) ~ taxs ", data=dfiv, vcov={"CRV1": "year"})
results_iv.summary()
# ###
#
# Estimation: IV
# Dep. var.: np.log(packs), Fixed effects: year+state
# Inference: CRV1
# Observations: 96
# | Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5 % | 97.5 % |
# |:----------------|-----------:|-------------:|-----------------------:|-----------:|--------:|---------:|
# | np.log(rprice) | -1.279 | 0.000 | -38033610883142432.000 | 0.000 | -1.279 | -1.279 |
# | np.log(rincome) | 0.443 | 0.000 | 4193485804221483.000 | 0.000 | 0.443 | 0.443 |
Point estimates are identical to linearmodels
and CIs are off (linearmodels
does not even report them, maybe I should do so as well if t-stats get too large?).
I will merge #198 after the CI tests pass!
It's merged into master. I'll release the new version to pypi later today =)
And release to PyPi!
Crikey, amazing! Thanks so much for fixing so quickly. So, on the C(...)
syntax, my default has always been to use it (explicit is better than implicit and all that). But having had a quick look at patsy, R formulas, and formulaic, I'm not sure how standard it is? My instinct would be if it's standard, have it in there, but if it's not, maybe I should change what's written in Coding for Economists to reflect this (ie remove C(...)
notation), and put more emphasis on ensuring the data type going in to regressions as a fixed effect is categorical.
Re: t-tests, there's probably some value in switching to scientific notation at some point, but think this example is an edge case and it's not a big issue!
I'm working on an IV example that was previously working in linearmodels but doesn't solve in pyfixest. It's possible that I've just misunderstood the syntax though!
reprex:
In this case, the model will be
$$ \text{Price}_i = \hat{\pi_0} + \hat{\pi_1} \text{SalesTax}_i + v_i $$
in the first stage regression and
$$ \text{Packs}_i = \hat{\beta_0} + \hat{\beta_2}\widehat{\text{Price}_i} + \hat{\beta_1} \text{RealIncome}_i + u_i $$
in the second stage.
Data:
linearmodels runs okay:
pyfixest produces an "UnderDeterminedIVError". Code:
Grateful for any pointers!