Open AnonTendim opened 4 months ago
@rafimikail would this be something of interest for you to pick up?
Hey @s3alfisc, sure, I will take on this task!
I will get familiar with the did2s codebase first.
Kyle's R journal paper is a great starting point to get familiar with the method: link.
Btw, thanks for the suggestion, @AnonTendim, very much appreciated!
I will check that one out, thanks for giving the reference :D
Apparently, there is a difference between pyfixest and fixest (R) when using the predict method with the newdata argument after running a weighted feols.
I'm still digging into this issue, but the reason behind it might be obvious. I'm getting different point estimates when including weights in pyfixest did2s, and I figure this might be one of the reasons.
It seems like feols then gives the predictions of the unweighted regression. For example:
import numpy as np
import pandas as pd
import pyfixest as pf

df_castle = pd.read_stata("https://github.com/scunning1975/mixtape/raw/master/castle.dta")
df_castle = df_castle.astype({"post": bool})
df_untreated = df_castle.loc[df_castle["post"] == 0, :]
# First-stage fits on the untreated observations, without and with weights
fit_untrt = pf.feols("l_homicide ~ 1 | state + year", data=df_untreated)
fit_untrt_wgt = pf.feols("l_homicide ~ 1 | state + year", data=df_untreated, weights="popwt")
# First-stage predictions on the full sample via newdata
yhat = fit_untrt.predict(newdata=df_castle)
yhat_wgt = fit_untrt_wgt.predict(newdata=df_castle)
# Unexpected: weighted and unweighted predictions coincide when newdata is passed
assert np.all(np.isclose(yhat[:5], yhat_wgt[:5]))
# Without newdata, the two sets of predictions differ, as expected
yhat = fit_untrt.predict()
yhat_wgt = fit_untrt_wgt.predict()
assert np.logical_not(np.all((np.isclose(yhat, yhat_wgt))))
Edit: Checked on pyfixest 0.25.3
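To illustrate the behavior one would expect, here is a minimal NumPy-only sketch (simulated data and a hypothetical `fit` helper, not pyfixest code): with non-constant weights, WLS coefficients generally differ from OLS coefficients, so predictions on new data should differ as well.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
w = rng.uniform(0.5, 5.0, size=n)  # non-constant weights (simulated)
y = 1.0 + 2.0 * x + rng.normal(size=n) * (1 + np.abs(x))

X = np.column_stack([np.ones(n), x])


def fit(X, y, w=None):
    """Hypothetical helper: least squares on sqrt(w)-scaled data (OLS if w is None)."""
    if w is None:
        w = np.ones(len(y))
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


beta_ols = fit(X, y)
beta_wls = fit(X, y, w)

# "New data": a handful of points not used in estimation
X_new = np.column_stack([np.ones(5), np.linspace(-2, 2, 5)])
yhat_new_ols = X_new @ beta_ols
yhat_new_wls = X_new @ beta_wls

print(np.allclose(yhat_new_ols, yhat_new_wls))
```

With constant weights the two fits coincide, which is the only case where identical predictions would be expected.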
Hi @marcandre259 , thanks for raising this!
If I understand you correctly, we would expect that with WLS, predict() would always produce different values than OLS, unless all weights are identical.
The behavior of the predict() method for newdata = None is controlled here: https://github.com/py-econometrics/pyfixest/blob/fd8e1436efd14cc98071eed51b91f98624cd1b6c/pyfixest/estimation/feols_.py#L1606.
If no new data is provided, predict() will return self._Y_hat_link, which is computed as the untransformed and unweighted Y minus the residuals, $\hat{Y} := Y - \hat{u}$.
This seems correct to me.
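As a quick sanity check of that identity, here is a small NumPy sketch on simulated data. It is not pyfixest's internal code, and it assumes the residuals $\hat{u}$ are taken back to the original (unweighted) scale. Under that assumption, $Y - \hat{u}$ equals $X\hat{\beta}_{WLS}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, -1.0]) + rng.normal(size=n)
w = rng.uniform(1.0, 10.0, size=n)  # simulated weights

# WLS via least squares on sqrt(w)-scaled data
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# Residuals on the transformed scale, then mapped back to the original scale
u_hat_weighted = y * sw - (X * sw[:, None]) @ beta
u_hat = u_hat_weighted / sw

# Fitted values as "Y minus residuals"
y_hat = y - u_hat

print(np.allclose(y_hat, X @ beta))
```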
I suppose that the error with newdata is related to the use of fixed effects - if we drop them and use C() dummies instead, we indeed end up with different results:
fit_untrt = pf.feols("l_homicide ~ 1 + C(state) + C(year)", data=df_untreated)
fit_untrt_wgt = pf.feols("l_homicide ~ 1 + C(state) + C(year)", data=df_untreated, weights="popwt")
yhat = fit_untrt.predict(newdata=df_castle)
yhat_wgt = fit_untrt_wgt.predict(newdata=df_castle)
yhat[0:5]
# array([1.96511559, 1.98852372, 1.96735659, 2.01274515, 2.00770564])
yhat_wgt[0:5]
# array([1.98343013, 2.0046152 , 1.99374426, 2.01362497, 1.98860927])
So the error must lie somewhere here https://github.com/py-econometrics/pyfixest/blob/fd8e1436efd14cc98071eed51b91f98624cd1b6c/pyfixest/estimation/feols_.py#L1614.
I will take a look at this now. Thanks for raising this!
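For reference, the dummy-variable and absorbed fixed-effects specifications should give the same fitted values, so the C() results above are a reasonable benchmark for what the fixed-effects version should return. A rough NumPy sketch on a simulated panel (the helpers `weighted_demean` and `absorbed_fitted` are hypothetical and only illustrate the algebra, not how pyfixest implements it):

```python
import numpy as np

rng = np.random.default_rng(2)
n_state, n_year = 6, 5
state = np.repeat(np.arange(n_state), n_year)
year = np.tile(np.arange(n_year), n_state)
y = rng.normal(size=state.size) + 0.3 * state - 0.2 * year
w = rng.uniform(1.0, 10.0, size=state.size)  # simulated weights

# (a) Weighted least squares on explicit state and year dummies
D = np.column_stack([np.eye(n_state)[state], np.eye(n_year)[year]])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(D * sw[:, None], y * sw, rcond=None)
fitted_dummies = D @ coef


def weighted_demean(v, groups, w):
    """Subtract the weighted group mean of v within each group."""
    sums = np.bincount(groups, weights=w * v)
    wsum = np.bincount(groups, weights=w)
    return v - (sums / wsum)[groups]


def absorbed_fitted(y, w, n_iter=1000):
    """Absorb both fixed effects by alternating weighted demeaning."""
    resid = y.copy()
    for _ in range(n_iter):
        resid = weighted_demean(resid, state, w)
        resid = weighted_demean(resid, year, w)
    return y - resid


# (b) Same model with absorbed fixed effects, weighted and unweighted
fitted_absorbed = absorbed_fitted(y, w)
fitted_unweighted = absorbed_fitted(y, np.ones_like(w))

print(np.allclose(fitted_dummies, fitted_absorbed))
```

The unweighted fit is included only to show that the weights change the fitted values in this toy example.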
This is a bug in WLS with newdata provided, and I have opened an issue here: #678
Similar to what is available in R fixest, would you consider adding support for a weights argument in did2s, i.e. an optional variable to run weighted first- and second-stage regressions? Thanks!