py-econometrics / pyfixest

Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax
https://py-econometrics.github.io/pyfixest/
MIT License

Support for weights as an optional parameter for did2s? #557

Open AnonTendim opened 4 months ago

AnonTendim commented 4 months ago

Similar to what is available in R fixest, would you consider adding support for weights in did2s, i.e. an optional variable used to run weighted first- and second-stage regressions? Thanks!
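To make the request concrete, here is a rough NumPy sketch of what "weighted first- and second-stage regressions" could mean for the point estimate. It is not the did2s or pyfixest implementation: the function name `did2s_weighted`, the toy panel, and the no-intercept second stage are all assumptions made for illustration, and standard errors are ignored entirely.

```python
import numpy as np

def did2s_weighted(y, unit, time, treated, w):
    """Hypothetical helper: weighted two-stage DiD point estimate only."""
    n_units, n_times = unit.max() + 1, time.max() + 1
    rows = np.arange(len(y))
    # One dummy column per unit and per period.
    X = np.zeros((len(y), n_units + n_times))
    X[rows, unit] = 1.0
    X[rows, n_units + time] = 1.0

    # Stage 1: weighted fit of unit and time effects on untreated rows only
    # (WLS via the square-root-of-weights transformation).
    u = ~treated
    sw = np.sqrt(w[u])
    beta, *_ = np.linalg.lstsq(X[u] * sw[:, None], y[u] * sw, rcond=None)

    # Stage 2: weighted regression of the residualized outcome on the
    # treatment dummy without intercept, i.e. a weighted mean over treated rows.
    resid = y - X @ beta
    d = treated.astype(float)
    return np.sum(w * d * resid) / np.sum(w * d)

# Toy noise-free panel: unit 0 is never treated, units 1 and 2 are treated
# from period 2 onwards with effects 1 and 3 respectively.
unit = np.repeat(np.arange(3), 4)
time = np.tile(np.arange(4), 3)
treated = (unit > 0) & (time >= 2)
effect = np.where(unit == 1, 1.0, 3.0)
y = np.array([0.5, 1.0, -1.0])[unit] + 0.1 * time + treated * effect

w_equal = np.ones(len(y))
w_pop = np.where(unit == 2, 3.0, 1.0)  # unit 2 counts three times as much

att_equal = did2s_weighted(y, unit, time, treated, w_equal)
att_pop = did2s_weighted(y, unit, time, treated, w_pop)
```

With heterogeneous effects, the weights change which average of the unit-level effects is recovered, which is why the argument matters for the point estimate and not only for inference.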

s3alfisc commented 4 months ago

@rafimikail would this be something of interest for you to pick up?

rafimikail commented 4 months ago

Hey @s3alfisc, sure, I will take on this task!

I will get familiar with the did2s codebase first.

s3alfisc commented 4 months ago

Kyle's R journal paper is a great starting point to get familiar with the method: link.

Btw, thanks for the suggestion, @AnonTendim, very much appreciated!

rafimikail commented 4 months ago

I will check that one out, thanks for giving the reference :D

marcandre259 commented 1 month ago

Apparently, there is a difference between pyfixest and fixest (R) when calling the predict method with the newdata argument after running a weighted feols.

I'm still digging into this issue, but the reason behind it might be obvious. I'm getting different point estimates when including weights in pyfixest did2s, and I figure this might be one of the reasons.

It seems like feols then gives the predictions of the unweighted regression. For example:

import numpy as np
import pandas as pd
import pyfixest as pf

df_castle = pd.read_stata("https://github.com/scunning1975/mixtape/raw/master/castle.dta")
df_castle = df_castle.astype({"post": bool})

# First-stage sample: untreated observations only
df_untreated = df_castle.loc[df_castle["post"] == 0, :]

fit_untrt = pf.feols("l_homicide ~ 1 | state + year", data=df_untreated)
fit_untrt_wgt = pf.feols("l_homicide ~ 1 | state + year", data=df_untreated, weights="popwt")

# Predictions with newdata: the weighted and unweighted fits agree (unexpected)
yhat = fit_untrt.predict(newdata=df_castle)
yhat_wgt = fit_untrt_wgt.predict(newdata=df_castle)

assert np.all(np.isclose(yhat[:5], yhat_wgt[:5]))

# In-sample predictions without newdata: the two fits differ (as expected)
yhat = fit_untrt.predict()
yhat_wgt = fit_untrt_wgt.predict()

assert np.logical_not(np.all(np.isclose(yhat, yhat_wgt)))

Edit: Checked on pyfixest 0.25.3

s3alfisc commented 1 month ago

Hi @marcandre259 , thanks for raising this!

If I understand you correctly, we would expect predict() after WLS to always produce different values than after OLS, unless all weights are identical.

The behavior of the predict() method for newdata = None is controlled here: https://github.com/py-econometrics/pyfixest/blob/fd8e1436efd14cc98071eed51b91f98624cd1b6c/pyfixest/estimation/feols_.py#L1606.

If no new data is provided, predict() returns self._Y_hat_link, which is computed from the untransformed, unweighted Y by subtracting the residuals, $\hat{Y} := Y - \hat{u}$:

https://github.com/py-econometrics/pyfixest/blob/fd8e1436efd14cc98071eed51b91f98624cd1b6c/pyfixest/estimation/feols_.py#L457

This seems correct to me.
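
As a quick sanity check of that expectation, here is a minimal NumPy sketch. It is not pyfixest's code: the simulated data and the helper name `fitted` are assumptions made for illustration. It fits WLS through the usual square-root-of-weights transformation and then forms $\hat{Y} = Y - \hat{u}$ on the original, unweighted scale.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
# Heteroskedastic noise so that reweighting visibly moves the coefficients.
y = 1.0 + 2.0 * x + rng.normal(size=n) * (1.0 + np.abs(x))
w = rng.uniform(0.5, 2.0, size=n)

def fitted(X, y, w=None):
    """Hypothetical helper: fitted values Y - u_hat on the original scale."""
    if w is None:
        w = np.ones(len(y))
    sw = np.sqrt(w)
    # WLS is OLS on the sqrt(w)-scaled design matrix and outcome.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    u_hat = y - X @ beta  # residuals on the unweighted scale
    return y - u_hat

yhat_ols = fitted(X, y)
yhat_wls = fitted(X, y, w)
yhat_const = fitted(X, y, np.full(n, 3.0))  # identical weights
```

Here `yhat_wls` differs from `yhat_ols`, while `yhat_const` matches `yhat_ols` because a constant weight cancels out of the normal equations.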

I suppose that the error with newdata is related to the use of fixed effects. If we drop them and use explicit dummies instead, we do indeed end up with different results:

fit_untrt = pf.feols("l_homicide ~ 1 + C(state) + C(year)", data=df_untreated)
fit_untrt_wgt = pf.feols("l_homicide ~ 1 + C(state) + C(year)", data=df_untreated, weights="popwt")

yhat = fit_untrt.predict(newdata=df_castle)
yhat_wgt = fit_untrt_wgt.predict(newdata=df_castle)

yhat[0:5]
# array([1.96511559, 1.98852372, 1.96735659, 2.01274515, 2.00770564])
yhat_wgt[0:5]
# array([1.98343013, 2.0046152 , 1.99374426, 2.01362497, 1.98860927])

So the error must lie somewhere here https://github.com/py-econometrics/pyfixest/blob/fd8e1436efd14cc98071eed51b91f98624cd1b6c/pyfixest/estimation/feols_.py#L1614.

I will take a look at this now. Thanks for raising this!

s3alfisc commented 1 month ago

This is a bug in WLS with newdata provided, and I have opened an issue here: #678
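
For intuition on why the weights should matter for newdata predictions in a fixed-effects-only model, here is a small NumPy sketch. It is a simplification of the example above, not pyfixest's code: it uses a single fixed effect instead of state + year, and the toy numbers and the helper name `fe_only_predict` are made up for illustration. In a model like `y ~ 1 | group`, the estimated fixed effect of each group is its (weighted) mean outcome, so the prediction for a new row is just that group's mean.

```python
import numpy as np

# Toy estimation sample: two groups with three observations each.
group = np.array([0, 0, 0, 1, 1, 1])
y = np.array([1.0, 2.0, 6.0, 4.0, 4.0, 10.0])
w = np.array([1.0, 1.0, 4.0, 2.0, 2.0, 1.0])

def fe_only_predict(group_fit, y_fit, group_new, w_fit=None):
    """Hypothetical helper: predictions from a one-way fixed-effects-only model."""
    if w_fit is None:
        w_fit = np.ones_like(y_fit)
    n_groups = group_fit.max() + 1
    # The (weighted) least-squares fixed effect is the (weighted) group mean.
    num = np.bincount(group_fit, weights=w_fit * y_fit, minlength=n_groups)
    den = np.bincount(group_fit, weights=w_fit, minlength=n_groups)
    alpha = num / den
    # Predicting for new rows only requires looking up their group's effect.
    return alpha[group_new]

group_new = np.array([0, 1])
pred = fe_only_predict(group, y, group_new)       # unweighted group means
pred_w = fe_only_predict(group, y, group_new, w)  # weighted group means
```

The two prediction vectors differ, so a weighted fit that returns the unweighted values for newdata has most likely recovered its fixed effects without applying the weights.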