tlamadon / pytwoway

Two way models in python
MIT License

Sum of Squared Residuals from Corrected Models #42

Closed: klintmane1 closed this issue 1 year ago

klintmane1 commented 1 year ago

Is it possible to retrieve the sum of squared residuals and the degrees of freedom from the corrected models?

adamoppenheimer commented 1 year ago

Hi Klint,

Please see here for a discussion of the degrees-of-freedom interpretation (it's in the screenshot). I'm not sure what the HO dof is, so I would recommend looking at the referenced source. For the HE case, unfortunately, there seems to be no degrees-of-freedom interpretation.

You can retrieve the estimated var(eps) from the .summary results dictionary: .summary['var(eps)_fe'], .summary['var(eps)_ho'], and .summary['var(eps)_he']. I think you can convert this to the sum of squared residuals by multiplying by the number of observations (or, if the observations are weighted, by the sum of the weights), but please correct me if I'm wrong.

Please let me know if this answers your questions!

klintmane1 commented 1 year ago

Hi Adam, thanks a lot for the quick reply and for pointing me to the other discussion. I am also interested in getting an adjusted R² for the corrected model.
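The thread does not give a formula for the adjusted R². One option, sketched below under the assumption that the usual OLS definition carries over, is to plug a corrected sum of squared residuals and residual degrees of freedom into the textbook formula 1 - (SSR / dof) / (SST / (N - 1)). Both helpers (total_sum_of_squares, adjusted_r2) are hypothetical and not part of pytwoway, and since the HE correction may have no degrees-of-freedom interpretation, this would at most apply to the FE and HO cases.

```python
import numpy as np


def total_sum_of_squares(y, weights=None):
    """(Weighted) total sum of squares of y around its (weighted) mean.

    Hypothetical helper; weights=None treats every observation equally.
    """
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    y_bar = np.sum(w * y) / np.sum(w)
    return float(np.sum(w * (y - y_bar) ** 2))


def adjusted_r2(ssr, sst, n_obs, dof_resid):
    """Textbook adjusted R^2: 1 - (SSR / dof_resid) / (SST / (n_obs - 1)).

    ssr: sum of squared residuals (e.g. derived from a corrected var(eps)).
    sst: total sum of squares of the outcome.
    n_obs: number of observations.
    dof_resid: residual degrees of freedom (assumed to be the sigma^2 denominator).
    """
    if dof_resid <= 0 or n_obs <= 1:
        raise ValueError("need dof_resid > 0 and n_obs > 1")
    return 1.0 - (ssr / dof_resid) / (sst / (n_obs - 1))
```

For example, adjusted_r2(ssr=20.0, sst=100.0, n_obs=11, dof_resid=8) evaluates 1 - 2.5 / 10. Whether the corrected var(eps) and dof are the right ingredients here is a modelling judgment, not something the package states.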

I think you are right about the sum of squared residuals. Would the same approach work even if I include controls in the model?

This does answer my questions. Thanks!

adamoppenheimer commented 1 year ago

Hi Klint,

I have thought about the degrees of freedom a bit more, and I hope this is a sufficient answer:

For the HO case, if I understand correctly, you want the denominator used in the sigma^2 estimation. If you look here, you can see how I am calculating it.

In the unweighted case, the degrees of freedom are self.nn - self.n_cov. The weighted case is a bit more complicated: first define trace_approximation = np.mean(self.tr_sigma_ho_all), which uses a trace approximation, and then compute dof = np.sum(1 / self.Dp) - trace_approximation (where .Dp holds the weights).
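The two cases described above can be written as one small function. This is a sketch that assumes you copy the relevant values off a fitted estimator yourself: the arguments nn, n_cov, tr_sigma_ho_all and Dp mirror the attributes self.nn, self.n_cov, self.tr_sigma_ho_all and self.Dp named in this comment, but the wrapper ho_sigma2_dof is hypothetical and not a pytwoway function.

```python
import numpy as np


def ho_sigma2_dof(nn, n_cov, tr_sigma_ho_all=None, Dp=None):
    """Degrees of freedom for the HO sigma^2 denominator, as described above.

    Hypothetical wrapper: pass in the values of self.nn, self.n_cov,
    self.tr_sigma_ho_all and self.Dp from the estimator.
    """
    if Dp is None:
        # Unweighted case: number of observations minus n_cov
        return nn - n_cov
    # Weighted case: average the trace-approximation values...
    trace_approximation = np.mean(tr_sigma_ho_all)
    # ...and subtract them from the sum of the inverse weights
    return float(np.sum(1 / np.asarray(Dp, dtype=float)) - trace_approximation)
```

Passing Dp=None selects the unweighted formula; otherwise the weighted formula is used, so Dp and tr_sigma_ho_all should be supplied together.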

To compute the sum of squared residuals, you should be able to get the number of observations from the length of your dataframe if you aren't using weights (e.g. len(df)), or from the sum of your weights if you are (e.g. df['w'].sum()). Then multiply that by the estimated var(eps) values I mentioned in my previous post (these values should still be available even if you are using controls).
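A minimal sketch of this conversion, assuming (as suggested in the thread, not confirmed by the package) that each var(eps) entry equals the sum of squared residuals divided by the number of observations or the sum of the weights. The helper ssr_from_var_eps is hypothetical, and the summary values in the usage lines are made-up numbers standing in for a real .summary dictionary.

```python
import numpy as np


def ssr_from_var_eps(var_eps, weights=None, n_obs=None):
    """Convert an estimated var(eps) into a sum of squared residuals.

    Hypothetical helper, not part of pytwoway. Assumes var(eps) = SSR / N,
    where N is the row count (unweighted) or the sum of the weights.
    """
    if weights is not None:
        # Weighted data: scale by the sum of the weights (like df['w'].sum())
        scale = float(np.sum(weights))
    elif n_obs is not None:
        # Unweighted data: scale by the number of rows (like len(df))
        scale = float(n_obs)
    else:
        raise ValueError("provide either weights or n_obs")
    return var_eps * scale


# Made-up stand-in for an estimator's .summary dictionary
summary = {"var(eps)_fe": 0.25, "var(eps)_ho": 0.26, "var(eps)_he": 0.27}
ssr = {key: ssr_from_var_eps(value, n_obs=1000) for key, value in summary.items()}
```

If var(eps)_ho is instead the SSR divided by the HO degrees of freedom, the matching conversion would multiply by that dof rather than by the number of observations.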

Please let me know if this answers your questions.

klintmane1 commented 1 year ago

This is great! Thanks a lot!