statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License
10.11k stars 2.88k forks source link

hat matrix, HC2, HC3 for WLS versus Stata #1209

Open josef-pkt opened 10 years ago

josef-pkt commented 10 years ago

It's not clear what definition of the hatmatrix Stata uses with aweights

testing robust HCx for WLS against Stata regress with aweights

HC1 matches after trivial change (resid -> wresid) HC2 and HC3 differ, because the diagonal of the hat matrix is different, after regress with aweights (Stata at least divides one more time by weights.)

making the trivial change resid -> wresid, exog -> wexog, I get the same results with WLS or OLS after rescaling/weighting

>>> resw = WLS(dtapa_endog, add_constant(dtapa_exog[['value', 'capital']], prepend=False), weights=1./dtapa_exog['value']).fit()
>>> resw_h2 = resw.get_robustcov_results(cov_type='HC2', use_t=True)
>>> weights_sqrt=np.sqrt(1./np.asarray(dtapa_exog['value']))
>>> resww.params
array([ 0.11694307,  0.10410757, -9.27233358])
>>> resww.bse
array([ 0.00555499,  0.01306514,  3.38487221])
>>> resww_h2 = resww.get_robustcov_results(cov_type='HC2', use_t=True)
>>> resww_h2.bse
array([ 0.00770722,  0.00995744,  2.36858154])
>>> resw_h2.bse
array([ 0.00770722,  0.00995744,  2.36858154])

however Stata gets

. regress invest mvalue kstock [aw=1/mvalue], vce(hc2)
(sum of wgt is   7.3221e-01)

Linear regression                                      Number of obs =     200
                                                       F(  2,   197) =   36.35
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.7728
                                                       Root MSE      =  35.178

------------------------------------------------------------------------------
             |             Robust HC2
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .1169431   .0138133     8.47   0.000     .0897021    .1441841
      kstock |   .1041076   .0210187     4.95   0.000     .0626569    .1455582
       _cons |  -9.272334   6.059023    -1.53   0.128    -21.22121    2.676539
------------------------------------------------------------------------------

I like our results better. Note Stata has a big increase in standard errors between weighted and unweighted for the cases of HC2 and HC3, but not for HC1 which doesn't depend on hat matrix.

There will be other cases where we can test against Stata for specific models with non-spherical variance, Panel, HetGLS, GLSAR, ...

jseabold commented 10 years ago

Is there something to do here?

josef-pkt commented 10 years ago

not for 0.6. This is mainly a reminder and reference, until I figure this out.

The definition of hatmatrix across models is an open issue. (There are two definitions when we have weights or GLS whitening, but I don't know when we are supposed to use which.) For the WLS/GLS sandwiches I need other test cases, to check what FGLS and similar are supposed to be doing.

corrections like HC2, HC3 need to be extended to other sandwiches.

josef-pkt commented 4 years ago

interesting: blog post about Stata's hatmatrix in WLS linking back to this issue https://declaredesign.org/r/estimatr/articles/stata-wls-hat.html