statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

ENH: cov_type for nonlinear two-stage models #8803

Open josef-pkt opened 1 year ago

josef-pkt commented 1 year ago

Mainly parking an issue and references. I was just skimming parts.

In the treatment effect module we use GMM for cov_params, which gives a robust cov_type. If we want a nonrobust cov_type, then we need to skip some of the robust computations.
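To make the robust/nonrobust distinction concrete for a single-stage MLE, here is a minimal numpy sketch (not statsmodels code) for a logit model. The nonrobust cov_params is the inverse of the negative Hessian, which relies on the information matrix equality; the robust version is the sandwich H^{-1} S H^{-1}, with S the outer product of the per-observation scores. The helper names `logit_fit` and `logit_cov` and the simulated data are hypothetical, chosen only for illustration.

```python
import numpy as np


def logit_fit(y, x, maxiter=50, tol=1e-10):
    """Logit MLE by Newton-Raphson (illustration only)."""
    params = np.zeros(x.shape[1])
    for _ in range(maxiter):
        prob = 1.0 / (1.0 + np.exp(-x @ params))
        score = x.T @ (y - prob)
        hessian = -(x * (prob * (1.0 - prob))[:, None]).T @ x
        step = np.linalg.solve(hessian, score)
        params = params - step
        if np.max(np.abs(step)) < tol:
            break
    return params


def logit_cov(y, x, params):
    """Return (nonrobust, robust) cov_params of the logit MLE."""
    prob = 1.0 / (1.0 + np.exp(-x @ params))
    score_obs = x * (y - prob)[:, None]                    # (n, k) per-obs scores
    hessian = -(x * (prob * (1.0 - prob))[:, None]).T @ x  # summed Hessian
    hessian_inv = np.linalg.inv(hessian)
    cov_nonrobust = -hessian_inv                   # relies on information matrix equality
    opg = score_obs.T @ score_obs                  # outer product of gradients, S
    cov_robust = hessian_inv @ opg @ hessian_inv   # sandwich H^{-1} S H^{-1}
    return cov_nonrobust, cov_robust


# Simulated, correctly specified logit data (hypothetical example).
rng = np.random.default_rng(0)
n = 5000
x = np.column_stack([np.ones(n), rng.normal(size=n)])
true_params = np.array([0.5, -1.0])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-x @ true_params))).astype(float)

params = logit_fit(y, x)
cov_nr, cov_r = logit_cov(y, x, params)
print(np.sqrt(np.diag(cov_nr)), np.sqrt(np.diag(cov_r)))
```

Under correct specification both sets of standard errors should be close in large samples; a nonrobust cov_type amounts to dropping the score outer product and using the inverse Hessian directly.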

Based on my skimming, there are different versions of the two-stage cov_params: either nonrobust (assuming correct specification) or robust (sandwiches that allow for some misspecification):

I don't know which version (our) heckman uses.

Hole, Arne Risa. “Calculating Murphy–Topel Variance Estimates in Stata: A Simplified Procedure.” The Stata Journal 6, no. 4 (November 1, 2006): 521–29. https://doi.org/10.1177/1536867X0600600405.

Palmer, Tom M, Michael V Holmes, Brendan J Keating, and Nuala A Sheehan. “Correcting the Standard Errors of 2-Stage Residual Inclusion Estimators for Mendelian Randomization Studies.” American Journal of Epidemiology 186, no. 9 (November 1, 2017): 1104–14. https://doi.org/10.1093/aje/kwx175.

Terza, Joseph V. “Simpler Standard Errors for Two-Stage Optimization Estimators.” The Stata Journal 16, no. 2 (June 1, 2016): 368–85. https://doi.org/10.1177/1536867X1601600206.

Newey, Whitney K. “A Method of Moments Interpretation of Sequential Estimators.” Economics Letters 14, no. 2 (January 1, 1984): 201–6. https://doi.org/10.1016/0165-1765(84)90083-1. Formula for method of moments (exactly identified GMM), using sandwiches for all parts.

(not clear to me yet what we need)
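A minimal sketch of the Newey (1984) idea, assuming a simple linear two-step estimator chosen only for illustration: stage 1 regresses d on z, and stage 2 regresses y on a constant and the fitted dhat. Stacking both sets of moment conditions gives an exactly identified GMM problem, and the sandwich G^{-1} S G^{-T} / n of the stacked system already contains the correction for the estimated first stage. The functions `stacked_moments` and `gmm_sandwich_cov` are hypothetical helpers, and the Jacobian G is computed by central differences rather than analytically.

```python
import numpy as np


def stacked_moments(params, y, d, z):
    """Per-observation moments of a hypothetical linear two-step estimator.

    Stage 1: OLS of d on z, parameters gamma (first z.shape[1] entries).
    Stage 2: OLS of y on [1, dhat] with dhat = z @ gamma, parameters beta.
    Returns an (n, k) array with k == len(params), i.e. exactly identified.
    """
    k1 = z.shape[1]
    gamma, beta = params[:k1], params[k1:]
    g1 = z * (d - z @ gamma)[:, None]            # stage-1 OLS normal equations
    w = np.column_stack([np.ones(len(y)), z @ gamma])
    g2 = w * (y - w @ beta)[:, None]             # stage-2 OLS normal equations
    return np.column_stack([g1, g2])


def gmm_sandwich_cov(moment_obs, params, args, eps=1e-6):
    """Exactly identified GMM cov_params G^{-1} S G^{-T} / n.

    G is the Jacobian of the mean moments (central differences),
    S is the outer product of the per-observation moments divided by n.
    """
    g = moment_obs(params, *args)
    nobs, k = g.shape
    jac = np.empty((k, len(params)))
    for j in range(len(params)):
        step = np.zeros(len(params))
        step[j] = eps * max(1.0, abs(params[j]))
        jac[:, j] = (moment_obs(params + step, *args).mean(0)
                     - moment_obs(params - step, *args).mean(0)) / (2 * step[j])
    s = g.T @ g / nobs
    jac_inv = np.linalg.inv(jac)
    return jac_inv @ s @ jac_inv.T / nobs


# Simulated data: d is endogenous, z[:, 1] is the instrument.
rng = np.random.default_rng(0)
n = 2000
z = np.column_stack([np.ones(n), rng.normal(size=n)])
v = rng.normal(size=n)
d = z @ np.array([0.5, 1.0]) + v
y = 1.0 + 2.0 * d + 0.5 * v + rng.normal(size=n)

# Two sequential OLS fits.
gamma = np.linalg.lstsq(z, d, rcond=None)[0]
w = np.column_stack([np.ones(n), z @ gamma])
beta = np.linalg.lstsq(w, y, rcond=None)[0]
params = np.concatenate([gamma, beta])

cov = gmm_sandwich_cov(stacked_moments, params, (y, d, z))

# Cross-check: HC0 2SLS cov_params, with residuals based on d, not dhat.
u_hat = y - np.column_stack([np.ones(n), d]) @ beta
bread = np.linalg.inv(w.T @ w)
cov_2sls = bread @ ((w * u_hat[:, None] ** 2).T @ w) @ bread
print(np.allclose(cov[2:, 2:], cov_2sls))
```

In this just-identified linear case the stage-2 block of the stacked sandwich should reproduce the HC0 2SLS cov_params, while a naive second-stage OLS sandwich (residuals based on dhat, first stage ignored) would not.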

(*) Update on Murphy and Topel, section 5.1, two-step MLE: equ. (29) assumes and states the information matrix equality, and R_i is used as the name for both (the negative Hessian and the OPG). In what follows they use R_i, so they do not specify whether the OPG or the Hessian is used in their final formula, equ. (34). (AFAICS, in equ. (33) R still refers to the negative Hessian, i.e. second derivatives, while omega in equ. (30) and (31) is cov(score), i.e. it uses R for the OPG.)
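For reference, the Murphy–Topel correction as it is commonly quoted in the applied literature (e.g. Hole 2006) is written below. V_1 is the stage-1 cov_params, V_2 the uncorrected stage-2 cov_params, and f_{1i}, f_{2i} denote the stage-1 and stage-2 likelihood contributions of observation i; this notation is chosen here and is not taken from the Murphy and Topel paper.

```latex
\hat V_2^{MT} = \hat V_2 + \hat V_2 \left[ C \hat V_1 C' - R \hat V_1 C' - C \hat V_1 R' \right] \hat V_2,
\qquad
C = \sum_i \frac{\partial \ln f_{2i}}{\partial \theta_2} \frac{\partial \ln f_{2i}}{\partial \theta_1'},
\qquad
R = \sum_i \frac{\partial \ln f_{2i}}{\partial \theta_2} \frac{\partial \ln f_{1i}}{\partial \theta_1'}
```

Written this way C and R are OPG-type sums. Under the information matrix equality C can be replaced by minus the summed cross-derivatives of ln f_{2i}, and V_1, V_2 can be based on either the Hessian or the OPG, which is the same Hessian versus OPG ambiguity as for R_i in the paper.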

josef-pkt commented 1 year ago

We need helper functions, at least for Newey/GMM and Murphy–Topel.
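A Murphy–Topel helper could look roughly like the sketch below. The name `murphy_topel_cov` and its argument layout are hypothetical (not an existing statsmodels function); it assumes the caller supplies per-observation loglikelihood derivatives and that C and R are sums over observations, matching the commonly quoted form of the correction.

```python
import numpy as np


def murphy_topel_cov(cov1, cov2, score2_theta2, score2_theta1, score1_theta1):
    """Murphy-Topel corrected cov_params of the stage-2 parameters (sketch).

    cov1 : (k1, k1) cov_params of the stage-1 estimate theta1
    cov2 : (k2, k2) uncorrected stage-2 cov_params (theta1 treated as known)
    score2_theta2 : (n, k2) stage-2 loglike derivatives w.r.t. theta2, per obs
    score2_theta1 : (n, k1) stage-2 loglike derivatives w.r.t. theta1, per obs
    score1_theta1 : (n, k1) stage-1 loglike derivatives w.r.t. theta1, per obs
    """
    # OPG-based C; a Hessian-based variant would pass minus the summed
    # cross-derivatives of the stage-2 loglike instead.
    c = score2_theta2.T @ score2_theta1
    r = score2_theta2.T @ score1_theta1
    middle = c @ cov1 @ c.T - r @ cov1 @ c.T - c @ cov1 @ r.T
    return cov2 + cov2 @ middle @ cov2
```

When both stages are correctly specified MLEs, this should agree asymptotically with the stage-2 block of the stacked (Newey) GMM sandwich; the GMM version stays valid without the information matrix equalities.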

There should be some overlap with statsmodels.stats._diagnostic_other, e.g. conditional_moment_test_generic.

Actually, I'm not sure we really need to use it. It is mainly a computational shortcut when we have a large number of moment conditions. We can just use the appropriate submatrix (block) of the joint cov_params instead of using a partitioned matrix inverse (one large matrix inverse instead of many computations with smaller matrices).
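A quick numerical check of the submatrix approach, with random matrices standing in for the stacked Jacobian G and the moment covariance S (both hypothetical): because stage-1 moments do not depend on theta2, G is block lower triangular, and the theta2 block of the joint sandwich G^{-1} S G^{-T} equals the partitioned formula built from the smaller blocks.

```python
import numpy as np

rng = np.random.default_rng(1)
k1, k2 = 3, 2
k = k1 + k2

# Stand-in stacked Jacobian: stage-1 moments do not depend on theta2,
# so the upper-right block is zero (block lower triangular).
G = rng.normal(size=(k, k)) + 5 * np.eye(k)
G[:k1, k1:] = 0.0
A = rng.normal(size=(k, k))
S = A @ A.T + np.eye(k)  # positive definite cov of the moments

# (a) One large inverse: joint sandwich, then take the theta2 block.
G_inv = np.linalg.inv(G)
cov_joint = G_inv @ S @ G_inv.T
cov2_block = cov_joint[k1:, k1:]

# (b) Partitioned formula using only the smaller blocks.
G11, G21, G22 = G[:k1, :k1], G[k1:, :k1], G[k1:, k1:]
S11, S12, S22 = S[:k1, :k1], S[:k1, k1:], S[k1:, k1:]
S21 = S12.T
M = G21 @ np.linalg.inv(G11)  # how stage-1 estimation error feeds into stage 2
middle = S22 - M @ S12 - S21 @ M.T + M @ S11 @ M.T
G22_inv = np.linalg.inv(G22)
cov2_partitioned = G22_inv @ middle @ G22_inv.T

print(np.allclose(cov2_block, cov2_partitioned))  # True
```

The partitioned version avoids inverting the full k x k Jacobian, which only matters when the number of moment conditions is large; otherwise taking the block of the joint cov_params is simpler.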