sergiocorreia / reghdfe

Linear, IV and GMM Regressions With Any Number of Fixed Effects
http://scorreia.com/software/reghdfe/
MIT License

Twice Robust for clustering and wmatrix #51

Open achenzion opened 8 years ago

achenzion commented 8 years ago

I seem to be getting unclear errors when trying to use wmatrix(cluster var) for both gmm and 2sls IV. How does the option twicerobust work when you specify vce(cluster var)?

reghdfe Y X1 (X2 = Z), absorb(T1 T2) est(2sls) vce(cluster C) stages(first reduced) suboptions(wmatrix(cluster C)) ivsuite(ivregress)

>cannot specify wmatrix() with 2SLS estimator

reghdfe Y X1 (X2 = Z), absorb(T1 T2) est(gmm) vce(cluster C) stages(first reduced) suboptions(wmatrix(cluster C)) ivsuite(ivregress)

>option wmatrix() not allowed

I most likely am misunderstanding the documentation. Any help would be appreciated.

Thank you,

Ayal

sergiocorreia commented 8 years ago

The description of the twicerobust option in the help file sheds some light on why this doesn't work.

AFAIK, ivregress with wmatrix will run two regressions (I forgot how exactly robust and wmatrix played together, but the helpfile links the PDF, and I recall I had some extra references about it somewhere).

The problem is that reghdfe only runs once, and making reghdfe fully compatible with running two regressions (or more, for iterated gmm) would require changing the code of ivregress (which I can't do) in order to recompute the means in the middle.

It might be possible for you to compute the GMM by hand (by using hdfe.ado to do the demeaning, then running the first step, then recomputing, and so on), but sadly there is no automated way to do it.

achenzion commented 8 years ago

Ok, yes. I see the issue.

See: http://www.stata.com/manuals13/rivregress.pdf (pg 7-10, 13-15)

To be clear, with or without wmatrix(), ivregress with two-step gmm runs two regressions (see pg. 13):

  1. The first is the standard 2SLS IV regression. The residuals from this first step are used to calculate the weighting matrix (unadjusted, robust, clustered, etc.) for...
  2. The gmm version of the IV regression, which is reported with standard errors determined by the vce() option.

If the wmatrix() option is not specified then it is using the residuals from the first step assuming homoskedasticity.
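For reference, the two steps can be written out as follows. This is only a sketch of the textbook two-step linear GMM estimator with a clustered weighting matrix, not a transcription of what ivregress computes internally. The notation is my own assumption: $y$ is the outcome, $X$ stacks all regressors (exogenous and endogenous), $Z$ stacks all instruments (the exogenous regressors plus the excluded instruments), $n$ is the number of observations, and $c$ indexes clusters.

```latex
\begin{align*}
% Step 1: 2SLS, i.e. GMM with weighting matrix (Z'Z)^{-1}
\hat\beta_{1} &= \left( X'Z (Z'Z)^{-1} Z'X \right)^{-1} X'Z (Z'Z)^{-1} Z'y,
  \qquad \hat u = y - X\hat\beta_{1} \\
% Weighting matrix from the first-step residuals (clustered case)
\hat S &= \frac{1}{n} \sum_{c} \left( Z_c'\hat u_c \right)\left( Z_c'\hat u_c \right)',
  \qquad W = \hat S^{-1} \\
% Step 2: GMM using the estimated weighting matrix
\hat\beta_{2} &= \left( X'Z\, W\, Z'X \right)^{-1} X'Z\, W\, Z'y
\end{align*}
```

The residuals $\hat u$ enter only through $\hat S$, so the choice made in wmatrix() changes $W$ and therefore the second-step point estimates, while vce() only affects the reported standard errors.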

You say that reghdfe only runs once, but then how is it implementing both steps of two-step efficient gmm? I do not see any reason to recompute anything between the first and second steps outlined above. Please let me know if I misunderstood. Thank you for your response.

sergiocorreia commented 8 years ago

It is not implementing both steps.

From the source code, what reghdfe does is:

Now, this is not strictly correct: if ivregress is running two regressions, we would also want to run two demeanings (one for each regression).

On the other hand, the error introduced by demeaning only once appeared to be trivial, and one of the authors of ivreg2 suggested that it might still be better to include this option and have it "half baked" than to exclude it.

Let me know if you find that the difference in results wrt doing it the long way is significant. In my checks it wasn't but if I find cases where it is misleading I might remove it.
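A comparison along these lines could look roughly like the sketch below. It reuses the variable names from the original post (Y, X1, X2, Z, T1, T2, C). The stored-estimate names long_way and absorbed are made up for illustration. It also assumes that T1 and T2 have few enough levels to enter as explicit indicators, and that the reghdfe call from the original post runs once the wmatrix() suboption is dropped, which I have not verified for every version.

```stata
* Long way: include the fixed effects as explicit indicator variables,
* so ivregress handles both GMM steps on the full model
ivregress gmm Y X1 i.T1 i.T2 (X2 = Z), wmatrix(cluster C) vce(cluster C)
estimates store long_way

* Absorbed version: the command from the original post without the
* wmatrix() suboption (assumed to run in the reghdfe version discussed here)
reghdfe Y X1 (X2 = Z), absorb(T1 T2) est(gmm) vce(cluster C) ivsuite(ivregress)
estimates store absorbed

* Side-by-side coefficients and standard errors for the regressors of interest
estimates table long_way absorbed, keep(X1 X2) b(%9.4f) se(%9.4f)
```

If the two columns differ by more than rounding noise in your data, that would be a case worth reporting here.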