sergiocorreia / ivreghdfe

Run IV/2SLS with many levels of fixed effects (i.e. ivreg2+reghdfe)
MIT License

Using partial option with ivreghdfe #6

Open tatyanaderyugina opened 6 years ago

tatyanaderyugina commented 6 years ago

Perhaps more of a question than an issue. I'm running regressions of the form:

ivreghdfe Y (X = Z) , absorb(i.F) cluster(C)

The number of fixed effects is very large (hence the use of reghdfe), and I get a warning because the number of clusters is smaller than the number of fixed effects. F-statistics are also not reported, which is clearly problematic for IV. The warning does say that the "partial" option may help fix this problem. However, the way partial() works in ivreg2, the user has to generate the variables to be partialled out. That is, I cannot write "partial(i.F)" (I tried, and ivreghdfe fails). But if I generate the indicators (let's call the set _F*) and run

ivreghdfe Y (X = Z) , absorb(i.F) partial(_F*) cluster(C)

there doesn't seem to be any way for reghdfe to recognize that i.F and _F* represent the same fixed effects. So it seems that if I want F-statistics in this case, I just need to use ivreg2 and suffer through the slowness?

The older version of reghdfe on ssc used to report F-stats for IV regressions in these cases, so perhaps it wouldn't be difficult to bring them back?

sergiocorreia commented 6 years ago

Hi Tatyana,

F-statistics also do not get reported, which is clearly problematic for IV

I'm not sure I follow. In the first stage of an IV regression, people usually report the F-statistic of the excluded instruments (not of the entire set of covariates), and this should still be computable unless you have many instruments or very few clusters.
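As a rough sketch of the statistic meant here, in standard textbook notation rather than anything taken from the ivreghdfe or ivreg2 source, and with small-sample adjustments omitted: let $L_1$ be the number of excluded instruments, $\hat\pi$ the first-stage coefficients on those instruments, and $\hat V(\hat\pi)$ their cluster-robust covariance estimate. The first-stage Wald F is then

$$
F = \frac{1}{L_1}\,\hat\pi^{\prime}\,\hat V(\hat\pi)^{-1}\,\hat\pi .
$$

In principle, only the $L_1 \times L_1$ matrix $\hat V(\hat\pi)$ needs to be invertible. That is a much weaker requirement than inverting the covariance matrix of every covariate in the model.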

I get a warning because the number of clusters is smaller than the number of fixed effects

I just tried a toy example with Stata's auto dataset. It seems that the issue is not that the number of clusters is smaller than the number of FEs, but that it is smaller than the number of included regressors:

sysuse auto

* 14 FEs, one regressor, 2 clusters: runs fine
ivreghdfe price (gear=length), a(turn) cluster(foreign)

* 14 FEs, two regressors, 2 clusters: warning
ivreghdfe price weight (gear=length), a(turn) cluster(foreign)
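A sketch of why the number of clusters interacts with the number of regressors. This is standard cluster-robust algebra, stated up to scaling factors, and not taken from the ivreg2 source. Let $Z$ be the $N \times L$ matrix of instruments left after the fixed effects are absorbed (included exogenous regressors plus excluded instruments). Let $\hat u$ be the IV residuals and $g = 1,\dots,G$ index the clusters. The covariance matrix of the moment conditions is then

$$
\hat S \propto \sum_{g=1}^{G} \left(Z_g^{\prime}\hat u_g\right)\left(Z_g^{\prime}\hat u_g\right)^{\prime} .
$$

This is a sum of $G$ rank-one $L \times L$ matrices, so $\operatorname{rank}(\hat S) \le G$. In an exactly identified model the estimator sets $Z^{\prime}\hat u = \sum_{g} Z_g^{\prime}\hat u_g = 0$, which tightens the bound to $\operatorname{rank}(\hat S) \le G - 1$.

With `cluster(foreign)` we have $G = 2$, so $\operatorname{rank}(\hat S) \le 1$. The first command has $L = 1$ (only `length`) and passes. The second has $L = 2$ (`weight` and `length`) and triggers the warning. The absorbed fixed effects never enter $Z$, which is consistent with their count not mattering.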

It does say that the use of the "partial" option may help fix this problem

Indeed. But as you pointed out, you can only partial out included regressors, not FEs. For instance, in the previous example, we can partial out the -weight- variable to remove the warning:

* If we partial -weight- we don't get a warning anymore
ivreghdfe price weight (gear=length), a(turn) cluster(foreign) partial(weight)
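Why partialling helps can be sketched with the Frisch-Waugh-Lovell theorem. This is a standard result; whether ivreg2 implements `partial()` in exactly this way is an assumption here. Let $W$ be the partialled regressor (`weight`), $X$ the endogenous regressor (`gear`), and $Z_1$ the excluded instrument (`length`), with

$$
M_W = I - W\left(W^{\prime}W\right)^{-1}W^{\prime}, \qquad y \to M_W y, \qquad X \to M_W X, \qquad Z_1 \to M_W Z_1 .
$$

Running IV on the transformed variables gives the same coefficient on `gear` as the full model. However, the moment conditions now involve only `length`, so the number of instrument columns drops from 2 to 1 and fits within what 2 clusters can support.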

The older version of reghdfe on ssc used to report F-stats for IV regressions in these cases, so perhaps it wouldn't be difficult to bring them back?

That's true. If you run reghdfe with the old option, it actually runs the SSC version:

reghdfe price weight (gear=length), a(turn) cluster(foreign) old

However, in the example above I still get a warning with the SSC version. Also, as I recall, the new version improved the handling of some corner cases regarding when the F-stat is shown (or not shown), so I would stick with the new version and just use partial().

Best, Sergio

tatyanaderyugina commented 6 years ago

Thank you for the reply, Sergio! But literally all the non-instrument controls are absorbed, so there's nothing left to partial out. And the number of instruments, while large, is still smaller than the number of clusters.