ivreghdfe reporting of F-statistics when instruments are indicator variables

tatyanaderyugina commented 4 years ago

Hi Sergio,

My first stage involves instruments that are interaction terms in the form i.Z1#i.Z2, where Z1 is collinear with some of the absorbed fixed effects. When I run the ivreghdfe regressions, I get the following warnings:

warning: -ranktest- error in calculating underidentification test statistics; may be caused by collinearities

Warning: estimated covariance matrix of moment conditions not of full rank.
         overidentification statistic not reported, and standard errors and
         model tests should be interpreted with caution.
Possible causes:
         singleton dummy variable (dummy with one 1 and N-1 0s or vice versa)
partial option may address problem.

As a result, ivreghdfe does not report any F-statistics. I dug deeper into the cause of this problem, and I think ivreghdfe is not correctly detecting collinearities between the instruments and the fixed effects (possibly only when calculating the F-statistic?). Specifically, I do NOT have this problem when I:

1. Use the "old" option
2. Generate the interaction terms manually AND omit the correct number of terms from the first stage (but this is obviously not practical for large numbers of instruments/large datasets)
3. Use ivreg2

Points 3 and 1 make me think that the problem is not ivreg2 itself but how it's implemented within ivreghdfe. I'm attaching a minimal working example of the problem as well as a log file of the results. I've verified that this is still a problem with the latest version of ivreghdfe from GitHub (downloaded today) as well as the version on SSC.

reghdfe_problem.txt ivreghdfe_problem_log.txt
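To make the "omit the correct number of terms" workaround concrete, here is a minimal Python sketch (not Stata, and not how ivreghdfe or ivreg2 work internally). It assumes a single absorbed fixed effect `g`, a `Z1` that is constant within each level of `g`, and a full set of `i.Z1#i.Z2` cell dummies. After partialling out `g`, no individual instrument column is zero, yet the columns are jointly rank-deficient, so the number of redundant instruments is the column count minus the numerical rank. The helper `demean_by_group`, the simulated data, and the homoskedastic first-stage F (whose numerator degrees of freedom use that rank rather than the column count) are illustrative assumptions.

```python
import numpy as np

def demean_by_group(x, g):
    """Subtract group means, i.e. partial out one set of fixed effects exactly.

    x: 1-D or 2-D float array; g: integer labels 0..G-1 (every label present).
    """
    counts = np.bincount(g)
    if x.ndim == 1:
        return x - (np.bincount(g, weights=x) / counts)[g]
    sums = np.column_stack([np.bincount(g, weights=x[:, j]) for j in range(x.shape[1])])
    return x - (sums / counts[:, None])[g]

rng = np.random.default_rng(42)
n, n_groups = 600, 6
g = rng.integers(0, n_groups, n)   # absorbed fixed effect
z1 = g % 2                         # Z1 is constant within each level of g
z2 = rng.integers(0, 3, n)         # Z2 varies within groups
cells = z1 * 3 + z2                # 2 x 3 = 6 interaction cells
Z = np.eye(6)[cells]               # one dummy per i.Z1#i.Z2 cell
Z_tilde = demean_by_group(Z, g)

n_cols = Z_tilde.shape[1]
rank = np.linalg.matrix_rank(Z_tilde)  # SVD-based rank with NumPy's default tolerance
print("columns:", n_cols, "rank:", rank, "redundant:", n_cols - rank)

# Homoskedastic first-stage F for one endogenous regressor, using the rank
# (number of non-redundant instruments) as the numerator degrees of freedom.
x = Z @ rng.normal(scale=0.3, size=6) + rng.normal(size=n)
x_tilde = demean_by_group(x, g)
coef, *_ = np.linalg.lstsq(Z_tilde, x_tilde, rcond=None)  # minimum-norm solution
resid = x_tilde - Z_tilde @ coef
rss_u = resid @ resid          # unrestricted: fixed effects + instruments
rss_r = x_tilde @ x_tilde      # restricted: fixed effects only
df_resid = n - n_groups - rank
F = ((rss_r - rss_u) / rank) / (rss_u / df_resid)
print("first-stage F:", F)
```

In this setup the 6 cell dummies have rank 4 after demeaning (each level of Z1 loses one column to the absorbed groups), so 2 would have to be dropped; which 2 is arbitrary as long as the remaining columns have full rank.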

sergiocorreia commented 4 years ago

Hi Tatyana,

First of all, which version of reghdfe are you using?

What I suspect is going on is this:

Recall that ivreghdfe is just ivreg2 where the variables are first demeaned through reghdfe.
If a variable X1 is fully collinear with the fixed effects, the residual is not necessarily ZERO (0.0000) but could be something trivially small (within an epsilon of zero).
That could trip up the ivreg2 code that detects collinearity, so ivreg2 doesn't drop the variable, and then the F-statistic computation fails (I had to tweak the reghdfe code a lot to prevent the same problem in reghdfe).

All in all, I'm not sure there is an easy answer, as ivreg2 is a complex piece of code and I would prefer not to hack it much beyond what I have already done.
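A small Python simulation can illustrate the mechanism Sergio describes (illustrative only; `demean_map` and `looks_collinear` are hypothetical helpers, not reghdfe's code). It partials out two fixed effects by alternating projections, the kind of iterative demeaning reghdfe offers, and stops at a convergence tolerance. A variable that is exactly spanned by the fixed effects then comes back with residuals that are tiny but, in floating point, generally not exactly zero, so a test of the form `== 0` can miss it while a test relative to the variable's original scale catches it. Both tolerances are arbitrary assumptions.

```python
import numpy as np

def demean_map(x, fe_list, tol=1e-12, max_iter=10_000):
    """Partial out several sets of fixed effects by alternating projections.

    Each sweep subtracts group means for every fixed effect in turn. Iteration
    stops once a sweep changes no element by more than `tol`, so the result is
    only accurate up to roughly that tolerance (plus floating-point round-off).
    """
    resid = np.asarray(x, dtype=float).copy()
    # Relabel each fixed effect as 0..G-1 so np.bincount has no empty bins.
    labels = [np.unique(fe, return_inverse=True)[1] for fe in fe_list]
    counts = [np.bincount(lab) for lab in labels]
    for _ in range(max_iter):
        previous = resid.copy()
        for lab, cnt in zip(labels, counts):
            resid -= (np.bincount(lab, weights=resid) / cnt)[lab]
        if np.max(np.abs(resid - previous)) < tol:
            break
    return resid

def looks_collinear(x, resid, rel_tol=1e-6):
    """Scale-free check: is the residual negligible relative to the original variable?"""
    return np.linalg.norm(resid) <= rel_tol * np.linalg.norm(x)

rng = np.random.default_rng(0)
n = 500
firm = rng.integers(0, 30, n)
year = rng.integers(0, 10, n)
x_fe = rng.normal(size=30)[firm] + rng.normal(size=10)[year]  # spanned by the two FEs
x_ok = rng.normal(size=n)                                       # has within variation

r_fe = demean_map(x_fe, [firm, year])
r_ok = demean_map(x_ok, [firm, year])

print("max |resid| of absorbed variable:", np.max(np.abs(r_fe)))
print("exact-zero test:", bool(np.all(r_fe == 0.0)))
print("tolerance test:", looks_collinear(x_fe, r_fe), looks_collinear(x_ok, r_ok))
```

The relative check is the important design choice: an absolute threshold would depend on the units of each variable, while comparing against the original norm does not.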

tatyanaderyugina commented 4 years ago

Hi Sergio,

This test was done using version 5.7.3 (13nov2019), but I've seen this pattern in ivreghdfe for a while now. Your explanation makes a lot of sense; I'm guessing that's what's going on (though I will check in the simulation to be completely sure).

One relatively straightforward solution might be to add a "residual tolerance" option to ivreghdfe where residual values within the tolerance value of zero are set to be exactly zero.
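A minimal sketch of what such a "residual tolerance" step could do, written in Python for illustration. `snap_residuals` and its `rel_tol` parameter are hypothetical and are not an existing ivreghdfe option. Each demeaned entry that lies within `rel_tol` times the scale of the original column is set to exactly zero, so a variable that the fixed effects fully absorb becomes an all-zero column that exact collinearity checks would then drop. The small numbers below are made up to stand in for demeaned output.

```python
import numpy as np

def snap_residuals(raw, resid, rel_tol=1e-9):
    """Set residual entries to exactly 0.0 when |resid| <= rel_tol * max|raw column|."""
    raw = np.asarray(raw, dtype=float)
    resid = np.asarray(resid, dtype=float)
    scale = np.max(np.abs(raw), axis=0)          # per-column scale before demeaning
    scale = np.where(scale > 0, scale, 1.0)      # avoid a zero threshold for all-zero columns
    return np.where(np.abs(resid) <= rel_tol * scale, 0.0, resid)

# Column 0: a variable absorbed by the fixed effects (only round-off noise remains).
# Column 1: a variable with genuine within-group variation.
raw = np.array([[1.0, 2.0], [3.0, -1.0], [5.0, 0.5]])
resid = np.array([[1e-15, 0.7], [-2e-16, -1.2], [3e-15, 0.5]])

snapped = snap_residuals(raw, resid)
print(snapped)
print("column 0 all zero:", bool(np.all(snapped[:, 0] == 0.0)))
```

An element-wise rule like this could also zero a few genuinely tiny entries of a non-absorbed variable; with a tolerance this small the effect is negligible, but a column-level rule (comparing the norm of the whole residual column to the original) would be a more conservative alternative.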

eloualiche commented 3 years ago

Hi all,

I think this is related to an issue we uncovered using FixedEffectModels. I have added some code to illustrate our patch and to show that the Julia implementation did not handle this case correctly before version 1.4.2 and now does the right thing. It has to do with how the ranktest statistic is computed in a specific case (see this PR).

The code to check everything (which could be useful as a test when implementing this in Stata) is here.
A notebook explaining how this is actually an error on the Stata side (and formerly on the Julia side) is here.

I think this is actually an issue that affects both ivreghdfe and ivreg2, so I am wondering whether it's worth opening a separate issue (and contacting the author of ivreg2).

h/t @matthieugomez & Valentin Haddad

sergiocorreia commented 3 years ago

"I think this is actually an issue that affects both ivreghdfe and ivreg2, so I am wondering whether it's worth opening a separate issue (and contacting the author of ivreg2)."

What would be really useful is a minimal working example (something simpler, i.e., without the interactions). Then we can try to come up with an example in ivreg2 (unless you already have one? but this made me think that's not the case), and then we can contact the authors (Kit Baum, Mark Schaffer, Steven Stillman).

sergiocorreia commented 3 years ago

Also, I'm really short on time these days (too many revisions and late RRs), so I might be a bit slow to reply; apologies in advance.

eloualiche commented 3 years ago

Thank you, Sergio. We are not pressed for time, as we mostly use Julia. We thought it would be worth bringing the issue to the attention of Stata users ;)

The Julia notebook example fails for both ivreg2 and ivreghdfe. I think the example is pretty compact (20 rows) and fairly common. I assume ivreghdfe relies on ivreg2 for the ranktest, so it may be worth reaching out to the authors.

sergiocorreia commented 3 years ago

Thanks, that's really useful! I'll put this in my to-do list and also try to contact the ivreg2 authors.

BTW, do you have a specific goal (teaching a course?), or are you just trying to ensure compatibility across different tools?

eloualiche commented 3 years ago

Sergio,

As I said, Matthieu tests his package against Stata, so when we found strange results in the F-statistics we looked at Stata to see where the errors came from.

I personally don't use Stata much. I have added an R section at the bottom of the Julia notebook that shows similar issues (in both @lrberge's fixest and lfe).

If you email the ivreg2 authors, please cc us. That will probably make it easier to explain the specific case that has to be handled (it is not simply a matter of detecting collinearity).

lrberge commented 3 years ago

Hi and thanks for tagging me on this issue. But actually the ranktest was not implemented in fixest at that time! :-)

Eventually I did implement it, but I gave up trying to translate the paper into an algorithm (I'm really impressed you achieved that!) and ended up translating the ranktest.jl algorithm into fixest.

By the way, the rank tests differ substantially between Stata and Julia (and hence fixest) when the number of instruments exceeds the number of endogenous variables (when the two are equal, the results are identical). I'm sorry to just mention it and leave it there, but I tried without success to solve the problem (there are just too many implicit steps in the paper for me to understand what's going on).

sergiocorreia/ivreghdfe, issue #25