Using absorb(id) or i.id does not lead to the same results

eloualiche commented 3 years ago

I have been using some of the code from https://github.com/vikjam/mostly-harmless-replication as a test for Julia's FixedEffectModels.

It appears that running a model with fixed effects does not always give the same results with absorb() as with i. dummies. This might be due to how each approach deals with collinear variables.

The code that follows is adapted from https://github.com/vikjam/mostly-harmless-replication/blob/master/04%20Instrumental%20Variables%20in%20Action/Table%204-1-1.do

/* Download the data */
* shell curl -o asciiqob.zip http://economics.mit.edu/files/397
* unzipfile asciiqob.zip, replace

/* Read the data */ 
infile lwklywge educ yob qob pob using asciiqob.txt, clear

/* Regression using i.id */
ivreghdfe  lwklywge  i.yob i.pob (educ = i.qob#i.yob), robust
/* Regression using absorb(id) */
ivreghdfe  lwklywge  (educ = i.qob#i.yob), absorb(yob pob) robust

Both regressions output the correct coefficient on educ. In the first case, though, the first-stage F-statistics are reported (Cragg-Donald F is 5.364, Kleibergen-Paap F is 5.234). In the second case, both F-statistics are missing and the estimation throws a warning about collinearities.

Is this expected behavior?

sergiocorreia commented 3 years ago

Hi Erik,

If you run ivreghdfe with no absorb() option, you are essentially just running ivreg2, so your results match those. However, when you do include absorb(), two things happen:

Results are partialled out
ivreg2 is called with the small option

From ivreg2's help file:

small requests that small-sample statistics (F and t-statistics) be reported instead of large-sample statistics (chi-squared and z statistics). Large-sample statistics are the default. The exception is the statistic for the significance of the regression, which is always reported as a small-sample F statistic.

When I run your two commands, I get a S.E. of .0158736 with absorb and .0158721 without absorb. Now let's run ivreg2:

ivreg2 lwklywge  i.yob i.pob (educ = i.qob#i.yob), robust // .0158721
ivreg2 lwklywge  i.yob i.pob (educ = i.qob#i.yob), robust small // .0158736
ivreghdfe  lwklywge  (educ = i.qob#i.yob), absorb(yob pob) robust small // .0158736

So it seems that the standard errors on educ are the same once you add the small option.
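
As a rough check on how small that gap is, the ratio of the two standard errors above can be computed directly. This is only a sketch: it assumes the small option rescales the robust variance by a degrees-of-freedom factor of the form N/(N-K), which is the usual finite-sample correction but has not been checked against the ivreg2 source here.

```stata
* Ratio of the two standard errors on educ reported above
display .0158736 / .0158721

* Implied variance inflation; under the assumed N/(N-K) correction this
* would equal N/(N-K), with K counting the regressors and absorbed dummies
display (.0158736 / .0158721)^2
```

If the assumption holds, the second number should be close to N/(N-K) for the estimation sample, so the difference shrinks as N grows relative to the number of dummies.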

On your second point, I can verify that the C-D and K-P statistics are missing. That's done within ivreg2, so I don't know that much about the details, but I suspect that, as you say, some of the i.qob#i.yob terms end up as vectors of zeroes after partialling out.

Running reghdfe does show that all the 4.qob#i.yob regressors are collinear (which makes sense, because you are absorbing yob). This suggests you might want to drop that category altogether, something like this:

reghdfe educ lwklywge i.qob#i.yob, absorb(yob pob)
ivreghdfe  lwklywge  (educ = i(1 2 3).qob#i.yob), absorb(yob pob) robust small

But then I'm not entirely sure why the estimates change; would need to dig a bit more.
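
The collinearity itself can be illustrated on toy data. The sketch below does not use the asciiqob data: the variables yob and qob are simulated, and the names d1-d4, m1-m4, r1-r4 and total are hypothetical. It demeans the four qob#yob indicators for one birth year within yob (a stand-in for absorbing yob only, not pob) and checks that the four residuals sum to zero, which means one of them is redundant.

```stata
clear
set obs 400

* Toy data (hypothetical): two birth years fully crossed with four quarters
generate yob = 30 + mod(_n, 2)
generate qob = 1 + mod(floor((_n - 1) / 2), 4)

* Demean each qob#yob indicator for yob == 30 within yob
generate double total = 0
forvalues q = 1/4 {
    generate double d`q' = (qob == `q') & (yob == 30)
    egen double m`q' = mean(d`q'), by(yob)
    generate double r`q' = d`q' - m`q'
    replace total = total + r`q'
}

* The four residualized indicators sum to zero in every observation,
* so any one of them is a linear combination of the other three
assert abs(total) < 1e-12
```

The same argument applies to every birth year, because the four quarter indicators within a year add up to that year's dummy. This is consistent with reghdfe flagging the 4.qob#i.yob terms once yob is absorbed.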

eloualiche commented 3 years ago

Thank you Sergio.

As I said, we ran into similar collinearity issues in FixedEffectModels. Since we use Stata for testing, I wanted to bring this to your attention. I agree it's not something easy to deal with.

I am not sure what the best way to deal with this is. If it's not fixable, maybe the warning could suggest running the code without absorb(id) and with i.id instead? I understand that the two commands do different things "under the hood", but I think people generally consider them to be the same.

sergiocorreia / reghdfe

Using absorb(id) or i.id does not lead to the same results #226