lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/

Reported first-stage F statistics in IV estimations different from "lfe" #117

Closed pei-huang closed 3 years ago

pei-huang commented 3 years ago

Hi Laurent,

Thank you very much for creating this excellent package for the community. It is super helpful for my research.

Using feols from the fixest package, I notice that the first-stage F statistics reported by fixest are much larger than those reported by felm from the lfe package, labeled F statistics (excl. instr.) in the summary output. Are these two the same statistic? Why are they so different?

Thanks, Pei Huang

lrberge commented 3 years ago

Hi! I'm glad you like the package! :-)

A few words of explanation. In fixest, fitstat gives access to two first-stage statistics: the F-test ("ivf") and the Wald test ("ivwald").

When I look at lfe, here's what I get:

library(lfe)
library(fixest)

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

est_felm  =  felm(y ~ x1 | 0 | (x2 ~ x3), base)
est_feols = feols(y ~ x1 | x2 ~ x3, base)

fe_est_felm  =  felm(y ~ x1 | species | (x2 ~ x3) | species, base)
fe_est_feols = feols(y ~ x1 | species | x2 ~ x3, base)

# Cragg-Donald
est_felm$stage1$iv1fstat$x2
#>          p       chi2        df1        p.F          F        df2 
#> 0.06946549 3.29560557 1.00000000 0.21112208 3.29560557 2.00000000 
#> attr(,"formula")
#> ~x3
#> <environment: 0x0000029e97f6c3e0>
fitstat(est_feols, "ivf")
#> F-test (1st stage): stat = 23.8, p = 2.808e-6, on 1 and 145 DoF.

# Kleibergen-Paap
waldtest(fe_est_felm$stage1, ~x3)
#>          p       chi2        df1        p.F          F        df2 
#> 0.06946549 3.29560557 1.00000000 0.21112208 3.29560557 2.00000000 
#> attr(,"formula")
#> ~x3
#> <environment: 0x0000029e987f0488>
fitstat(fe_est_feols, "ivwald")
#> Wald (1st stage): stat = 3.2956, p = 0.071532, on 1 and 145 DoF, VCOV matrix: Clustered (species).

So the two packages lead to similar results. So maybe you were comparing the Cragg-Donald statistic to the Kleibergen-Paap statistic? Maybe I should give explicit names in the output...

Could you be more specific and give an example of what you have in mind?

pei-huang commented 3 years ago

Hi Laurent,

Thank you for your clarification! You were right - I was comparing apples (the Cragg-Donald stat) to oranges (the Kleibergen-Paap stat)...

Now I understand that the est_felm$stage1$iv1fstat$x2 from felm reports the Kleibergen-Paap stat, while fitstat(est_feols, "ivf") from fixest reports the Cragg-Donald stat.

It would be great if these statistics had explicit names in the summary output, if it's not too heavy a task for you!

May I also ask whether fixest can report the Anderson-Rubin Wald test and the Stock-Wright LM S statistic, like the Stata package reghdfe does?

Thanks, Pei

pei-huang commented 3 years ago

Hi Laurent,

I have a bit of an update on the first-stage F statistics. It seems that when there is more than one endogenous variable in an IV regression, the results differ between fixest and lfe. For example, with two endogenous variables, the first-stage F statistic (ivwald) for the second endogenous variable from fixest is quite different from the one from lfe.

Below is an example of regression results from the two packages (using my own dataset):

> fitstat(model.fixest, "ivwald")
Wald (1st stage), CO : stat =  96.4, p < 2.2e-16, on 6 and 1,768,700 DoF, VCOV matrix: Two-way (ID.FE & admdate).
Wald (1st stage), NO2: stat = 191.8, p < 2.2e-16, on 6 and 1,768,700 DoF, VCOV matrix: Two-way (ID.FE & admdate).

> model.lfe$stage1$iv1fstat
$CO
            p          chi2           df1           p.F             F           df2 
1.005556e-121  5.785113e+02  6.000000e+00  5.864275e-88  9.641855e+01  6.930000e+02 
attr(,"formula")
~Ship.Proj | Wnds.Mean | Wndd.Diff.U | `Ship.Proj:Wndd.Diff.U` | 
    `Ship.Proj:Wnds.Mean` | `Wnds.Mean:Wndd.Diff.U`
<environment: 0x7fc8d702ef98>

$NO2
          p        chi2         df1         p.F           F         df2 
  0.9873535   0.9523731   6.0000000   0.9872797   0.1587289 693.0000000 
attr(,"formula")
~Ship.Proj | Wnds.Mean | Wndd.Diff.U | `Ship.Proj:Wndd.Diff.U` | 
    `Ship.Proj:Wnds.Mean` | `Wnds.Mean:Wndd.Diff.U`
<environment: 0x7fc8c5085a28>

I'm not sure what causes this disparity. Maybe it's an issue in lfe, since the F and chi2 statistics are quite small for the second endogenous variable.

Thanks, Pei

lrberge commented 3 years ago

Very sorry: I've just re-read my answer and it was very confusing!!!! I really thought that the direct access via est_felm$stage1$iv1fstat$x2 in lfe led to the Cragg-Donald statistic, but this was not the case! I don't know how to access the CD stat in lfe. (It's because it worked for a non-posted example without clustering, and I didn't notice that it didn't work for the clustered example I posted.)

Could you double check with Stata to understand the difference between the different KP values?

By the way, I think I need to adjust the DoF differently when clustering is performed. I'll fix this. Thanks for highlighting this issue!

On naming. I'll see what I can do, but I like to keep names short, so I don't feel like appending the CD/KP labels to the statistics. Then again, I don't feel like dropping the statistic type and keeping only the labels either. Agreed, explicit names would improve cross-software comparability (which is a great thing), so maybe I'll just use them. => I'll ponder that! :-)

On the extra stats. Thanks for mentioning them; I really can't implement them at the moment. But I'll try to write a contributing guide to make it simple for contributors to add new stats (although the architecture of the software is complex, adding extra stats is really a standalone task).

pei-huang commented 3 years ago

Hi Laurent,

No problem about the KP and CD stats. I figured it out myself. It seems that lfe doesn't report the first-stage CD F statistic. It's great that fixest reports both!

Yes, I'll check the differences for the KP statistics across packages and even Stata. My initial runs show that lfe, fixest, and reghdfe all report different first-stage statistics. I'll keep you posted.

It sounds great that you can provide a guide for people who can contribute to this package. Look forward to that!

Pei

pei-huang commented 3 years ago

Hi Laurent,

There seems to be another problem with the first-stage F statistics in fixest. When collinear variables are removed automatically, fitstat(est, "ivwald") generates the following error:

Error in my_x_first$cov.scaled[inst, inst] : subscript out of bounds

I'm still trying to understand the results from lfe, fixest, and Stata's reghdfe. I'll let you know of any updates.

Pei

lrberge commented 3 years ago

Hi Pei,

Thanks again for the report!

Two things: the DoF adjustment under clustering is now fixed, and the error with collinear variables should be gone.

This means that the Wald test p-values should be in line with lfe's now:

base = iris
names(base) = c("y", "x1", "x2", "x3", "species")

fe_est_felm  =  felm(y ~ x1 | species | (x2 ~ x3) | species, base)
fe_est_feols = feols(y ~ x1 | species | x2 ~ x3, base)

waldtest(fe_est_felm$stage1, ~x3)
#>          p       chi2        df1        p.F          F        df2 
#> 0.06946549 3.29560557 1.00000000 0.21112208 3.29560557 2.00000000 
#> attr(,"formula")
#> ~x3
#> <environment: 0x00000259380801f8>
fitstat(fe_est_feols, "ivwald")
#> Wald (1st stage), x2: stat = 3.2956, p = 0.211122, on 1 and 2 DoF, VCOV: Clustered (species).

By the way, I apologize again for my first reply! The code really made absolutely no sense... Thanks for bearing with it! ;-)

adviksh commented 3 years ago

On the extra stats. Thanks for mentioning them. I really can't at the moment. But I'll try to write a contributing guide to make it simple for contributors to add new stats (because although the architecture of the software is complex, adding extra stats is really standalone).

@lrberge I'm happy to take a swing at coding some of these stats during the summer. I've found fixest extremely helpful for my work, and would love to try giving back!

lrberge commented 3 years ago

@adviksh: that's wonderful news! :-) I'll try to write a guide (and find a design) to make it very simple in the coming months.

@pei-huang: just to clarify: the Cragg-Donald and KP tests are not currently implemented, but in the case of one endogenous regressor they're equivalent to the tests already reported. FYI I've just written the CD test for multiple endogenous regressors and I'm close to being done with the KP, so it should be there soon (< 1 month).

pei-huang commented 3 years ago

@lrberge: great about the progress. Thank you for your work.

So in the current version, the first-stage F statistics for multiple endogenous variables are neither CD nor KP?

lrberge commented 3 years ago

The F/Wald tests in the packages test the joint significance of the instruments in each first-stage regression, so you end up with n_endo such tests.

The CD/KP tests, on the other hand, are overall tests: even with n_endo > 1, you end up with a single statistic.

However, when there is only one endogenous regressor, the CD/F and KP/Wald pairs of tests are equivalent.
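To make the distinction concrete, here is a small sketch reusing the two-endogenous-regressor setup posted elsewhere in this thread (the fitstat types "ivwald" and "cd" are those shown in the other examples; exact values depend on your fixest version):

```r
library(fixest)

base = iris
names(base) = c("y", "x1", "x_endo_1", "x_inst_1", "fe")
set.seed(2)
base$x_inst_2 = 0.2 * base$y + 0.2 * base$x_endo_1 + rnorm(150, sd = 0.5)
base$x_endo_2 = 0.2 * base$y - 0.2 * base$x_inst_1 + rnorm(150, sd = 0.5)

est = feols(y ~ x1 | fe | x_endo_1 + x_endo_2 ~ x_inst_1 + x_inst_2, data = base)

# One Wald test per first-stage regression (n_endo = 2 => two lines of output):
fitstat(est, "ivwald")

# A single overall Cragg-Donald statistic:
fitstat(est, "cd")
```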

pei-huang commented 3 years ago

Fantastic! Thanks!

pei-huang commented 3 years ago

@lrberge: thank you for implementing the kpr and cd tests for IV estimations! While working with them, I think I found some issues with these tests related to collinearity, similar to the ivwald issue I reported before. Below is some code to replicate the errors:

base = iris
names(base) = c("y", "x1", "x_endo_1", "x_inst_1", "fe")
set.seed(2)
base$x_inst_2 = 0.2 * base$y + 0.2 * base$x_endo_1 + rnorm(150, sd = 0.5)
base$x_endo_2 = 0.2 * base$y - 0.2 * base$x_inst_1 + rnorm(150, sd = 0.5)
base$x_inst_3 = base$x_inst_2 + 10

## estimation without collinear terms
est_iv_original = feols(y ~ x1 | fe | x_endo_1 + x_endo_2 ~ x_inst_1 + x_inst_2, data = base, cluster = "fe")

fitstat(est_iv_original, "kpr")
#> Kleibergen-Paap: stat = 0.676054, p = 0.410949, on 1 DoF.
fitstat(est_iv_original, "cd")
#> Cragg-Donald: 4.7242

## estimation with collinear terms
est_iv_collinear = feols(y ~ x1 | fe | x_endo_1 + x_endo_2 ~ x_inst_1 + x_inst_2 + x_inst_3, data = base, cluster = "fe")
#> The variable 'x_inst_3' has been removed because of collinearity (see $collin.var).
#> The variable 'x_inst_3' has been removed because of collinearity (see $collin.var).

fitstat(est_iv_collinear, "kpr")
#> Error in Fmat %*% t(solve(t(Gmat)) %*% t(PI)) : non-conformable arguments
fitstat(est_iv_collinear, "cd")
#> Cragg-Donald: 3.1277

The above results show that when there is collinearity, the KP test returns an error, and the CD test reports a different value.

I also tried the KP test with my own dataset (it is rather large). The presence of collinearity also produces an error for the KP test but with a different error message. See below:

fitstat(model.fixest, "kpr")
#< Error in chol.default(crossprod(Z_proj)) : 
#<  the leading minor of order 4 is not positive definite

If you want to reproduce the error message, I can share the data and code with you.

To compare with Stata, I estimated the model est_iv_original using the ivreghdfe command. The KP and CD test results seem to differ from those reported by fixest.

ivreghdfe y x1 (x_endo_1 x_endo_2 = x_inst_1 x_inst_2), absorb(fe) cluster(fe) first

* CD and KP test results
*> Weak identification test
*> Ho: equation is weakly identified
*> Cragg-Donald Wald F statistic                                       4.76
*> Kleibergen-Paap Wald rk F statistic                                 4.36
lrberge commented 3 years ago

Hi, actually they are not finalized yet; that's why I hadn't gotten back to you ;-)

The KP is a real nightmare. I think I'm close to being done with the case n_endo = n_inst, but when n_endo < n_inst the results diverge from Stata and I don't know the cause (in fact I ended up translating the code from Julia's Vcov.jl, which is affected by the same problem). I think I will leave it there because it's just too painful. Maybe a good soul can take over from there?

The CD should be OK now.

The reproducible example now works and the results are similar to Stata's. I've also fixed the collinearity message (which was ugly in this context).

pei-huang commented 3 years ago

@lrberge thank you for the reply! I didn't mean to rush you. I saw you updated the git repository the other day, so I tried it out for a bit. I thought the error messages might help you debug the package. Sorry about that!

We totally understand the situation: results may diverge across packages and platforms because we don't know exactly what the other implementations do in their code. I really appreciate your efforts in implementing the tests and also correcting the error messages.

Best!

lrberge commented 3 years ago

Actually that was useful and it did help me! :-) I wouldn't have caught the collinearity problem otherwise.

I think I'm closing this issue now. Don't hesitate to come back to me if needed!