lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/

Negative R squared in IV regression #407

Closed by simonschoe 5 months ago

simonschoe commented 1 year ago

Hi there,

I am currently observing an odd result in my analysis that I cannot quite figure out: the second stage of an IV setup yields negative values for R^2. This occurs in particular when I split the sample before fitting.

Reprex (here it only occurs within one subsample, in my data this behaviour is consistent across all splits):

library(fixest)
base = iris
names(base) = c("y", "x1", "x_endo_1", "x_inst_1", "fe")
set.seed(2)
base$x_inst_2 = 0.2 * base$y + 0.2 * base$x_endo_1 + rnorm(150, sd = 0.5)
base$x_endo_2 = 0.2 * base$y - 0.2 * base$x_inst_1 + rnorm(150, sd = 0.5)

est_iv_fe = feols(y ~ x1 | x_endo_1 + x_endo_2 ~ x_inst_1 + x_inst_2, base, split = ~ fe)
etable(summary(est_iv_fe, stage = 1:2), fitstat = ~ . + ivf1 + ivf2 + ivwaldall.p)
                             summary(est_iv..1 summary(est_i..2 summary(est_iv_..3 summary(est_iv..4 summary(est_i..5 summary(est_..6 summary(est_iv_..7 summary(est_iv..8 summary(est_iv..9
Sample (fe)                                                                 setosa                                         versicolor                                              virginica
Dependent Var.:                       x_endo_1         x_endo_2                  y          x_endo_1         x_endo_2               y           x_endo_1          x_endo_2                 y

Constant                     1.163*** (0.2168)  0.5953 (0.6132)    1.889* (0.8699) 1.641*** (0.3791)   1.233 (0.7366)  0.0502 (4.058)  2.800*** (0.6116)   0.3666 (0.7442)    1.092 (0.8364)
x_inst_1                      0.4567. (0.2388) -0.3476 (0.6755)                    1.707*** (0.2945) -0.6270 (0.5722)                    0.1393 (0.2697) -0.6644* (0.3281)                  
x_inst_2                       0.0382 (0.0432)  0.1498 (0.1221)                      0.0561 (0.0711)  0.0691 (0.1380)                 0.4680*** (0.1059)  0.3438* (0.1289)                  
x1                             0.0395 (0.0654)  0.0534 (0.1850) 0.6434*** (0.1155)   0.0883 (0.1813)  0.1721 (0.3523) 0.1131 (0.6892)   0.4343. (0.2290)   0.3619 (0.2786)  0.3247. (0.1884)
x_endo_1                                                           0.6985 (0.6815)                                    0.9716 (0.7911)                                      0.7780** (0.2239)
x_endo_2                                                          -0.1227 (0.4534)                                      1.412 (2.247)                                        0.2187 (0.2663)
____________________________ _________________ ________________ __________________ _________________ ________________ _______________ __________________ _________________ _________________
S.E. type                                  IID              IID                IID               IID              IID             IID                IID               IID               IID
Observations                                50               50                 50                50               50              50                 50                50                50
R2                                     0.13533          0.03683            0.48218           0.62652          0.02735         -1.3028            0.42201           0.18780           0.76124
Adj. R2                                0.07894         -0.02599            0.44841           0.60216         -0.03609         -1.4530            0.38432           0.13483           0.74567
F-test (1st stage)                      2.7599          0.77963                 --            19.234          0.63191              --             10.393            4.9656                --
F-test (1st stage), x_endo_1                --               --             2.7599                --               --          19.234                 --                --            10.393
F-test (1st stage), x_endo_2                --               --            0.77963                --               --         0.63191                 --                --            4.9656
F-test (2nd stage)                          --               --            0.62585                --               --          3.7087                 --                --            6.0397
Wald (IV only), p-value                0.07379          0.46454            0.59340            8.5e-7          0.53613         0.37455            0.00019           0.01115           5.82e-6
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In addition, the IV formula extension x_endo_1 + x_endo_2 ~ x_inst_1 + x_inst_2 implies that both instruments are used for both endogenous covariates. Is it possible to use the first instrument only for the first endogenous regressor, and the second instrument only for the second?

Thanks for your great work with the package!

kylebutts commented 1 year ago

Totally normal for R^2 to be negative in IV/2SLS:

"R2 really has no statistical meaning in the context of 2SLS/IV."

Source: https://www.stata.com/support/faqs/statistics/two-stage-least-squares/
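As a sketch of why this happens (the notation here is an assumption, not taken from the FAQ): write X for the regressors including the actual endogenous variables, and X-hat for the same matrix with the endogenous columns replaced by their first-stage fitted values. The R^2 reported for the second stage is then typically computed from residuals based on X, while the coefficients minimize a criterion based on X-hat:

```latex
\hat\beta_{2SLS} = \arg\min_{\beta}\; \lVert y - \hat{X}\beta \rVert^2,
\qquad
R^2 = 1 - \frac{\lVert y - X\hat\beta_{2SLS} \rVert^2}{\sum_i (y_i - \bar{y})^2}.

% OLS with a constant: \beta = (\bar{y}, 0, \dots, 0) is feasible, so
% \lVert y - X\hat\beta_{OLS} \rVert^2 \le \sum_i (y_i - \bar{y})^2 and R^2 \ge 0.
% 2SLS: \hat\beta_{2SLS} does not minimize \lVert y - X\beta \rVert^2, so the
% numerator can exceed the denominator, which gives R^2 < 0.
```

Nothing bounds the residual sum of squares by the total sum of squares here, so a negative value is not a sign of a bug.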

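For your second question, that is not how IV works. In 2SLS, each first stage regresses one endogenous variable on all the instruments (plus the exogenous covariates), so you use all instruments to instrument for all endogenous Xs :-)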
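To make both points concrete, here is a minimal sketch that redoes the 2SLS by hand with base R's lm() on the versicolor subsample of the reprex. It assumes the reported R2 is computed from residuals that use the actual endogenous regressors; the object names (sub, fs1, fs2, ss, r2_iv, r2_stage) are made up for illustration, and the standard errors from this manual second stage are not valid IV standard errors.

```r
# Rebuild the reprex data (same seed and same order of rnorm() calls)
base <- iris
names(base) <- c("y", "x1", "x_endo_1", "x_inst_1", "fe")
set.seed(2)
base$x_inst_2 <- 0.2 * base$y + 0.2 * base$x_endo_1 + rnorm(150, sd = 0.5)
base$x_endo_2 <- 0.2 * base$y - 0.2 * base$x_inst_1 + rnorm(150, sd = 0.5)

# Keep one split only
sub <- base[base$fe == "versicolor", ]

# First stages: EACH endogenous regressor is regressed on the exogenous
# covariate x1 and on ALL instruments
fs1 <- lm(x_endo_1 ~ x1 + x_inst_1 + x_inst_2, data = sub)
fs2 <- lm(x_endo_2 ~ x1 + x_inst_1 + x_inst_2, data = sub)
sub$fit_1 <- fitted(fs1)
sub$fit_2 <- fitted(fs2)

# Second stage: replace the endogenous regressors by their fitted values
ss <- lm(y ~ x1 + fit_1 + fit_2, data = sub)
beta <- coef(ss)

# IV residuals plug the ACTUAL endogenous regressors back in
X <- cbind(1, sub$x1, sub$x_endo_1, sub$x_endo_2)
res_iv <- sub$y - drop(X %*% beta)

# R2 from the IV residuals: not bounded below by 0
r2_iv <- 1 - sum(res_iv^2) / sum((sub$y - mean(sub$y))^2)

# R2 of the second-stage OLS itself: always within [0, 1]
r2_stage <- summary(ss)$r.squared

print(beta)
print(c(r2_iv = r2_iv, r2_stage = r2_stage))
```

The coefficients in beta can be compared with the versicolor second-stage column of the table above, and r2_iv with its R2 row.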

lrberge commented 5 months ago

Hello! Indeed, as Kyle mentions, R2s in IV regressions don't make sense. I may drop it from the output completely, to be honest, because it creates confusion. Thanks for the kind words :-)