Panel Data Econometrics with R
Insufficient Number of Instruments Error When Doing IV Regression with Unbalanced Panel even though the regression does fine in Stata #49

mcket747econ commented 1 year ago

I am attempting to run the following IV regression on an unbalanced panel dataset. The variables TOTAL_E, TOURISM_5KM_SUM and TOURISM_10KM_SUM are endogenous; HHSIZE2 and SEX are exogenous explanatory variables; and Z_b_100, tourism_5KM_ZSCORE and tourism_10KM_ZSCORE are instruments. I want to include household characteristics as controls, so I list them in both parts of the formula (with the regressors and with the instruments), but I still get the "insufficient number of instruments" error every time. I am using a random effects model because my instrument is time-invariant, so I don't believe that is the issue. Based on this question: https://stackoverflow.com/questions/56672684/error-insufficient-number-of-instruments-when-running-plm-iv-regression, I am supposed to have 2 instruments for every endogenous variable, but that isn't possible in my case, so I am a bit stuck. The regression runs fine in Stata; I have added the Stata output below. Any advice would be appreciated!

random <- plm(asinh(AECAPITA) ~ asinh(TOTAL_E) + asinh(TOURISM_5KM_SUM) + HHSIZE2 + SEX | Z_b_100 + HHSIZE2 + tourism_10KM_ZSCORE + SEX, data = df, index = c("UNIQUE_HH_ID"), model = "random")

Error in plm.fit(data, model = models[1L], effect = effect) : 
  insufficient number of instruments
In addition: Warning message:
In pdata.frame(data, index) :
  duplicate couples (id-time) in resulting pdata.frame
 to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")

. gen log_total_e_10 = log(TOURISM_10KM_SUM)
(8,229 missing values generated)

. do "C:\Users\mcket\AppData\Local\Temp\STD728_000000.tmp"

. . xtivreg AE_CAPITA_US SEX HHSIZE2 ( TOTAL_E TOURISM_10KM_SUM = Z_b_100 tourism_10KM_ZSCORE), re

G2SLS random-effects IV regression              Number of obs     =     11,374
Group variable: UNIQUE_HH_ID                    Number of groups  =      5,293

R-squared:                                      Obs per group:
     Within  = 0.0044                                         min =          1
     Between = 0.0546                                         avg =        2.1
     Overall = 0.0338                                         max =         12

                                                Wald chi2(4)      =    1011.76
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

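-----------------+----------------------------------------------------------------
    AE_CAPITA_US | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-----------------+----------------------------------------------------------------
         TOTAL_E |  -.0088191   .0015754    -5.60   0.000    -.0119068   -.0057314
TOURISM_10KM_SUM |   .0010611   .0003845     2.76   0.006     .0003076    .0018147
             SEX |   247.1207   67.98776     3.63   0.000     113.8672    380.3743
         HHSIZE2 |  -490.9899   15.70356   -31.27   0.000    -521.7683   -460.2115
           _cons |   5172.889   126.4606    40.91   0.000     4925.031    5420.747
-----------------+----------------------------------------------------------------
         sigma_u |          0
         sigma_e |  73042.599
             rho |          0   (fraction of variance due to u_i)
-----------------+----------------------------------------------------------------
Exogenous:  SEX HHSIZE2 Z_b_100 tourism_10KM_ZSCORE

tappek commented 1 year ago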

I see you have already posted this on Stack Overflow (https://stackoverflow.com/q/76808363/4640346), and there is an answer there pointing to the warning message you also receive. I second that: the warning should be addressed first so that appropriate data is fed into plm(), and it is typically easier to create the `pdata.frame` first.
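As a rough illustration of checking for duplicate id-time couples before estimation, here is a minimal sketch on toy data. `UNIQUE_HH_ID` comes from the post; `YEAR` and `y` are hypothetical stand-ins, since the actual time variable is not shown, and keeping the first record per household-year is only one possible way to resolve duplicates.

```r
# Toy data: UNIQUE_HH_ID is from the post; YEAR and y are hypothetical
# (the name of the real time variable is not shown in the issue).
df <- data.frame(
  UNIQUE_HH_ID = c(1, 1, 1, 2, 2, 3),
  YEAR         = c(2010, 2011, 2011, 2010, 2011, 2010),
  y            = c(1.0, 1.2, 1.3, 0.8, 0.9, 1.5)
)

# Flag every row whose (id, time) pair occurs more than once.
key      <- df[, c("UNIQUE_HH_ID", "YEAR")]
is_dup   <- duplicated(key) | duplicated(key, fromLast = TRUE)
dup_rows <- df[is_dup, ]
print(dup_rows)

# One possible fix (an assumption about what the duplicates mean):
# keep only the first record per household-year.
df_unique <- df[!duplicated(key), ]

# With unique pairs, build the pdata.frame explicitly, naming both the
# individual and the time index. Skipped if plm is not installed.
if (requireNamespace("plm", quietly = TRUE)) {
  pdf <- plm::pdata.frame(df_unique, index = c("UNIQUE_HH_ID", "YEAR"))
  print(plm::pdim(pdf))
}
```

Whether dropping, aggregating, or re-keying the duplicated rows is appropriate depends on why they exist in the data, which only the data owner can judge.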

The code you gave here diverges from the one on SO in at least two points:

tappek commented 11 months ago

Closing this now on the assumption that handling the panel dimensions and the formula correctly will resolve the error. Feel free to re-open when a reproducible example can be supplied.
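For anyone checking the formula side, the following is a minimal base-R sketch of counting the order condition (at least as many excluded instruments as endogenous regressors) from a two-part `y ~ regressors | instruments` formula. The helper `split_iv_formula()` is hypothetical and not part of plm; it compares term labels only and says nothing about how plm builds its instrument matrix internally.

```r
# Hypothetical helper (not part of plm): split a two-part IV formula
# into the term labels of its regressor and instrument parts.
split_iv_formula <- function(f) {
  rhs <- f[[length(f)]]
  # The right-hand side must be a call to `|`.
  stopifnot(is.call(rhs), identical(rhs[[1]], as.name("|")))
  labels_of <- function(expr) {
    attr(terms(eval(call("~", expr))), "term.labels")
  }
  list(regressors  = labels_of(rhs[[2]]),
       instruments = labels_of(rhs[[3]]))
}

# Formula as posted in the issue (spacing tidied).
f <- asinh(AECAPITA) ~ asinh(TOTAL_E) + asinh(TOURISM_5KM_SUM) + HHSIZE2 + SEX |
  Z_b_100 + HHSIZE2 + tourism_10KM_ZSCORE + SEX

parts <- split_iv_formula(f)

# Regressors that do not reappear after `|` are treated as endogenous;
# instruments that are not regressors are the excluded instruments.
endogenous <- setdiff(parts$regressors, parts$instruments)
excluded   <- setdiff(parts$instruments, parts$regressors)

order_ok <- length(excluded) >= length(endogenous)
print(endogenous)
print(excluded)
print(order_ok)
```

By this count the posted formula has two endogenous terms and two excluded instruments, so the count alone does not explain the error; that is consistent with the advice to sort out the panel index and data first.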