ycroissant / plm

Panel Data Econometrics with R
GNU General Public License v2.0

Insufficient Number of Instruments Error When Doing IV Regression with Unbalanced Panel even though the regression does fine in Stata #49

Closed mcket747econ closed 11 months ago

mcket747econ commented 1 year ago

I am trying to run the following IV regression on an unbalanced panel dataset. The variables TOTAL_E, TOURISM_5KM_SUM and TOURISM_10KM_SUM are endogenous, HHSIZE2 and SEX are exogenous explanatory variables, and Z_b_100, tourism_5KM_ZSCORE and tourism_10KM_ZSCORE are instruments. I want to include household characteristics as controls, so I list them both among the regressors and among the instruments in the formula, but I still get the "insufficient number of instruments" error every time. I am using a random effects model because my instrument is time invariant, so I don't believe that is the issue. Based on this question (https://stackoverflow.com/questions/56672684/error-insufficient-number-of-instruments-when-running-plm-iv-regression), I am supposed to have 2 instruments for every endogenous variable, but that isn't possible in my case, so I am a bit stuck. The regression runs fine in Stata; the Stata output is below. Any advice would be appreciated!
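As a quick sanity check on the instrument count, the sketch below splits a two-part formula of the form y ~ X | Z and compares the number of terms on each side. The usual order condition needs at least as many instruments (right of `|`) as regressors (left of `|`), where exogenous regressors that are repeated on the right count as their own instruments. `count_formula_parts` is a hypothetical helper written for this illustration, not a plm function; it only counts formula terms and says nothing about how plm builds its instrument matrix or which rows survive missing values.

```r
# Count regressors and instruments in a two-part formula  y ~ X | Z.
# `count_formula_parts` is a hypothetical helper, not part of plm.
count_formula_parts <- function(f) {
  rhs <- f[[3]]                                    # everything right of `~`
  stopifnot(is.call(rhs), identical(rhs[[1]], as.name("|")))
  labels_of <- function(part) {
    # Build a one-sided formula from the sub-expression and read its term labels.
    attr(stats::terms(eval(call("~", part))), "term.labels")
  }
  regressors  <- labels_of(rhs[[2]])               # left of `|`
  instruments <- labels_of(rhs[[3]])               # right of `|`
  list(
    endogenous = setdiff(regressors, instruments), # regressors not reused as instruments
    excluded   = setdiff(instruments, regressors), # instruments that are not regressors
    order_ok   = length(instruments) >= length(regressors)
  )
}

# Formula from this issue (variables need not exist; only the terms are inspected).
f <- asinh(AECAPITA) ~ asinh(TOTAL_E) + asinh(TOURISM_5KM_SUM) + HHSIZE2 + SEX |
  Z_b_100 + tourism_10KM_ZSCORE + HHSIZE2 + SEX
parts <- count_formula_parts(f)
print(parts)
```

If `order_ok` is TRUE for the formula as written, the term count alone does not explain the error, and the data (panel index, duplicates, missing values) is the next thing to inspect.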

random <- plm(asinh(AECAPITA) ~ asinh(TOTAL_E) + asinh(TOURISM_5KM_SUM) + HHSIZE2 + SEX |
                Z_b_100 + HHSIZE2 + tourism_10KM_ZSCORE + SEX,
              data = df, index = c("UNIQUE_HH_ID"), model = "random")

Error in plm.fit(data, model = models[1L], effect = effect) : 
  insufficient number of instruments
In addition: Warning message:
In pdata.frame(data, index) :
  duplicate couples (id-time) in resulting pdata.frame
 to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")

. gen log_total_e_10 = log(TOURISM_10KM_SUM)
(8,229 missing values generated)

. do "C:\Users\mcket\AppData\Local\Temp\STD728_000000.tmp"

. . xtivreg AE_CAPITA_US SEX HHSIZE2 ( TOTAL_E TOURISM_10KM_SUM = Z_b_100 tourism_10KM_ZSCORE), re

G2SLS random-effects IV regression              Number of obs     =     11,374
Group variable: UNIQUE_HH_ID                    Number of groups  =      5,293

R-squared:                                      Obs per group:
     Within  = 0.0044                                         min =          1
     Between = 0.0546                                         avg =        2.1
     Overall = 0.0338                                         max =         12

                                                Wald chi2(4)      =    1011.76
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

----------------------------------------------------------------------------------
    AE_CAPITA_US | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-----------------+----------------------------------------------------------------
         TOTAL_E |  -.0088191   .0015754    -5.60   0.000    -.0119068   -.0057314
TOURISM_10KM_SUM |   .0010611   .0003845     2.76   0.006     .0003076    .0018147
             SEX |   247.1207   67.98776     3.63   0.000     113.8672    380.3743
         HHSIZE2 |  -490.9899   15.70356   -31.27   0.000    -521.7683   -460.2115
           _cons |   5172.889   126.4606    40.91   0.000     4925.031    5420.747
-----------------+----------------------------------------------------------------
         sigma_u |          0
         sigma_e |  73042.599
             rho |          0   (fraction of variance due to u_i)
----------------------------------------------------------------------------------
Endogenous: TOTAL_E TOURISM_10KM_SUM
Exogenous:  SEX HHSIZE2 Z_b_100 tourism_10KM_ZSCORE

tappek commented 1 year ago

I see you have already posted this on Stack Overflow (https://stackoverflow.com/q/76808363/4640346), and there is already an answer there pointing to the warning message you receive in addition to the error. I second that: the warning should be addressed first so that appropriate data is fed into plm(), and it is typically easier to create the `pdata.frame` first.
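A minimal sketch of that workflow, under stated assumptions: the data below is made up, and `YEAR` is a hypothetical time variable (the issue does not name one). The idea is to find and resolve duplicate (id, time) couples in base R, then build the `pdata.frame` with both index variables and repeat the check suggested by the warning text.

```r
# Toy unbalanced panel; column names mirror the issue where possible,
# YEAR is an assumed time variable and all values are invented.
df <- data.frame(
  UNIQUE_HH_ID = c(1, 1, 1, 2, 2, 3),
  YEAR         = c(2010, 2011, 2011, 2010, 2012, 2011),  # household 1 has 2011 twice
  HHSIZE2      = c(4, 4, 5, 2, 3, 6)
)

# Base-R check: which (id, time) couples occur more than once?
key  <- df[c("UNIQUE_HH_ID", "YEAR")]
dups <- df[duplicated(key) | duplicated(key, fromLast = TRUE), ]
print(dups)

# Resolve the duplicates. Keeping the first row per couple is only a placeholder
# rule; the right rule depends on why the duplicates exist in your data.
df_clean <- df[!duplicated(key), ]

if (requireNamespace("plm", quietly = TRUE)) {
  # Declare both panel dimensions explicitly instead of only the individual index.
  pdf <- plm::pdata.frame(df_clean, index = c("UNIQUE_HH_ID", "YEAR"))

  # Same check as in the warning text: no (id, time) cell should exceed 1.
  stopifnot(all(table(plm::index(pdf), useNA = "ifany") <= 1))
  print(plm::pdim(pdf))
}
```

Once the `pdata.frame` is clean, it can be passed as `data` to plm() directly, without the `index` argument.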

The code you gave here diverges from the one on SO in at least two points:

tappek commented 11 months ago

Closing this now on the assumption that the error will disappear once the panel dimensions and the formula are specified correctly. Feel free to re-open if a reproducible example can be supplied.