ycroissant / plm

Panel Data Econometrics with R
GNU General Public License v2.0

vcovHC throws "Error in tind[[i]] : subscript out of bounds" only when model is fitted by first-differences #58

Closed: YinSun0 closed this issue 2 months ago

YinSun0 commented 2 months ago

I am pretty used to using R's plm package to fit models with panel data, including first-difference models. Today, however, I ran into a strange error that is thrown when passing a plm model fitted with first-differences to plm's vcovHC() function to estimate robust variance-covariance matrices:

Error in tind[[i]] : subscript out of bounds

The source code of the plm package shows that the tind mentioned in the error is an internal time index. That is the only clue we get.

Because I don't yet understand what triggers this issue, I can't generate simulated data to provide a more general minimal reproducible example. But I can share a modified excerpt of my data, along with code that throws the error when run.

The data is in the file soquestion.csv.

The code:

```r
library(plm)
df <- read.csv("soquestion.csv")
pmodel <- plm(y ~ x1 + x2, data = df, index = c("group", "time"),
              model = "fd", effect = "individual")
vcovHC(pmodel)
```

This should throw the error. I have checked many things in the data, such as whether the time variable has the correct class and whether it contains NAs, among other things. Nothing stood out. But it is worth stressing that this only happens with model="fd". With any other option for the model parameter, such as "within" or "random", vcovHC works as intended and without errors.

tappek commented 2 months ago

Thank you for reporting. This is clearly a bug. First-difference models are somewhat nasty because the data is "compressed" for them (first-differenced), which requires the code to accommodate this. We have fixed quite a few errors stemming from this in the past. I will need to look at this more closely; thank you for providing a reproducible example!

As a first guess, I think it might be related to some groups having only one observation:

```r
pdim(pmodel)
# Unbalanced Panel: n = 158, T = 1-51, N = 4981
```
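To see which groups are affected in one's own data, a quick check is to count, per group, the rows that are complete in the model variables. The following is a minimal base-R sketch, assuming the column names `group`, `y`, `x1`, `x2` from the example above; the helper `count_complete_per_group` and the toy data are hypothetical, and the count only approximates which rows plm actually keeps.

```r
# Hypothetical helper (not part of plm): number of rows per group that have
# no NA in any of `vars`, i.e. roughly the rows a model on `vars` can use.
count_complete_per_group <- function(df, group, vars) {
  complete <- stats::complete.cases(df[, vars, drop = FALSE])
  table(df[[group]][complete])
}

# Toy panel: group "b" has two rows, but one of them has an NA in x1
toy <- data.frame(
  group = c("a", "a", "a", "b", "b", "c", "c"),
  time  = c(1, 2, 3, 1, 2, 1, 2),
  y     = c(1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 5.5),
  x1    = c(0.1, 0.2, 0.3, 0.4, NA, 0.6, 0.7),
  x2    = c(1, 0, 1, 0, 1, 0, 1)
)

counts <- count_complete_per_group(toy, "group", c("y", "x1", "x2"))
print(counts)

# Groups with fewer than two usable rows have nothing left after differencing
singletons <- names(counts)[counts < 2]
print(singletons)
```

If `group` is a factor, `table()` also reports groups with zero complete rows, which the `counts < 2` filter catches as well.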

For reference: also posted at StackOverflow: https://stackoverflow.com/questions/78670581/in-r-plms-vcovhc-throws-the-obscure-error-error-in-tindi-subscript-out

Here is a self-contained reproducible example in which one group has only one observation:

```r
library(plm)
data("Grunfeld", package = "plm")
pGrun <- pdata.frame(Grunfeld)
pGrun1 <- pGrun[-c(61:200), ]   # keep firms 1-3 (20 years each)
pGrun1 <- pGrun1[-c(2:20), ]    # leave firm 1 with a single observation
pdim(pGrun1)
mod <- plm(inv ~ value + capital, data = pGrun1, model = "fd")
vcovHC(mod)
# Error in tind[[i]] : subscript out of bounds
# Called from: vcovG.plm(x, type = type, cluster = cluster, l = 0, inner = inner,
#   ...)
```

YinSun0 commented 2 months ago

Awesome, thanks! It does seem that you are correct. Once I drop groups that only have one observation after dropping cases with NAs in the variables used in the model, the error disappears.
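The workaround described here can be sketched in base R: first drop rows with NAs in the model variables, then drop groups left with fewer than two observations. This is a minimal sketch under those assumptions; `drop_singleton_groups` and the toy data are hypothetical and not part of plm.

```r
# Hypothetical helper (not part of plm): keep rows complete in `vars`, then
# drop groups with fewer than two remaining rows, since such groups contain
# no observations after first-differencing.
drop_singleton_groups <- function(df, group, vars) {
  df <- df[stats::complete.cases(df[, vars, drop = FALSE]), , drop = FALSE]
  # size of each row's group, aligned with the rows of df
  n_per_group <- ave(seq_len(nrow(df)), df[[group]], FUN = length)
  df[n_per_group >= 2, , drop = FALSE]
}

# Toy panel: group "b" loses one row to an NA and is then a singleton
toy <- data.frame(
  group = c("a", "a", "a", "b", "b", "c", "c"),
  time  = c(1, 2, 3, 1, 2, 1, 2),
  y     = c(1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 5.5),
  x1    = c(0.1, 0.2, 0.3, 0.4, NA, 0.6, 0.7),
  x2    = c(1, 0, 1, 0, 1, 0, 1)
)

clean <- drop_singleton_groups(toy, "group", c("y", "x1", "x2"))
print(unique(clean$group))

# The cleaned data could then be used as in the original report, e.g.
# pmodel <- plm(y ~ x1 + x2, data = clean, index = c("group", "time"),
#               model = "fd", effect = "individual")
```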

Also, while exploring what was going on, I think I found the same behavior as described in this GitHub issue. While that seems to be a more serious concern, I think both issues stem from the same underlying cause: how plm handles NAs in fd models.

tappek commented 2 months ago

#27 is an independent topic and not per se a concern, but it might be unexpected by some users.

tappek commented 2 months ago

It is now fixed in the development version. The root cause was exactly what I initially guessed: the code in vcovHC (or, more precisely, the internal function vcovG) and in vcovBK was not ready to handle groups with only one observation. Such groups need to be dropped altogether, as they contain no observations after first-differencing.
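Why such groups vanish can be illustrated with a small base-R sketch of within-group first-differencing: a group with T_i rows yields T_i - 1 differenced rows, so a group with one row yields none. This is only an illustration of the idea, not plm's internal code; `first_difference` and the toy data are hypothetical.

```r
# Hypothetical illustration (not plm's implementation): difference `vars`
# within each group after sorting by time; single-row groups produce nothing.
first_difference <- function(df, group, time, vars) {
  df <- df[order(df[[group]], df[[time]]), , drop = FALSE]
  pieces <- lapply(split(df, df[[group]]), function(g) {
    if (nrow(g) < 2) return(NULL)  # nothing to difference in this group
    d <- as.data.frame(lapply(g[vars], diff))
    d[[group]] <- g[[group]][-1]   # label each difference by its later period
    d[[time]]  <- g[[time]][-1]
    d
  })
  out <- do.call(rbind, pieces)    # rbind silently skips the NULL pieces
  rownames(out) <- NULL
  out
}

# Toy panel: group "b" has a single observation
toy <- data.frame(
  group = c("a", "a", "a", "b", "c", "c"),
  time  = c(1, 2, 3, 1, 1, 2),
  y     = c(1.0, 2.0, 2.5, 3.0, 5.0, 5.5),
  x1    = c(0.1, 0.2, 0.3, 0.4, 0.6, 0.7)
)

fd <- first_difference(toy, "group", "time", c("y", "x1"))
print(fd)
```

Group "b" does not appear in the differenced data at all, so any code that still expects a time index for every original group can run past the end of its index list.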