jepusto / clubSandwich

Cluster-robust (sandwich) variance estimators with small-sample corrections
http://jepusto.github.io/clubSandwich/
46 stars 8 forks source link

Possible bug in multi-level multi-variate model with lme() #46

Open jepusto opened 4 years ago

jepusto commented 4 years ago
library(RCurl)
library(nlme)
library(clubSandwich)

### get data
tmp <- getURL("https://raw.githubusercontent.com/wviechtb/multivariate_multilevel_models/master/data.dat")
dat <- read.table(text=tmp, header=TRUE, sep="\t")

### calculate PA and NA
dat$pa <- rowMeans(dat[, grepl("pa", names(dat))])
dat$na <- rowMeans(dat[, grepl("na", names(dat))])

### keep only variables that are needed
dat <- dat[, c("id", "sex", "beep", "pa", "na")]

### change into very long format
dat <- reshape(dat, direction="long", varying=c("pa","na"), v.names="y", idvar="obs", timevar="outcome")
dat$obs <- NULL
dat <- dat[order(dat$id, dat$beep, dat$outcome),]
rownames(dat) <- 1:nrow(dat)
dat$outnum <- dat$outcome
dat$outcome <- factor(dat$outcome)

### fit multivariate multilevel model
res1 <- lme(y ~ outcome - 1,
            random = ~ outcome - 1 | id,
            weights = varIdent(form = ~ 1 | outcome),
            correlation = corSymm(form = ~ outnum | id/beep),
            data = dat, na.action = na.omit)
summary(res1)

# Error in tcrossprod(s) * R : non-conformable arrays
coef_test(res1, vcov = "CR2")
coef_test(res1, vcov = "CR2", cluster = dat$id)

# the error stems from vcovCR()
vcovCR(res1, type = "CR2", cluster = dat$id)
vcovCR(res1, type = "CR0", cluster = dat$id)

The error goes away when I remove the correlation structure on the errors:

res2 <- lme(y ~ outcome - 1,
            random = ~ outcome - 1 | id,
            weights = varIdent(form = ~ 1 | outcome),
            data = dat, na.action = na.omit)
summary(res2)
coef_test(res2, vcov = "CR2")

But gls() with such a correlation structure works:

res3 <- gls(y ~ outcome - 1,
           weights = varIdent(form = ~ 1 | outcome),
           correlation = corSymm(form = ~ outnum | id/beep),
           data = dat, na.action = na.omit)
summary(res3)
coef_test(res3, vcov = "CR2")
coef_test(res3, vcov = "CR2", cluster = dat$id)

Maybe it's because 'res1' uses 'id' as the grouping variable for 'random' while the correlation structure is defined with respect to 'id/beep'?

jepusto commented 4 years ago

@wviechtb