paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0
1.28k stars 184 forks source link

Feature request: Multiple terms per autocorrelation class in formula ? #1510

Open osupplisson opened 1 year ago

osupplisson commented 1 year ago

Hi, first and foremost, thank you very much for your work on this amazing package! I hope the following enhancement request will seem relevant.

I was trying to include two CAR (BYM2) terms and the following errors showed up (brms version 2.19.6, R version 4.3.0, Windows), which is pretty self-explanatory: Error: Can only model one term per autocorrelation class.

Is there any theoretical reason for this restriction? I had a look at R-INLA and multiple structured varying effects are supported, so this might come from something else. Would this be a possible enhancement for a near-future version?

As a bit of context, I was trying to use multiple structured varying terms with the aim to perform an MRP analysis, following this paper: Yuxiang Gao. Lauren Kennedy. Daniel Simpson. Andrew Gelman. "Improving Multilevel Regression and Poststratification with Structured Priors." Bayesian Anal. 16 (3) 719 - 744, September 2021. https://doi.org/10.1214/20-BA1223.

My goal was to include (in addition to other terms) two CAR-BYM2: one accounting for the spatial structure of the data and another one accounting for a hypothetical structure for age classes (age classes a-1 and a being 'adjacent'). On top of that, I wanted to use this with a non-linear formula aiming to account for misclassification errors (INLA does not support that, as far as I am aware).

Best, Olivier

Ps: The error seems to be triggered by the following lines of code in the formula-ac function:


# covariance matrices of natural residuals will be handled
  # directly in the likelihood function while latent residuals will
  # be added to the linear predictor of the main parameter 'mu'
  out$nat_cov <- out$[cov](https://rdrr.io/r/stats/cor.html) & [has_natural_residuals](https://rdrr.io/cran/brms/src/R/families.R)(x)
  [class](https://rdrr.io/r/base/class.html)(out) <- [acef_class](https://rdrr.io/cran/brms/src/R/formula-ac.R)()
  # validate specified autocor terms
  if ([any](https://rdrr.io/r/base/any.html)([duplicated](https://rdrr.io/r/base/duplicated.html)(out$[class](https://rdrr.io/r/base/class.html)))) {
    [stop2](https://rdrr.io/cran/brms/src/R/misc.R)("Can only model one term per autocorrelation class.")
  }
  if ([NROW](https://rdrr.io/r/base/nrow.html)([subset2](https://rdrr.io/cran/brms/src/R/misc.R)(out, [dim](https://rdrr.io/r/base/dim.html) = "time")) > 1) {
    [stop2](https://rdrr.io/cran/brms/src/R/misc.R)("Can only model one time-series term.")
  }
  if ([NROW](https://rdrr.io/r/base/nrow.html)([subset2](https://rdrr.io/cran/brms/src/R/misc.R)(out, [dim](https://rdrr.io/r/base/dim.html) = "space")) > 1) {
    [stop2](https://rdrr.io/cran/brms/src/R/misc.R)("Can only model one spatial term.")
  }
  if ([NROW](https://rdrr.io/r/base/nrow.html)([subset2](https://rdrr.io/cran/brms/src/R/misc.R)(out, nat_cov = [TRUE](https://rdrr.io/r/base/logical.html))) > 1) {
    [stop2](https://rdrr.io/cran/brms/src/R/misc.R)("Can only model one covariance matrix of natural residuals.")
  }
  if ([use_ac_cov](https://rdrr.io/cran/brms/src/R/formula-ac.R)(out) || [has_ac_class](https://rdrr.io/cran/brms/src/R/formula-ac.R)(out, "arma")) {
    if ([any](https://rdrr.io/r/base/any.html)(!out$dpar %in% [c](https://rdrr.io/r/base/c.html)("", "mu") | [nzchar](https://rdrr.io/r/base/nchar.html)(out$nlpar))) {
      [stop2](https://rdrr.io/cran/brms/src/R/misc.R)("Explicit covariance terms can only be specified on 'mu'.")
    }
  }
  out
}

You will find below a code replicating the error:

#Dataset
library(SpatialEpi)
#Working with spatial object
library(spdep)
#Fit
library(brms)
#Data wrangling
library(tidyverse)

#Loading ata
data(scotland)

#From poly to list of neighbors
nb_list <- poly2nb(scotland$spatial.polygon)

# Transforming the list into an adjency matrix
W <- nb2mat(nb_list,
            style = "B",
            zero.policy=TRUE)

#Naming the rows for gr argument
rownames(W) <- names(scotland$spatial.polygon)

#Fit - This is working
fit <- brm(cases ~ offset(log(expected)) + car(W, type = "bym2", gr = county.names),
  data = scotland$data,
  data2 = list(
    W = W
  ),
  family = poisson(link = "log"),
  chains = 1
)

#Generating a fake second adjency matrix
W_fake <- W
W_fake[W_fake==1] <- rbinom(n = length(W_fake[W_fake==1]),
                            size = 1,
                            prob = 0.5)

#Fit - This is NOT working
fit <- brm(cases ~ offset(log(expected)) +
             car(W, type = "bym2", gr = county.names) +
             car(W_fake, type = "bym2", gr = county.names),
           data = scotland$data,
           data2 = list(
             W = W,
             W_fake  = W_fake
           ),
           family = poisson(link = "log"),
           chains = 1
)
paul-buerkner commented 1 year ago

Good question. In your case (multiple CAR terms) there is no theoretical reason for this. It is merely implementation at this point. In future versions, it may be possible to add multiple such terms but that requires some backwards compatibility breaking changes in parameter naming so it may have to wait until brms 3.0