sambrilleman / rstanarm

rstanarm R package for Bayesian applied regression modeling
http://mc-stan.org/interfaces/rstanarm.html
GNU General Public License v3.0
0 stars 1 forks source link

Various errors depending on the way dataLong and/or formulaLong are provided #80

Open jburos opened 7 years ago

jburos commented 7 years ago

Summary:

Getting the following error when fitting a model with a single longitudinal biomarker (M = 1) and where the biomarker details (formulaLong, family, etc) are provided as a list.

Error in fm[[m + 1]] : subscript out of bounds

For context, I am creating the inputs to the Stan function programmatically (to support the simulation testing) and so it's easier to create all inputs as lists irrespective of the length. In that case, I encountered the error on this line of run_simtest():

 purrr::lift_dl(rstanarm::stan_jm, seed = stan_seed)(sim$func_inputs)

Two details are relevant to the issue:

  1. within sim$func_inputs, formulaLong is provided as a list (here of length 1).
  2. we are executing a modified version of stan_jm, which transforms the result of match.call().

I should also note that, in the course of creating a reproducible example, I found 3 different scenarios in which similar errors were produced. Figure it's worth documenting them here since all three are likely related to the same code-block.

Description:

The error occurs on this section of stan_jm.R:

m_mc <- lapply(1:M, function(m, old_call, env) {
    new_call <- old_call
    fm     <- old_call$formula
    data   <- old_call$data
    family <- old_call$family
    new_call$formula <- if (is(eval(fm,     env), "list")) fm[[m+1]]     else fm
    new_call$data    <- if (is(eval(data,   env), "list")) data[[m+1]]   else data
    new_call$family  <- if (is(eval(family, env), "list")) family[[m+1]] else family
    new_call
  }, old_call = y_mc, env = calling_env)

Although it's worth pointing out that the results of validate_arg (in which broadcast = TRUE) are not used in this section. Perhaps they should be?

Reproducible Steps:

Example of a standard model spec (works):

data(pbcSurv)
data(pbcLong)
f1 <- stan_jm(formulaLong = logBili ~ year + (1 | id),
              dataLong = pbcLong,
              formulaEvent = Surv(futimeYears, death) ~ sex + trt,
              dataEvent = pbcSurv,
              time_var = "year")

Now, fit a model where the formulaLong, dataLong, or family input is a list:

f2 <- stan_jm(formulaLong = list(logBili ~ year + (1 | id)),
              dataLong = pbcLong,
              family = gaussian,
              formulaEvent = Surv(futimeYears, death) ~ sex + trt,
              dataEvent = pbcSurv,
              time_var = "year")

The above also works.

However, if you do something like the following:

my_long_formula <- list(logBili ~ year + (1 | id))
f3 <- stan_jm(formulaLong = my_long_formula,
              dataLong = list(pbcLong),
              family = gaussian,
              formulaEvent = Surv(futimeYears, death) ~ sex + trt,
              dataEvent = pbcSurv,
              time_var = "year")

You get:

Error in fm[[m + 1]] : object of type 'symbol' is not subsettable

This is not the same error as above, but it is related.

However, if you do the following:

my_long_formula <- logBili ~ year + (1 | id)
f4 <- stan_jm(formulaLong = list(my_long_formula),
              dataLong = pbcLong,
              formulaEvent = Surv(futimeYears, death) ~ sex + trt,
              dataEvent = pbcSurv,
              time_var = "year")

there is no error, since formulaLong object is not a symbol.

The problem here appears to be a function of the calling env vs the env in which the functions are defined. Here we are getting close to the reproducible example, but not exactly there.

prep_inputs <- function() {
    list(dataLong = pbcLong,
          formulaLong = list(logBili ~ year + (1 | id))
          )
}
inputs <- prep_inputs()
f5 <- stan_jm(
              formulaLong = inputs$formulaLong,
              dataLong = inputs$dataLong,
              formulaEvent = Surv(futimeYears, death) ~ sex + trt,
              dataEvent = pbcSurv,
              time_var = "year")

This yields an error:

Error: !checkLHS || length(as.formula(formula, env = denv)) == 3 is not TRUE

In order to reproduce the original error message, we have to modify the above slightly to use purrr::lift_dl.

Specifically:

prep_inputs <- function() {
    list(dataLong = pbcLong,
          formulaLong = list(logBili ~ year + (1 | id))
          )
}
inputs <- prep_inputs()
f6 <- purrr::lift_dl(stan_jm,
              formulaEvent = Surv(futimeYears, death) ~ sex + trt,
              dataEvent = pbcSurv,
              time_var = "year")(inputs)

This yields the above:

Error in fm[[m + 1]] : subscript out of bounds

This leads to 3 specific & different bugs:

  1. subscript out of bounds
  2. Error: !checkLHS is not TRUE
  3. object of type 'symbol' is not subsettable

Starting to debug

When debugging the subscript out of bounds error, the value of fm object is different than what is expected.

In the typical case (f1 in the examples above), the object fm looks like the following:

Browse[4]> str(fm)
 language list(logBili ~ year + (1 | id))
Browse[4]> m
[1] 1
Browse[4]> fm[[m + 1]]
logBili ~ year + (1 | id)
Browse[4]> fm[[m]]
list
Browse[4]> is(fm, 'list')
[1] FALSE

Whereas in the bug-producing case (analogous to f6, above, however this was debugged using the simtest_jm code that originally produced the error) we have the following:

Browse[4]> str(fm)
List of 1
 $ :Class 'formula'  language Yij_1 ~ Z1 + Z2 + tij + (tij | id)
  .. ..- attr(*, ".Environment")=<environment: 0x9a42eb0>
Browse[4]> m
[1] 1
Browse[4]>  is(fm, 'list')
[1] TRUE

In this latter case, it's clear why the call to fm[[m+1]] yields an error.

RStanARM Version:

Most recent commit 8617674 of sambrilleman/rstanarm, master branch

R Version:

Observed on both 3.3.3 & 3.4.0

Operating System:

Linux / ubuntu

jburos commented 7 years ago

In the case of f3 above (the object of type 'symbol' is not subsettable bug), fm is of a different structure:

Browse[2]> str(mc)
 language stan_jm(formulaLong = my_long_formula,