sambrilleman / rstanarm

rstanarm R package for Bayesian applied regression modeling
http://mc-stan.org/interfaces/rstanarm.html
GNU General Public License v3.0

Error in posterior_survfit: Bug found: elements of 'll_long' should be same class and dimension as 'll_event'. #46

Closed by jburos 7 years ago

jburos commented 7 years ago

Summary:

I see the following error when running posterior_survfit with both newdataLong and newdataEvent. Note that in my original input data, the id variable (id_var) was a factor.

Error in ll_jm(object, data, pars, include_b = TRUE) : 
  Bug found: elements of 'll_long' should be same class and dimension as 'll_event'. 

Description:

If I recall correctly, this error occurs in an inner loop of a function that iterates over the id vars in the data, where one of the two elements (of ll_long or ll_event) returned an object of length equal to the number of levels of the factor, whereas the other returned an object of length 1.
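
To illustrate the kind of length mismatch described above, here is a minimal sketch in base R. It is not the rstanarm internals: the ids and log-likelihood values are made up, and `tapply` merely stands in for whatever per-id aggregation the package performs. It only shows that a factor keeps all of its levels after subsetting, so grouping by it yields one cell per level, while grouping by the character version yields one cell per id actually present.

```r
# Hypothetical ids and values, for illustration only (not rstanarm code).
ids <- factor(c("1-new", "1-new", "2-new", "3-new"))

# Subsetting a factor keeps every original level, including unused ones.
id_one <- ids[ids == "1-new"]
ll_obs <- c(-1.2, -0.7)  # made-up per-observation log-likelihood values

# Grouping by the factor gives one cell per level (unused levels are NA).
by_factor <- tapply(ll_obs, id_one, sum)

# Grouping by the character ids gives one cell per id actually present.
by_chr <- tapply(ll_obs, as.character(id_one), sum)

length(by_factor)
length(by_chr)
```

Here `by_factor` has three cells (two of them NA) and `by_chr` has one, which matches the symptom of one element having length equal to the number of factor levels and the other having length 1.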

Here is the traceback for the error above (from a run on non-public data):

 Error in ll_jm(object, data, pars, include_b = TRUE) : 
  Bug found: elements of 'll_long' should be same class and dimension as 'll_event'. 
7. stop("Bug found: elements of 'll_long' should be same class and ", 
       "dimension as 'll_event'.") at log_lik.R#574
6. ll_jm(object, data, pars, include_b = TRUE) at posterior_survfit.R#682
5. fn(par, ...) 
4. (function (par) 
   fn(par, ...))(c(0, 0, 0)) 
3. optim(inits, optim_fn, object = object, data = dat_i, pars = pars_means, 
       method = "BFGS", hessian = TRUE) at posterior_survfit.R#409
2. rstanarm::posterior_survfit(ipass.jm0, newdataLong = ifum$tumor %>% 
       dplyr::filter(!is.na(sld)) %>% dplyr::select(usubjid, cat, 
       months, sld) %>% dplyr::arrange(usubjid, months), newdataEvent = ifum$surv %>% 
       dplyr::select(usubjid, whostat, survmnth, censor) %>% dplyr::semi_join(ifum$tumor %>%  ... 
1. gen_post_survfit1() 

Reproducible Steps:

library(rstanarm)  # provides stan_jm, posterior_survfit, pbcSurv and pbcLong
library(dplyr)
data(pbcSurv)
data(pbcLong)

## fit stan_jm to full data
f1 <- stan_jm(formulaLong = logBili ~ year + (1 | id), 
              dataLong = pbcLong,
              formulaEvent = Surv(futimeYears, death) ~ sex + trt, 
              dataEvent = pbcSurv,
              time_var = "year")

ps_check(f1)

## generate subset from full data
subset <- pbcSurv %>% 
  dplyr::filter(id != 18) %>%
  dplyr::distinct(id) %>% 
  dplyr::sample_n(10)

newdataEvent <- pbcSurv %>% 
  dplyr::semi_join(subset, by = "id") %>%
  dplyr::select(id, futimeYears, death, sex, trt) %>%
  dplyr::mutate(id = paste0(id, '-new'),
                id = as.factor(id))

newdataLong <- pbcLong %>% 
  dplyr::semi_join(subset, by = "id") %>%
  dplyr::select(id, logBili, year) %>%
  dplyr::mutate(id = paste0(id, '-new'),
                id = as.factor(id))

## generate OOS predictions, using factor-based id (fails with above error)
pp_survfit <- posterior_survfit(f1, 
                                newdataLong = newdataLong,
                                newdataEvent = newdataEvent
                                )

Note that if I instead use character (chr) versions of the ids, the same call works without error.

## generate OOS predictions, using chr-based id
newdataEvent_chr <- newdataEvent %>%
  dplyr::mutate(id = as.character(id))

newdataLong_chr <- newdataLong %>%
  dplyr::mutate(id = as.character(id))

## works
pp_survfit2 <- posterior_survfit(f1, 
                                newdataLong = newdataLong_chr,
                                newdataEvent = newdataEvent_chr
                                )

RStanARM Version:

A fork of the develop2 branch of this repo. Specifically:

devtools::install_github('jburos/rstanarm', ref = 'fix-posterior-predict-newdata', args = '--preclean', local = TRUE)

sambrilleman commented 7 years ago

The factor list (flist) returned by jm_data should now be correct, since factor() is called again on the id variable after subsetting ids etc. The commit below should hopefully deal with this issue.
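
As a small base R sketch of why calling factor() again after subsetting helps (toy ids, not the actual jm_data code): subsetting alone leaves the unused levels in place, while re-applying factor() rebuilds the levels from the values that remain.

```r
# Toy ids for illustration only; not taken from jm_data.
ids <- factor(c("a", "a", "b", "c"))

# Drop one id by subsetting; its level is still attached to the factor.
kept <- ids[ids != "c"]
nlevels(kept)

# Calling factor() again rebuilds the levels from the values present.
refactored <- factor(kept)
nlevels(refactored)
levels(refactored)
```

After the second factor() call only the levels "a" and "b" remain, so anything sized by the number of levels agrees with the ids actually in the data.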