Error in posterior_survfit: initial value in 'vmmin' is not finite

Summary:

Seeing the following issue when running posterior_survfit with newdataLong and newdataEvent:

Error in optim(inits, optim_fn, object = object, data = dat_i, pars = pars_means,  :                                                    │      newtimes[newtimes < 0] <- 0.0  # use baseline where lagged t is before baseline
  initial value in 'vmmin' is not finite

Description:

This error appears to only occur when running posterior_survfit on newdata, when either ll_event or ll_b evaluate to NaN or -Inf.

I have observed this error in two scenarios, which I will attempt to describe here.

In one case, I noticed that on line 568 of log_lik.R, within ll_jm function code, the value ll_b was Inf.

This section reads as follows:

    ll_b   <- mvtnorm::dmvnorm(b, mean = mu, sigma = Sigma, log = TRUE)  ## <- line 568

I have not yet been able to reproduce this scenario with a public dataset.

In a separate example, reproduced below, ll_event evaluated to NaN when qtimes was a vector of 0s. This appears to happen when using a weibull basehaz to estimate posterior survival probabilities where the last given biomarker time is at time 0, since this is used as a last evaluation time for each id. (see "reproducible steps" below for the logic behind this assertion).

Within the call to ll_event, the introduction of NaN values occurs on the following line:

  # baseline hazard
  if (basehaz$type_name == "weibull") { # pars$bhcoef == weibull shape
    log_basehaz <- as.vector(log(pars$bhcoef)) + 
      linear_predictor(pars$bhcoef - 1, log(qtimes))
  }

Reproducible Steps:

(similar to those in #43):

Install fork with minor bugfixes, which builds off develop2 branch of sambrilleman/rstanarm

devtools::install_github('jburos/rstanarm', ref = 'fix-posterior-predict-newdata', args = '--preclean', local = TRUE)

Fit model using simulated data

data(pbcSurv)
data(pbcLong)
f1 <- stan_jm(formulaLong = logBili ~ year + (1 | id), 
                          dataLong = pbcLong,
                          formulaEvent = Surv(futimeYears, death) ~ sex + trt, 
                          dataEvent = pbcSurv,
                          time_var = "year")

run posterior_survfit on newdata

library(dplyr)
subset <- pbcSurv %>% 
   dplyr::distinct(id) %>% 
   dplyr::sample_n(10) %>%
   dplyr::bind_rows(data.frame(id = c(18))) %>%  ## force pt 18 to be part of subset
   dplyr::distinct()
newdataEvent <- pbcSurv %>% 
dplyr::semi_join(subset) %>%
dplyr::select(id, futimeYears, death, sex, trt)
newdataLong <- pbcLong %>% 
dplyr::semi_join(subset) %>%
dplyr::select(id, logBili, year)
pp_survfit <- posterior_survfit(f1, newdataLong = newdataLong, newdataEvent = newdataEvent)

This yields the following (which may depend on the sample of ids given; here it was observed with id == 18 of pbcSurv):

Error in optim(inits, optim_fn, object = object, data = dat_i, pars = pars_means,  :                                                    │      newtimes[newtimes < 0] <- 0.0  # use baseline where lagged t is before baseline
  initial value in 'vmmin' is not finite

Running ll_event in debug mode for patient 18:

where etimes is 0 and qtimes are 0:

Browse[4]> etimes
[1] 0
Browse[4]> qtimes
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

the error occurs on line 499 of log_lik.R, which reads:

  # baseline hazard
  if (basehaz$type_name == "weibull") { # pars$bhcoef == weibull shape
    log_basehaz <- as.vector(log(pars$bhcoef)) + 
      linear_predictor(pars$bhcoef - 1, log(qtimes))
  }

obviously log(qtimes) cannot be evaluated.

In this example, if you attempt to evaluate posterior_survfit excluding patient 18, the result works without error.

pp_survfit1 <- posterior_survfit(f1,
                  newdataLong = newdataLong %>% dplyr::filter(id != 18),
                  newdataEvent = newdataEvent %>% dplyr::filter(id != 18))
plot(pp_survfit1)
## yields nice-looking survival plots

Patient 18 is a patient whose only longitudinal measurement was taken at t == 0.

> newdataLong %>% dplyr::filter(id == 18)
  id  logBili year
1 18 2.433613    0

We can induce the same error in this subset by limiting our subset to use only longitudinal measures taken at baseline (time == 0):

(here, removing id 18 to show that it's sufficient to reproduce the error reported above)

> pp_survfit <- posterior_survfit(f1,
                                  newdataLong = newdataLong %>% dplyr::filter(id != 18 & year == 0),
                                  newdataEvent = newdataEvent %>% dplyr::semi_join(newdataLong %>% dplyr::filter(id != 18 & year == 0), by='id')
                                  )
Error in optim(inits, optim_fn, object = object, data = dat_i, pars = pars_means,  : 
  initial value in 'vmmin' is not finite

RStanARM Version:

See above

sambrilleman / rstanarm