sambrilleman / rstanarm

rstanarm R package for Bayesian applied regression modeling
http://mc-stan.org/interfaces/rstanarm.html
GNU General Public License v3.0
0 stars 1 forks source link

Error in posterior_survfit: initial value in 'vmmin' is not finite #44

Closed jburos closed 7 years ago

jburos commented 7 years ago

Summary:

Seeing the following issue when running posterior_survfit with newdataLong and newdataEvent:

Error in optim(inits, optim_fn, object = object, data = dat_i, pars = pars_means,  :                                                    │      newtimes[newtimes < 0] <- 0.0  # use baseline where lagged t is before baseline
  initial value in 'vmmin' is not finite

Description:

This error appears to only occur when running posterior_survfit on newdata, when either ll_event or ll_b evaluate to NaN or -Inf.

I have observed this error in two scenarios, which I will attempt to describe here.

In one case, I noticed that on line 568 of log_lik.R, within ll_jm function code, the value ll_b was Inf.

This section reads as follows:

    ll_b   <- mvtnorm::dmvnorm(b, mean = mu, sigma = Sigma, log = TRUE)  ## <- line 568

I have not yet been able to reproduce this scenario with a public dataset.

In a separate example, reproduced below, ll_event evaluated to NaN when qtimes was a vector of 0s. This appears to happen when using a weibull basehaz to estimate posterior survival probabilities where the last given biomarker time is at time 0, since this is used as a last evaluation time for each id. (see "reproducible steps" below for the logic behind this assertion).

Within the call to ll_event, the introduction of NaN values occurs on the following line:

  # baseline hazard
  if (basehaz$type_name == "weibull") { # pars$bhcoef == weibull shape
    log_basehaz <- as.vector(log(pars$bhcoef)) + 
      linear_predictor(pars$bhcoef - 1, log(qtimes))
  }

Reproducible Steps:

(similar to those in #43):

  1. Install fork with minor bugfixes, which builds off develop2 branch of sambrilleman/rstanarm

    devtools::install_github('jburos/rstanarm', ref = 'fix-posterior-predict-newdata', args = '--preclean', local = TRUE)
  2. Fit model using simulated data

    data(pbcSurv)
    data(pbcLong)
    f1 <- stan_jm(formulaLong = logBili ~ year + (1 | id), 
                              dataLong = pbcLong,
                              formulaEvent = Surv(futimeYears, death) ~ sex + trt, 
                              dataEvent = pbcSurv,
                              time_var = "year")
  3. run posterior_survfit on newdata

    library(dplyr)
    subset <- pbcSurv %>% 
       dplyr::distinct(id) %>% 
       dplyr::sample_n(10) %>%
       dplyr::bind_rows(data.frame(id = c(18))) %>%  ## force pt 18 to be part of subset
       dplyr::distinct()
    newdataEvent <- pbcSurv %>% 
    dplyr::semi_join(subset) %>%
    dplyr::select(id, futimeYears, death, sex, trt)
    newdataLong <- pbcLong %>% 
    dplyr::semi_join(subset) %>%
    dplyr::select(id, logBili, year)
    pp_survfit <- posterior_survfit(f1, newdataLong = newdataLong, newdataEvent = newdataEvent)

This yields the following (which may depend on the sample of ids given; here it was observed with id == 18 of pbcSurv):

Error in optim(inits, optim_fn, object = object, data = dat_i, pars = pars_means,  :                                                    │      newtimes[newtimes < 0] <- 0.0  # use baseline where lagged t is before baseline
  initial value in 'vmmin' is not finite

Running ll_event in debug mode for patient 18:

where etimes is 0 and qtimes are 0:

Browse[4]> etimes
[1] 0
Browse[4]> qtimes
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

the error occurs on line 499 of log_lik.R, which reads:

  # baseline hazard
  if (basehaz$type_name == "weibull") { # pars$bhcoef == weibull shape
    log_basehaz <- as.vector(log(pars$bhcoef)) + 
      linear_predictor(pars$bhcoef - 1, log(qtimes))
  }

obviously log(qtimes) cannot be evaluated.

In this example, if you attempt to evaluate posterior_survfit excluding patient 18, the result works without error.

pp_survfit1 <- posterior_survfit(f1,
                  newdataLong = newdataLong %>% dplyr::filter(id != 18),
                  newdataEvent = newdataEvent %>% dplyr::filter(id != 18))
plot(pp_survfit1)
## yields nice-looking survival plots

Patient 18 is a patient whose only longitudinal measurement was taken at t == 0.

> newdataLong %>% dplyr::filter(id == 18)
  id  logBili year
1 18 2.433613    0

We can induce the same error in this subset by limiting our subset to use only longitudinal measures taken at baseline (time == 0):

(here, removing id 18 to show that it's sufficient to reproduce the error reported above)

> pp_survfit <- posterior_survfit(f1,
                                  newdataLong = newdataLong %>% dplyr::filter(id != 18 & year == 0),
                                  newdataEvent = newdataEvent %>% dplyr::semi_join(newdataLong %>% dplyr::filter(id != 18 & year == 0), by='id')
                                  )
Error in optim(inits, optim_fn, object = object, data = dat_i, pars = pars_means,  : 
  initial value in 'vmmin' is not finite

RStanARM Version:

See above

sambrilleman commented 7 years ago

Hopefully pull request #49 will have resolved the issue for ID 18 where the last known survival time was time zero.