paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0
1.28k stars 182 forks source link

No initialization with censored model #527

Closed MCMaurer closed 6 years ago

MCMaurer commented 6 years ago

Hi there,

I've been fitting a multilevel Poisson model and it's been fitting fine, after tuning some parameters to get rid of divergent transitions. Here's the basic Poisson model:

model_1 <- brm(data = d, family = poisson, 
               latency ~ 1 + treatment + trial +
                 (1 + treatment + trial | group_ID) +
                 (1 + treatment + trial | tank),
               prior = c(set_prior("normal(0, 1)", class = "Intercept"),
                         set_prior("normal(0, 1)", class = "b"),
                         set_prior("cauchy(0, 2)", class = "sd"),
                         set_prior("lkj(4)", class = "cor")),
               iter = 5000, warmup = 1000, chains = 3, cores = future::availableCores(), 
           control = list(adapt_delta = 0.98, max_treedepth = 15), 
               save_model = "fit_models/stan_model_1")

However, I realized some of the data are censored and I could be accounting for that. I added a column to the data to indicate uncensored (0) or right-censored (1), and changed the model slightly:

model_1_cens <- brm(data = d, family = poisson, 
               latency | cens(is_censored) ~ 1 + treatment + trial +
                 (1 + treatment + trial | group_ID) +
                 (1 + treatment + trial | tank)

When I try to fit this model, I get a bunch of repetitions of this error:

Rejecting initial value:
  Log probability evaluates to log(0), i.e. negative infinity.
  Stan can't start sampling from this initial value.

After reading up on this error, I tried both shrinking and expanding the range of initialization values, and neither one worked, I still got the same error. Initialization values seemed to be the only posited solution in the Stan help venues, so I'm not quite sure what to do from here. Any thoughts would be much appreciated!

Session info:

R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
paul-buerkner commented 6 years ago

Not sure what is happening. Can you provide a minimal reproducible example?

MCMaurer commented 6 years ago

@paul-buerkner sure thing, this example seems to do it:


set.seed(1)
data <- data.frame(
  response = c(rpois(80,10), rep(300,20)),
  group = rep(1:5, 20),
  treatment = rep(c("a", "b", "b", "c", "c"), 20)
)

data <- data %>% 
  mutate(
    is_censored = case_when(response == 300 ~ 1,
                            TRUE ~ 0)
  )

# fit model
model_1 <- brm(data = data, family = poisson, 
               response | cens(is_censored) ~ 1 + treatment +
                 (1 + treatment | group),
               prior = c(set_prior("normal(0, 1)", class = "Intercept"),
                         set_prior("normal(0, 1)", class = "b"),
                         set_prior("cauchy(0, 2)", class = "sd"),
                         set_prior("lkj(4)", class = "cor")),
               iter = 5000, warmup = 1000, chains = 1, cores = future::availableCores(),
               control = list(adapt_delta = 0.98, max_treedepth = 15))
paul-buerkner commented 6 years ago

So the sampling doesn't fail for me. Those few initial errors are not a problem usually, if, after that, the sampler starts running normally.

MCMaurer commented 6 years ago

With the full model, whenever I get the model object back, it says the sampling has failed completely and the model object does not contain samples... Perhaps this minimal example doesn't fully reproduce the problem?

MCMaurer commented 6 years ago

When I run the full model, I get a long string of the error I mentioned before, followed by this:

Initialization between (-1, 1) failed after 100 attempts.
 Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
[2] "In addition: Warning message:"
[3] "Rows containing NAs were excluded from the model. "
error occurred during calling the sampler; sampling not done
here are whatever error messages were returned
[[1]]
Stan model 'poisson brms-model' does not contain samples.

[[2]]
Stan model 'poisson brms-model' does not contain samples.

[[3]]
Stan model 'poisson brms-model' does not contain samples.
paul-buerkner commented 6 years ago

Ok I see. The problem is because -- using your reprex -- p(y > 300) is so small. we effectively evaluate log(p(y > 300)) = log(0) = - infinity. So, perhaps, the non-censored data and the censored data doesn't really come from the same parameters hence creating problems in the model fitting. In other words, you may have a strong data - model misfit.

MCMaurer commented 6 years ago

Ahhhhhh I see... these data are latency times for a fish to move, but the trial stops after 5 mins (300 seconds). Most fish move pretty quickly, but some never end up moving, so their latency is logged as 300 (which is really 300 or greater).

All that I really changed when going from the poisson to censored poisson was specifying that any value of 300 is really a value of 300 or more, but that would be enough to create a strong misfit?

paul-buerkner commented 6 years ago

That would be enough. I think you are dealing with a mixture model. One that decides on whether I move or not (at all) and the second is poisson. A discussion about such a model should preferably moved to https://discourse.mc-stan.org/ for greater visibility.

MCMaurer commented 6 years ago

@paul-buerkner oh man, I hadn't even thought of a mixture model, but that makes so much more sense! I sincerely appreciate the help.