paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0

All MCMC samples are the same #180

Closed: alireza202 closed this issue 7 years ago

alireza202 commented 7 years ago

I've run into a weird issue, and I hope you can help me with it. Here is the model I'm working with:

the_formula = brmsformula(y ~ offset(z) + x + (x | item), 
                          shape ~ (1 | item))

and the prior:

fixed_prior = set_prior('normal(0, 10)', class = 'b', ub = 0)

When I do:

brm(formula = the_formula, 
    data = training,
    family = negbinomial(),
    prior = fixed_prior,
    chains = 1) -> fit_model

and I look at a sampled parameter such as fit_model$fit@sim$samples[[1]][['r_item[item_35,Intercept]']], the values are all identical (except for the first 20-25 samples from the warmup period)! In fact, all parameters show the same behavior. But when I remove the population-level shape from the formula:

the_formula = y ~ offset(z) + x + (x | item)

the results look normal. I did this in RStan and CmdStan and got exactly the same results. Any idea what could go wrong that would result in identical Markov chain samples? Could it somehow be related to the way the population-level shape is coded?

I'll send you the sample dataset by email so you can check for yourself. Just a heads-up: the model with shape takes forever to complete (around 1 hour), whereas the one without shape completes in a few minutes.

Also, when I use the meanfield algorithm, there seems to be no issue in the model with shape.

An unrelated question: do you know why RStan is so much slower than CmdStan? The same model with shape ran for only 12 minutes in CmdStan. I'm comparing only the sampling parts, not the compilation.
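As a side note for anyone diagnosing this: a stuck chain can be confirmed directly from the draws. Here is a minimal base-R sketch of the symptom; the numbers below are simulated, not taken from the actual fit:

```r
# Simulate the symptom described above: a chain whose post-warmup
# draws never move. With a real fit you would instead extract, e.g.,
# fit_model$fit@sim$samples[[1]][['r_item[item_35,Intercept]']].
set.seed(1)
warmup <- 25
draws  <- c(rnorm(warmup), rep(1.234, 975))  # 25 moving values, then stuck

post <- draws[(warmup + 1):length(draws)]
sd(post) == 0  # TRUE means the chain did not move at all after warmup
```

A zero standard deviation over the post-warmup draws means every proposal was rejected, which matches the behavior reported above.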

paul-buerkner commented 7 years ago

I haven't run the model myself yet, but I am pretty sure the chains get stuck in a pathological region of the posterior.

Did you get any divergent transitions? If so, increasing adapt_delta via control = list(adapt_delta = <value above 0.8, e.g. 0.95>) may help.
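Concretely, that would look something like the following, reusing the formula, data, and prior names from the first post (a sketch, not a verified fix):

```r
library(brms)

# Same call as in the first post, but with a higher adapt_delta target;
# the default is 0.8, and values closer to 1 force smaller step sizes,
# which can help the sampler escape difficult regions of the posterior.
fit_model <- brm(
  formula = the_formula,
  data    = training,
  family  = negbinomial(),
  prior   = fixed_prior,
  chains  = 1,
  control = list(adapt_delta = 0.95)
)
```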

I wouldn't trust the meanfield results too much to be honest. Variational inference algorithms have their problems with multilevel models.

You may also want to specify informative priors on the intercept of the shape parameter, but I will try this out myself first.

Hmm, the difference between CmdStan and RStan is strange. Which versions of each are you using? Possibly your RStan version is outdated.

Edit: For what reason are you restricting the population-level effects to values below zero?

alireza202 commented 7 years ago

I got a bunch of

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue: Exception thrown at line 75: neg_binomial_2_log_log: Precision parameter[1] is 0, but must be > 0! If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine, but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

I'm going to run it with a higher adapt_delta, though using CmdStan since it's much faster.

The RStan version is 2.14.1, and CmdStan is the latest from GitHub.

paul-buerkner commented 7 years ago

Maybe it is just luck that CmdStan happened to run some fast chains, since the variation in speed across chains is often large in Stan.

@jgabry Any other idea why CmdStan could potentially be faster?

alireza202 commented 7 years ago

I tried adapt_delta = 0.95, and it did not help. The reason may be that there are a lot of zeros in the response variable y; maybe even the negative binomial is not a good distribution for this kind of data. But removing the population-level shape definitely helps.

Do you know why the Precision parameter[1], which I think is the shape, can become zero?

paul-buerkner commented 7 years ago

Yes, the precision parameter is the shape. You definitely need priors on the intercept of the shape parameter (and possibly also on the group-level SDs). Try going for

fixed_prior = c(
  set_prior('normal(0, 10)', class = 'b', ub = 0),
  set_prior('normal(0, 3)', class = 'b', coef = 'Intercept', nlpar = 'shape'),
  set_prior('normal(0, 5)', class = 'sd', nlpar = 'shape')
)

With many zeros in the response, you may want to try a zero-inflated model, for instance the zero_inflated_poisson or zero_inflated_negbinomial families.
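A hedged sketch of such a refit, again reusing the formula, data, and prior names from earlier in the thread (whether it helps depends on the data):

```r
library(brms)

# Hypothetical refit with a zero-inflated negative binomial likelihood;
# the extra zero-inflation parameter absorbs the excess zeros in y
# instead of forcing the shape parameter toward pathological values.
fit_zinb <- brm(
  formula = the_formula,
  data    = training,
  family  = zero_inflated_negbinomial(),
  prior   = fixed_prior,
  chains  = 1
)
```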

paul-buerkner commented 7 years ago

OK, this doesn't work well either. I guess a model in which the shape parameter varies over items may be too complicated for the data at hand.

alireza202 commented 7 years ago

Yes, I agree.

jgabry commented 7 years ago

I don't think CmdStan should be noticeably faster than RStan, except in some cases where memory-related issues slow RStan down (going through R can sometimes add overhead).

Are you using the same seed in RStan and CmdStan? Actually, even the same seed might not be enough for a comparison (see http://discourse.mc-stan.org/t/different-outputs-in-rstan-vs-pystan/211/15). And if things aren't exactly the same, I wouldn't trust a timing comparison unless you have convergence. If things are going wrong, comparing times doesn't tell you much because the chains can end up in different places. Even if the model is fitting pretty well, you may still need to run it many times to get a good estimate of the timing; it can vary a lot for some models.

alireza202 commented 7 years ago

Well, as far as this model and this specific dataset go, I have run them many times, and each time CmdStan was considerably faster. But to your point, the models were not converging.

paul-buerkner commented 7 years ago

@alireza202 Anything more to discuss here, or can I close this issue?

alireza202 commented 7 years ago

Sure, go ahead.