paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/

Understanding predict function #82

Closed alireza202 closed 8 years ago

alireza202 commented 8 years ago

Let's say I'm fitting this model:

fit_model <- brm(formula = y ~ x + (x | item),
                 data = df,
                 family = "negbinomial")

I want to have a prediction function as a function of x, say y ~ negbinomial(a + b*x, shape), instead of using the built-in predict function. Can I use the estimated intercept and slope for each item (plus the population-level effects) and the estimated shape, and say I have the prediction function? Is it as simple as that, or do I need to sample from the distributions of a, b, and shape, or even more?

paul-buerkner commented 8 years ago

If you want to put it this way, it's really as simple as that. :-)

Do you really want to build such a function? If yes, it is likely that the built-in predict function can already do that for you. Using your example, we can define:

pred_fun <- function(x, model, item = "some default item name", ...) {
  # predict for new values of x, holding the grouping variable fixed
  predict(model, newdata = data.frame(x = x, item = item), ...)
}
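For example, assuming the fit_model object from above and a (hypothetical) grouping level called "item_1", one could then call:

pred_fun(x = c(1, 2, 3), model = fit_model, item = "item_1")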

alireza202 commented 8 years ago

Well, the reason I want to understand the predict function is that I want to be able to create predictions in another program, and I don't want to go through the hassle of calling R from there. In other words, I want to save as few parameters as possible, and then read them in another language and create predictions for any x I want.

So I'm still not sure how to create predictions from a brmsfit object. A while ago, I tried creating prediction intervals for a simple linear model, and I learned that I actually need to take a few more things into account:

http://stats.stackexchange.com/questions/185222/rs-lm-prediction-interval-vs-simulation/185233

So do you sample from the joint distribution of the parameters to do prediction, similar to what I did above?

paul-buerkner commented 8 years ago

If you want to see how I do it in brms, I recommend taking a look at the code, which is mainly in brmsfit-methods.R, extract_draws.R, linear_predictor.R, and predict.R. The actual sampling of the predicted response happens in predict.R.

Edit: When fitting your model using MCMC methods, all information about your parameters (including correlations etc.) is in the posterior samples. So there is no need to explicitly take the correlations or similar things into account, because they are already contained in your posterior samples.
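For instance (a small sketch, assuming the fit_model object from above), the joint posterior draws can be extracted with posterior_samples():

draws <- posterior_samples(fit_model)
# each row is one draw from the joint posterior, so dependencies
# between parameters are preserved across the columns within a row
head(draws[, c("b_Intercept", "b_x", "shape")])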

alireza202 commented 8 years ago

Can you shed more light on what sources of uncertainty the predict function takes into account? As far as I understand, in a multilevel model, there are three sources of uncertainty:

  1. the residual (observation-level) variance
  2. the uncertainty in the fixed coefficients
  3. the uncertainty in the variance parameters for the grouping factors

Are all these uncertainties taken into account?

paul-buerkner commented 8 years ago

To include all sources of uncertainty, call predict(.). To exclude 1., call fitted(.). To exclude 3. (also only partially, if you want), use the re_formula argument.
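For concreteness, a small sketch of these options, assuming the fit_model object from above (the re_formula values are illustrations only):

predict(fit_model)                   # includes all three sources of uncertainty
fitted(fit_model)                    # excludes 1., the observation-level variance
predict(fit_model, re_formula = NA)  # excludes the group-level effects (3.)
predict(fit_model, re_formula = ~ (1 | item))  # keeps only the varying intercepts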

alireza202 commented 8 years ago

I had a comprehensive discussion with Ben Bolker on the lme4 page (https://github.com/lme4/lme4/issues/388) about the possibilities of getting prediction intervals for mixed models. Apparently it's not very straightforward. It's great that you can offer this capability in brms.

jgabry commented 8 years ago

This is precisely one of those things that poses all kinds of problems for maximum likelihood based approaches but becomes trivial when you do full Bayes. The only downside is that MCMC is slow, at least compared to maximum marginal likelihood estimation.

alireza202 commented 8 years ago

Exactly! lme4 is orders of magnitude faster, but it cannot provide prediction intervals the way Stan does.

alireza202 commented 8 years ago

Paul,

I did look into your code, and this is what I understood. Please let me know if it is correct:

You basically create eta = Xb + Zu for the new data X, using the same Z if there are no new levels, and use the posterior samples of b and u (stored in object$fit@sim); then, for each iteration, you draw one sample from the RNG of the response distribution, say negbinomial, with extra parameters such as shape drawn from the same iteration. Is it important to create eta and shape from the same iteration? Does this mean that we keep the correlation between the variables?
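In code, what I have in mind is roughly the following (a hypothetical sketch, not your actual internals; the design matrices X and Z and the posterior draws of b, u, and shape are assumed to be extracted already):

predict_draws <- function(X, Z, b, u, shape) {
  n_iter <- nrow(b)
  n_obs  <- nrow(X)
  yrep <- matrix(NA_real_, n_iter, n_obs)
  for (i in seq_len(n_iter)) {
    # linear predictor from the i-th posterior draw of b and u
    eta <- as.vector(X %*% b[i, ] + Z %*% u[i, ])
    # one response draw per observation, with the shape value
    # taken from the *same* iteration (log link, so mu = exp(eta))
    yrep[i, ] <- rnbinom(n_obs, mu = exp(eta), size = shape[i])
  }
  yrep
}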

paul-buerkner commented 8 years ago

This is correct. It is very important to use the same iteration. The posterior distribution is a multidimensional distribution capturing dependencies between all parameters. That is, it contains much more information than the marginal distributions of the individual parameters.

alireza202 commented 8 years ago

Thanks so much. It makes total sense.

jgabry commented 8 years ago

Well said. This is an important point. Another way to put it, in R terms, is that hist(x) is the same regardless of how you permute x, but plot(x, y) is only the same after reordering if you use exactly the same permutation for x and y.
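A toy illustration of this point in base R:

set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
perm <- sample(100)
# the marginal distribution is unchanged by permutation ...
all.equal(sort(x), sort(x[perm]))  # TRUE: hist(x) equals hist(x[perm])
# ... but the joint structure survives only with matched indices
cor(x[perm], y)        # pairing broken: correlation destroyed
cor(x[perm], y[perm])  # same permutation for both: correlation preserved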