stan-dev / loo

loo R package for approximate leave-one-out cross-validation (LOO-CV) and Pareto smoothed importance sampling (PSIS)
https://mc-stan.org/loo

Averaging predictive distributions #83

Open jgabry opened 6 years ago

jgabry commented 6 years ago

This is related to #82 but should be its own issue.

Currently we rely on users to use the model weights to create the appropriate mixture of predictive distributions. We should recommend (and demonstrate in a vignette with rstan and loo, and automate in rstanarm and brms) a method for doing this.

There are various options for doing it. Today @avehtari and I discussed this, and we are leaning towards taking (approximately) S * weight_k draws from each posterior predictive distribution, where there are K models and S is the desired sample size. When some weights are small this is better than sampling from each posterior predictive distribution with probability weight_k, and it is easier to implement than a stratified version (maybe an option at some point?).

avehtari commented 6 years ago

For resampling the reference is Kitagawa, G., Monte Carlo Filter and Smoother for Non-Gaussian Nonlinear State Space Models, Journal of Computational and Graphical Statistics, 5(1):1-25, 1996.

Simple random resampling samples indices randomly according to the weights weight_k.

In deterministic resampling, indices are sampled using the deterministic numbers u_j = (j - a)/S, for a fixed a (usually 0.5) in [0,1), where S is the number of draws. Compare this to simple random resampling, where u_j ~ U[0,1]. Deterministic resampling has small bias and small variance.

In stratified resampling indices are sampled using random numbers u_j~U[(j-1)/S,j/S]. Stratified resampling is unbiased, almost as fast as deterministic resampling, and has only a slightly larger variance.
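The three schemes differ only in how the numbers u_j are generated before inverting the cumulative weights. A minimal sketch for comparison (not loo code; `resample_by_scheme` is a made-up name for illustration):

```r
# Illustration of the three resampling schemes above: generate the u_j for
# the chosen scheme, then invert the cumulative weights to get the indices.
resample_by_scheme <- function(wts, S,
                               scheme = c("simple", "deterministic", "stratified"),
                               a = 0.5) {
  scheme <- match.arg(scheme)
  u <- switch(scheme,
    simple        = runif(S),                    # u_j ~ U[0, 1]
    deterministic = (seq_len(S) - a) / S,        # u_j = (j - a) / S
    stratified    = (seq_len(S) - runif(S)) / S  # u_j ~ U[(j-1)/S, j/S]
  )
  # index k is chosen when u_j falls between cumsum(wts)[k-1] and cumsum(wts)[k]
  findInterval(u, cumsum(wts)) + 1L
}
```

For example, with wts = c(0.5, 0.5) and S = 4 the deterministic scheme always returns 1, 1, 2, 2, while the stratified scheme guarantees each model's count is within one of S * weight_k.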

jgabry commented 6 years ago

We are now thinking that the stratified version is best (and actually not so difficult to implement). For the stratified version we need a function like this to sample the model indices (or posterior draw indices if doing loo predictive checks):

# @param wts Vector of model weights (should sum to 1).
# @param n_draws Number of draws to take from the mixture distribution.
# @return Integer vector of model indices.
sample_indices <- function(wts, n_draws) {
  K <- length(wts)
  w <- n_draws * wts # expected number of draws from each model
  idx <- rep(NA_integer_, n_draws)

  acc <- 0 # accumulated expected draws not yet allocated (avoid name 'c', which masks base::c)
  j <- 0   # number of indices allocated so far

  for (k in 1:K) {
    acc <- acc + w[k]
    if (acc >= 1) {
      # allocate the whole part of the accumulated expectation to model k
      a <- floor(acc)
      acc <- acc - a
      idx[j + 1:a] <- k
      j <- j + a
    }
    if (j < n_draws && acc >= runif(1)) {
      # allocate one extra draw with probability equal to the residual
      acc <- acc - 1
      j <- j + 1
      idx[j] <- k
    }
  }
  idx
}
jgabry commented 6 years ago

Following up after today: for prediction averaging we then have something like

idx <- sample_indices(wts, n_draws = S) # sample_indices defined above

ypred <- list()
for (k in seq_along(wts)) {
  ypred[[k]] <- posterior_predict(fits[[k]], draws = sum(idx==k), ...)
}
ypred <- do.call(rbind, ypred)

Separately, for an improvement over the LOO-PIT diagnostic we can use sample_indices() to select which posterior draws to use (and how many times each), and then use these resampled posterior draws to draw yrep from the predictive distribution and make the SBC-style histogram. In this case the wts are the weights from PSIS, not the model weights. The number of draws to use is a somewhat open question.
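As a sketch of that LOO-PIT variant (hypothetical names throughout: `psis_wts` is assumed to be an S x N matrix of normalized PSIS weights, and `rng_pred(s, i)` a user-supplied function drawing one yrep_i given posterior draw s; neither exists in loo):

```r
# Hypothetical sketch of the SBC-style LOO-PIT check described above.
# psis_wts: S x N matrix of normalized PSIS weights (column i for observation i)
# rng_pred: function(s, i) drawing one yrep_i given posterior draw s
loo_pit_ranks <- function(y, psis_wts, rng_pred, n_resample = 1023) {
  N <- length(y)
  ranks <- integer(N)
  for (i in seq_len(N)) {
    # simple random resampling here for self-containment; the thread's
    # sample_indices() would do this step with stratified resampling instead
    s_idx <- sample.int(nrow(psis_wts), n_resample,
                        replace = TRUE, prob = psis_wts[, i])
    yrep <- vapply(s_idx, rng_pred, numeric(1), i = i)
    ranks[i] <- sum(yrep < y[i]) # rank of y_i among the n_resample yrep draws
  }
  ranks # plot as a histogram over 0:n_resample, SBC style
}
```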

lauken13 commented 5 years ago

Has there been much progress on this? I talked with @jgabry about maybe implementing it if no one else is working on it?

avehtari commented 5 years ago

I don't think anyone else has done anything for this.

lauken13 commented 5 years ago

Cool! Was a decision made about the number of draws?

avehtari commented 5 years ago

I think we can go with a default of 4000 draws for combining predictions, and for LOO-PIT, if it's ranks, then 1023 as in the SBC paper.

jgabry commented 3 years ago

I just got an email asking about how to do this and it reminded me of this open issue. Until we add a function to automate this, should we just add an example of doing it in the doc? Or @avehtari, is there an example of doing this somewhere in one of your model comparison case studies / notebooks that we can point to?

avehtari commented 3 years ago

The posterior package now has support for weighted draws and importance resampling, so it would be good to use that implementation. What would need to be added is a function that sets the weights for the predictive draws based on the model weights and then calls posterior to do the rest.
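Something along these lines, perhaps (a sketch, not a committed design; `average_predictions` and its arguments are made up, and it assumes each model's predictions are available as a draws-by-predictions matrix with matching column names):

```r
library(posterior)

# Sketch: pool per-model prediction matrices into one weighted set of draws
# and let posterior::resample_draws() do the stratified resampling.
# pred_list: one (draws x predictions) matrix per model, same column names
# wts: model weights summing to 1
average_predictions <- function(pred_list, wts, n_draws = 4000) {
  pooled <- as_draws_matrix(do.call(rbind, pred_list))
  # each draw gets its model's weight, spread uniformly over that model's draws
  draw_wts <- unlist(lapply(seq_along(pred_list), function(k) {
    rep(wts[k] / nrow(pred_list[[k]]), nrow(pred_list[[k]]))
  }))
  resample_draws(pooled, weights = draw_wts,
                 method = "stratified", ndraws = n_draws)
}
```

This keeps the stratified scheme from earlier in the thread but delegates the index sampling to posterior instead of maintaining a separate implementation in loo.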