stan-dev / bayesplot

bayesplot R package for plotting Bayesian models
https://mc-stan.org/bayesplot
GNU General Public License v3.0

pp_check for prior predictive checks #320

Closed rduesing closed 1 month ago

rduesing commented 6 months ago

Hello there!

For educational purposes, I experimented a bit with different model formulations in brms, with and without centering. I formulated priors, obtained prior draws with the sample_prior = "only" argument, and wanted to do graphical prior predictive checks with pp_check(). Then I noticed unexpected behavior in the graphical output: the displayed distributions resembled neither my prior formulations nor the summary statistics of the brms.fit object. I tried two different real data sets and a simulated toy data set (see below; the code for the data, the analyses, and the complete script is in the attached .txt file), which all showed the same behavior. I am unsure whether I misunderstood something completely or whether there is a problem with pp_check when displaying the priors for these specific models. For comparison, I extracted and plotted the prior draws from the brms.fit object manually, and those plots looked as expected.

The brms package handles the intercept differently depending on whether the data are centered or not. If you formulate the model as y ~ 0 + Intercept + x1 + x2, brms allows for non-centered data and you can specify a prior for the intercept directly. If you instead formulate y ~ 1 + x1 + x2, brms centers the predictors internally and will adapt the Intercept prior to the mean values of the corresponding data.
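(A quick way to see this difference is to compare the default prior structure of the two formulations with get_prior() from brms; a minimal sketch on a throwaway data frame. With 0 + Intercept the intercept shows up as class b, coef Intercept, whereas with ~ 1 + ... it gets its own Intercept class:)

library(brms)

d <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))
get_prior(bf(y ~ 0 + Intercept + x1 + x2), data = d, family = gaussian())  # intercept as class b, coef Intercept
get_prior(bf(y ~ 1 + x1 + x2), data = d, family = gaussian())              # intercept as its own class Intercept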

Therefore, my first model used the formulation for uncentered data, on the raw metric:

library(tidyverse)
library(brms)
library(bayesplot)
library(psych)
#remotes::install_github("rxmenezes/rscreenorm")
library(rscreenorm)   # for pdensity.column(), used further below

set.seed(123)
dat <-  tibble(
         x1 = rnorm(100, 100, 15),
         x2 = rnorm(100, 100, 15),
         y = 3.5 + 0.015*x1 + 0.015*x2 + rnorm(100, 0, 1))

# Formula for not centered variables
bf.0 <- bf(y ~ 0 + Intercept + x1 + x2)

And the appropriate (very vague) priors:

prior.0 <- c(brms::prior(normal(0, 0.5), class = b, coef = x1),
             brms::prior(normal(0, 0.5), class = b, coef = x2),
             brms::prior(normal(3.5, 2.5), class = b, coef = Intercept),
             brms::prior(normal(0, 2.5), class = sigma, lb = 0))
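(For reference, drawing from these priors directly in base R should reproduce the marginal summaries shown further below; a minimal sketch, where the lower bound on sigma is handled by taking absolute values, which for a zero-mean normal is equivalent to truncation at zero:)

n <- 1e5
prior_marginals <- data.frame(
  Intercept = rnorm(n, 3.5, 2.5),
  b_x1      = rnorm(n, 0, 0.5),
  b_x2      = rnorm(n, 0, 0.5),
  sigma     = abs(rnorm(n, 0, 2.5))  # normal(0, 2.5) with lb = 0
)
summary(prior_marginals)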

And the model:

mod.0_ppc <- brm(data = dat,
                 family = gaussian(),
                 formula = bf.0,
                 prior = prior.0,
                 sample_prior = "only",
                 warmup = 2000, iter = 5000,
                 cores = 4, chains = 4)

print(mod.0_ppc)

The summary output looks correct, in accordance with the specified priors:

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     3.53      2.53    -1.39     8.58 1.00    11447     7169
x1            0.00      0.50    -0.98     0.98 1.00    10943     8120
x2            0.00      0.51    -1.00     0.99 1.00    11808     8330

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.99      1.52     0.07     5.61 1.00     5463     3066

But when I display the prior draws with pp_check, the result is all over the place:

pp_check(mod.0_ppc, ndraws = 100) 
pp_check(mod.0_ppc, type = "stat")

density_uncentered_pp_check hist_uncentered_pp_check

The mean values of the displayed distributions (the intercepts) are spread across a range of roughly -200 to 200, which should not happen with these priors. To better understand what was going on, I randomly drew prior values and plotted the densities myself, to see what brms had produced:

all_draws <- rbind(chain1, chain2, chain3, chain4) # I extracted all chains separately and removed the 2000 warmup samples

n_priors <- 100 # how many prior draws for plot
n_sample <- 100 # sample size equal to sample size of model
rand_draws <- all_draws |> 
  slice_sample(n = n_priors) # sample random draws of the prior values from the brms model

prior_ex <- matrix(NA, nrow = n_sample, ncol = n_priors) |> as.data.frame() #define results matrix 

# loop, to draw random distributions with a gaussian likelihood and the values from the prior draws
for (i in 1:n_priors){
  prior_ex[1:n_sample ,i] <- rnorm(n = n_sample, 
                                   mean = (rand_draws[i, 1] + rand_draws[i, 2] + rand_draws[i, 3]),
                                   sd =  rand_draws[i, 4]
  )
}
colnames(prior_ex) <- paste("Data", 1:n_priors) # necessary so that we can plot with the pdensity.column function
pdensity.column(prior_ex, add.leg = FALSE) # plot all replicates to see all distributions from the joint priors, from rscreenorm package
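(As an aside, bayesplot itself can overlay such manually simulated prior data sets without any observed y, via the ppd_* functions available in recent bayesplot versions; a minimal sketch, assuming the prior_ex matrix from above:)

# one row of ypred per simulated prior data set
bayesplot::ppd_dens_overlay(ypred = t(as.matrix(prior_ex)))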

To my surprise, the manually produced prior distributions look as expected and in accordance with the specified priors and the summary for the brms.fit object:

density_uncentered_manually

I tried this with 3 data sets, which all looked similar. As a countercheck, I also applied the formula for centered variables, which should lead to distorted prior predictions, since brms internally assumes centering and adapts the Intercept prior accordingly.

bf.1 <- bf(y ~ 1 + x1 + x2)

prior.1 <- c(brms::prior(normal(0, 0.5), class = b, coef = x1),
             brms::prior(normal(0, 0.5), class = b, coef = x2),
             brms::prior(normal(3.5, 2.5), class = Intercept), # now only as class, not coef anymore, but prior is the same
             brms::prior(normal(0, 2.5), class = sigma, lb = 0))

mod.1_ppc <- brm(data = dat,
                 family = gaussian(),
                 formula = bf.1,
                 prior = prior.1,
                 sample_prior = "only",
                 warmup = 2000, iter = 5000,
                 cores = 4, chains = 4)

print(mod.1_ppc)

Now the summary looks bad, as expected:

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     3.56     70.87  -134.17   143.06 1.00    11403     8509
x1           -0.00      0.49    -0.96     0.97 1.00    10485     8272
x2            0.00      0.50    -0.98     0.97 1.00    11594     8566

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     2.00      1.53     0.07     5.70 1.00     6656     4238
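(Incidentally, the size of that Est.Error for the Intercept is about what the internal centering implies: the reported b_Intercept is the centered intercept minus mean(x1)*b_x1 minus mean(x2)*b_x2, so with sample means near 100 a rough check gives:)

sqrt(2.5^2 + (100 * 0.5)^2 + (100 * 0.5)^2)
#> about 70.8, close to the Est.Error of 70.87 above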

The manually extracted prior draws look equally bad ('bad' is good here, since we expected this weird result):

density_centered_manually

But now the pp_check density plots look fairly normal: the distributions are wide, but at least centered around the intercept, unlike above:

density_centered_pp_check

So, could someone please explain this to me: why do the brms.fit results and the manually extracted plots behave as expected, while the plots from pp_check are so odd? Did I completely misunderstand how pp_check is intended to work? To me it seems as if there is an internal option to plot "centered" or "un-centered" and that it is switched in the wrong direction.

I appreciate any help from you! Many thanks, Rainer

test_pp_check.txt

Edit: The first density plot from pp_check shows a pattern in which distributions with smaller sigma are clustered around the intercept. This happened only with the toy data set, where the predictors were normally distributed. With real data, where the predictors did not follow a particular distribution, the sampled distributions were all over the place. In my view, this also indicates that the data structure is taken into account, as with the centered brms analysis.

tjmahr commented 6 months ago

I don't think you are making the predictions correctly, because you are not multiplying the coefficients by the x1 and x2 values. Or you might be misunderstanding the purpose of prior/posterior prediction, which is to take your observed data and get the model's predictions for those observations using draws from the prior or posterior. (If the posterior produces predictions that look like real data, the model is at least doing something correctly.)

Let's take a fitted model and make the density plot from scratch without bayesplot:

# install.packages("tidybayes")
preds <- dat |> 
  tidybayes::add_predicted_draws(mod.0_ppc, ndraws = 100)

ggplot(preds) + 
  aes(x = .prediction) + 
  geom_density(aes(group = .draw))

[density plot of the prior predictions, one density per draw]

These are prior predictions for the original observations. The original observations have x1 and x2 drawn from Normal(100, 15). The priors for b1 and b2 are Normal(0, 0.5), so it is quite possible for these coefficients to shove the prior predicted densities out to -200 or 200.
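(A rough sanity check on the magnitude, using only the numbers above, a slope prior sd of 0.5 and x-values around 100:)

# 95% interval of a single slope term's contribution to the linear predictor at x = 100
qnorm(c(0.025, 0.975), mean = 0, sd = 0.5) * 100
#> roughly -98 and +98, and there are two such terms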

We can also compute the prior predictions by hand directly from the model with matrix multiplication.


# first 100 draws of the regression coefficients and sigma
# (the `_` pipe placeholder used for subsetting needs a recent R version)
coef <- mod.0_ppc |> 
  as_draws_matrix() |> 
  _[1:100, c("b_Intercept", "b_x1", "b_x2", "sigma")]

m <- model.matrix( ~ x1 + x2, data = dat)  # design matrix: intercept, x1, x2
epreds <- coef[, 1:3] %*% t(m)
# this has one row per draw, one column per observation

# Add noise to epreds
preds <- epreds
for (draw_num in seq_len(nrow(coef))) {
  preds[draw_num, ] <- epreds[draw_num, ] + 
    rnorm(ncol(epreds), 0, coef[draw_num, "sigma"])  
}

plot(density(preds[1, ]))

[density plot of the first draw's predictions]

This is also outside the prior distribution of the intercept.

To get something closer to your intercept prior, we need to zero out the x1 and x2 values in the data used for predictions.

# install.packages("tidybayes")
preds_0 <- dat |> 
  mutate(x1 = 0, x2 = 0) |> 
  tidybayes::add_predicted_draws(mod.0_ppc, ndraws = 100)

ggplot(preds_0) + 
  aes(x = .prediction) + 
  geom_density(aes(group = .draw)) 

[density plot of the predictions at x1 = x2 = 0]

This gets us a little closer, but these are 100 predicted values per draw, located at the prior intercept and then jittered according to that draw's sigma. Let's instead draw one value per sample at x1 = x2 = 0 and not apply any noise; that is, we use the epred. We will also crank up the number of posterior samples at play:

epreds_0 <- dat |> 
  mutate(x1 = 0, x2 = 0) |> 
  head(1) |> 
  tidybayes::add_epred_draws(mod.0_ppc, ndraws = 1000)

ggplot(epreds_0) + 
  aes(x = .epred) + 
  geom_density() +
  stat_function(
    fun = dnorm, 
    args = list(mean = 3.5, sd = 2.5), 
    geom = "line", color = "red"
  )

[density of the epred draws at x1 = x2 = 0, with the Normal(3.5, 2.5) prior overlaid in red]

So we recovered the prior distribution of the intercept by taking the expected predictions (epreds) at x1 = x2 = 0.

rduesing commented 5 months ago

Hello TJ,

many thanks for your reply. Apparently, pp_check does not do what I thought it does.

I have a different understanding of what is meant by "prior predictive check". To me, it is a check of how the joint priors behave and what distribution of my data (or of the mean, median, regression coefficients, whatever) they would predict, without seeing the data itself. Therefore, I do not understand why the prior draws are multiplied by the data. For me, your code

coef <- mod.0_ppc |> 
  as_draws_matrix() |> 
  _[1:100, c("b_Intercept", "b_x1", "b_x2", "sigma")]

gives me random draws from the priors of each parameter. I would take these draws, combine them with the likelihood function and the specified model, and produce a sample of the size of my original data. I would consider these "prior data samples" as my prior predictions and compare them either to theoretical considerations (are the values in a plausible range?) and/or to the data itself. The latter approach may have the disadvantage that we adjust our priors too much to the data. In my example, I deliberately used "wrong" or not well-suited priors (as I said, for educational purposes), but it may also be the case that your prior knowledge is very different from your data.

At least this is how I understood McElreath (2022) and Gabry et al. (2019) on prior predictive checks: use only the prior distributions, without predictions that use the data.

Can you see my struggle/problem?

tjmahr commented 5 months ago

I understand the confusion. With a posterior predictive check, we want predictions that condition on the observed data, but in a prior predictive check we might not even have data in hand, and it should support predictions in the absence of data.

The prior predictive distribution is just like the posterior predictive distribution with no observed data, so that a prior predictive check is nothing more than the limiting case of a posterior predictive check with no data. https://mc-stan.org/docs/stan-users-guide/posterior-predictive-checks.html#prior-predictive-checks

But pp_check() is a generic prior/posterior predictive function, so by default it will assume that you are doing a posterior predictive check and condition on the observed xs. The other issue is that pp_check() wants to show something for each prior/posterior draw (like one density per draw) to display the distributions of simulated datasets, whereas the prior predictive check that you have in mind combines all the draws and plots a single distribution. To do that, you can compute it manually:

# generate 1 prediction per draw at xs = 0
dat_0 <- tibble(x1 = 0, x2 = 0, y = 0)
preds <- posterior_predict(mod.0_ppc, newdata = dat_0) 
colnames(preds) <- "ypred"  

ggplot(as_draws_df(preds)) + 
  aes(x = ypred) + 
  geom_density()

[density plot of the prior predictive draws at x1 = x2 = 0]

Or once you have a matrix of predictions in hand, you can use bayesplot:

mcmc_dens(preds)

[mcmc_dens() plot of the same predictions]

rduesing commented 5 months ago

Hi TJ!

Many thanks for your time and help, I think that solves my confusion!

All the best! Rainer