mjskay / tidybayes

Bayesian analysis + tidy data + geoms (R package)
http://mjskay.github.io/tidybayes
GNU General Public License v3.0
712 stars 59 forks source link

Plot observed values against the distribution of predicted values (rstan + tidybayes) #273

Closed fsdias closed 3 years ago

fsdias commented 3 years ago

Hi, I'm trying to plot observed values against the distribution of predicted values, but it's nor working. I'm not sure if I'm doing something wrong or this is not possible with tidybayes.

Consider this dataset:

seaice<-structure(list(year = 1979:2017, extent_north = c(12.328, 12.337, 
12.127, 12.447, 12.332, 11.91, 11.995, 12.203, 12.135, 11.923, 
11.967, 11.694, 11.749, 12.11, 11.923, 12.011, 11.415, 11.841, 
11.668, 11.757, 11.691, 11.508, 11.6, 11.363, 11.397, 11.24, 
10.907, 10.773, 10.474, 10.978, 10.932, 10.711, 10.483, 10.406, 
10.897, 10.79, 10.566, 10.151, 10.373), extent_south = c(11.7, 
11.23, 11.435, 11.64, 11.389, 11.454, 11.618, 11.088, 11.554, 
12.131, 11.426, 11.41, 11.545, 11.399, 11.42, 11.774, 11.795, 
11.769, 11.39, 11.738, 11.761, 11.747, 11.673, 11.222, 11.969, 
11.961, 11.695, 11.461, 11.687, 12.239, 12.049, 12.107, 11.501, 
12.004, 12.524, 12.776, 12.414, 11.156, 10.693), y_pred = c(12.5010736820052, 
12.44927538282, 12.3875609801368, 12.3439172909124, 12.2793026532661, 
12.2231801011696, 12.1773617980563, 12.1176950156742, 12.0625556856437, 
12.0113713804055, 11.9556901906578, 11.9033220161444, 11.8509221133541, 
11.7987684050175, 11.7354481032332, 11.6905163598079, 11.6256645758393, 
11.5774130610859, 11.516390260077, 11.4648818398791, 11.4111044559349, 
11.3486859124078, 11.3029935843154, 11.2427838230488, 11.1963290189074, 
11.1379923127939, 11.082036017241, 11.0308609752126, 10.977887325884, 
10.9187371316802, 10.8657144308698, 10.8149289833779, 10.7591410431174, 
10.7073183853358, 10.651322057159, 10.5876957399523, 10.5286236062281, 
10.4761893784841, 10.4336078995164)), row.names = c(NA, -39L), class = "data.frame")

Prepare data to run in Stan

x <- I(seaice$year - 1978)
y <- seaice$extent_north
N <- length(seaice$year)
stan_data <- list(N = N, x = x, y = y)

Stan model code

data {
 int < lower = 1 > N; // Sample size
 vector[N] x; // Predictor
 vector[N] y; // Outcome
}

parameters {
 real alpha; // Intercept
 real beta; // Slope (regression coefficients)
 real < lower = 0 > sigma; // Error SD
}

transformed parameters{
  vector[N] mu = alpha + x * beta;
}

model{
 y ~ normal(mu , sigma);
}

generated quantities {
  real y_pred[N] = normal_rng(mu , sigma);
}

Fit the model and extract predicted values

library(rstan)
fit<-stan("lm.stan",data=stan_data)
post<-extract(fit)
draws<-post$y_pred
seaice$y_pred<-apply(draws,2,mean)

The model runs ok. Now I try to plot predicted vs observed values but I get the wrong result.

seaice %>%
  data_grid(extent_north= seq_range(extent_north, n = 39)) %>%
  add_draws(draws,".prediction") %>%
  ggplot(aes(y = y_pred , x = extent_north)) +
  stat_lineribbon(aes(y = .prediction), .width = c(.99, .95, .8, .5), color = "#08519C") +
  geom_point(data = seaice, size = 2) +
  scale_fill_brewer()

image

Am I doing something wrong? Or were these tools not designed to work like this? Thanks

mjskay commented 3 years ago

I assume you figured it out because you close the issue --- when working with predictions generated inside the generated quantities block in stan you will have to use the same values of the predictors (in the same order) with add_draws(), rather than the data_grid() approach.

fsdias commented 3 years ago

Thanks. I ended up finding a way of doing this with bayesplot.