Closed svats2k closed 3 years ago
I suspect this would be because new draws from the posterior predictive distribution are taken from a random number generator when you call each of those functions. Thus you would need to set the same seed for the random number generator before each function in order to compare them (e.g. by using set.seed()
or the seed
argument to add_predicted_draws()
)
Thanks @mjskay for the prompt response. I have tried to set the seed as well.
set.seed(1234)
yp <- posterior_predict(m4.4)
set.seed(1234)
xp <- add_predicted_draws(model = m4.4, prediction = "pred", newdata = d2)
xp %>% as_tibble() %>%
filter(.row ==i) %>% dplyr::select(pred) %>% cbind(preddr = yp[,i]) %>%
mutate(diff = preddr - pred) %>%
ggplot(aes(x = diff)) + geom_density()
The density plot of 'diff' is as follows:
If setting the seed were the issue, shouldn't I have noticed some difference in the add_fitted_draws check as well, though the latter only checks for conditional averages. They seem to match exactly to the 4th decimal place, while add_predicted_draws shows noticeable difference.
While this is highly unlikely to be a bug, I am mainly seeking clarity in my understanding.
I just tried it and did not see any difference between the two outputs. Can you try it without backend = "cmdstanr"
? That is the only difference between what I ran and the code you provided (I don't have cmdstan on this machine at the moment)
I have tried rstan based runs as well
priors <- c(prior(normal(0,2), class = "Intercept"),
prior(normal(0,2), class = 'b'),
prior(cauchy(0,2), class = "sigma"))
m4.4 <- brm(formula = hs ~ 1 + ws, data = d2, family = gaussian, prior = priors, backend = "rstan")
i <- 4
set.seed(1234)
yp <- posterior_predict(m4.4)
set.seed(1234)
xp <- add_predicted_draws(model = m4.4, prediction = "pred", newdata = d2)
xp %>%
as_tibble() %>% filter(.row ==i) %>%
dplyr::select(pred) %>% cbind(preddr = yp[,i]) %>%
mutate(diff = preddr - pred) %>%
ggplot(aes(x = diff)) + geom_density()
The results don't match again
I am wondering if there is any difference at all in the code that you've used versus mine. Can you please share it so that I can test it out on my system?
I copied and pasted your code and ran it verbatim. The output plot is empty:
require(brms)
require(tidybayes)
require(rethinking)
data("Howell1")
d <- Howell1
d2 <- d[d$age > 18,]
d2$hs <- (d2$height - mean(d2$height))/ sd(d2$height)
d2$ws <- (d2$weight - mean(d2$weight))/ sd(d2$weight)
priors <- c(prior(normal(0,2), class = "Intercept"),
prior(normal(0,2), class = 'b'),
prior(cauchy(0,2), class = "sigma"))
m4.4 <- brm(formula = hs ~ 1 + ws, data = d2, family = gaussian, prior = priors, backend = "rstan")
i <- 4
set.seed(1234)
yp <- posterior_predict(m4.4)
set.seed(1234)
xp <- add_predicted_draws(model = m4.4, prediction = "pred", newdata = d2)
xp %>%
as_tibble() %>% filter(.row ==i) %>%
dplyr::select(pred) %>% cbind(preddr = yp[,i]) %>%
mutate(diff = preddr - pred) %>%
ggplot(aes(x = diff)) + geom_density()
And if I follow it up with a call to summary()
, the difference is 0:
xp %>%
as_tibble() %>% filter(.row ==i) %>%
dplyr::select(pred) %>% cbind(preddr = yp[,i]) %>%
mutate(diff = preddr - pred) %>%
with(summary(diff))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0 0 0 0 0 0
I wonder if this is a brms issue on your setup? Have you checked to see if multiple calls to posterior_predict
yield identical results with the same seed? Something like this:
set.seed(1234)
yp1 <- posterior_predict(m4.4)
set.seed(1234)
yp2 <- posterior_predict(m4.4)
summary(as.vector(yp1 - yp2))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0 0 0 0 0 0
If that's not 0 I'd say it's a brms issue
You are right. it does seem a brms issue. One last request before I close the issue, can you please share the output from sessionInfo()? Just so that I try to install the same versions that you have and confirm I observe the same as well.
The versions that I am using are:
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
other attached packages:
[1] gridExtra_2.3 bayesplot_1.8.0 cmdstanr_0.1.3 rethinking_2.13 gganimate_1.0.7 RColorBrewer_1.1-2
[7] ggrepel_0.9.1 brms_2.14.4 Rcpp_1.0.6 rstan_2.21.2 StanHeaders_2.21.0-7 cowplot_1.1.1
[13] ggplot2_3.3.3 tidybayes_2.3.1 ggdist_2.4.0 modelr_0.1.8 tidyr_1.1.2 forcats_0.5.0
[19] purrr_0.3.4 dplyr_1.0.3 magrittr_2.0.1
Sure, here. I'm running the dev version of tidybayes but that shouldn't be an issue (given that this looks like a brms problem).
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] rethinking_2.13 rstan_2.21.2 ggplot2_3.3.3 StanHeaders_2.21.0-7 tidybayes_2.3.1.9000
[6] brms_2.14.4 Rcpp_1.0.5
loaded via a namespace (and not attached):
[1] minqa_1.2.4 colorspace_2.0-0 ellipsis_0.3.1 ggridges_0.5.3 rsconnect_0.8.16
[6] estimability_1.3 markdown_1.1 base64enc_0.1-3 rstudioapi_0.13 farver_2.0.3
[11] svUnit_1.0.3 DT_0.17 fansi_0.4.1 mvtnorm_1.1-1 bridgesampling_1.0-0
[16] codetools_0.2-18 splines_4.0.3 knitr_1.30 shinythemes_1.1.2 bayesplot_1.8.0
[21] projpred_2.0.2 jsonlite_1.7.2 nloptr_1.2.2.2 ggdist_2.4.0 shiny_1.5.0
[26] compiler_4.0.3 emmeans_1.5.3 backports_1.2.1 assertthat_0.2.1 Matrix_1.3-2
[31] fastmap_1.0.1 cli_2.2.0 later_1.1.0.1 htmltools_0.5.1 prettyunits_1.1.1
[36] tools_4.0.3 igraph_1.2.6 coda_0.19-4 gtable_0.3.0 glue_1.4.2
[41] reshape2_1.4.4 dplyr_1.0.2 tinytex_0.28 V8_3.4.0 vctrs_0.3.6
[46] nlme_3.1-151 crosstalk_1.1.1 xfun_0.20 stringr_1.4.0 ps_1.5.0
[51] lme4_1.1-26 mime_0.9 miniUI_0.1.1.1 lifecycle_0.2.0 gtools_3.8.2
[56] statmod_1.4.35 MASS_7.3-53 zoo_1.8-8 scales_1.1.1 colourpicker_1.1.0
[61] promises_1.1.1 Brobdingnag_1.2-6 inline_0.3.17 shinystan_2.5.0 gamm4_0.2-6
[66] yaml_2.2.1 curl_4.3 gridExtra_2.3 loo_2.4.1 stringi_1.5.3
[71] dygraphs_1.1.1.6 boot_1.3-25 pkgbuild_1.2.0 shape_1.4.5 rlang_0.4.10
[76] pkgconfig_2.0.3 matrixStats_0.57.0 distributional_0.2.1 evaluate_0.14 lattice_0.20-41
[81] purrr_0.3.4 rstantools_2.1.1 htmlwidgets_1.5.3 processx_3.4.5 tidyselect_1.1.0
[86] plyr_1.8.6 magrittr_2.0.1 R6_2.5.0 generics_0.1.0 pillar_1.4.7
[91] withr_2.3.0 mgcv_1.8-33 xts_0.12.1 abind_1.4-5 tibble_3.0.4
[96] crayon_1.3.4 arrayhelpers_1.1-0 rmarkdown_2.6 grid_4.0.3 callr_3.5.1
[101] forcats_0.5.0 threejs_0.3.3 digest_0.6.27 xtable_1.8-4 tidyr_1.1.2
[106] httpuv_1.5.5 RcppParallel_5.0.2 stats4_4.0.3 munsell_0.5.0 shinyjs_2.0.0
@mjskay I had posted this issue in the brms repo and the issue has been root caused and I thought I will paste the response here.
"If you have set options(mc.cores = <more than 1>), posterior_predict will evaluate in parallel by default, unless you change the core argument. On windows, parallel execution is done via parallel::parLapply and I don't know how that function respects seeds, if at all. When executing the code in serial (with 1 core) the results are reproducible."
Once I set the mc.cores to 1, I no longer see the discrepancy between posterior_predict and add_predicted_draws.
Thanks for posting that back --- that's good to know!
I am using the Howell1 dataset from rethinking package.
Building a simple brms model using one numeric and one categorical predictor
I am trying to understand how add_fitted_draws and add_predicted_draws work.
Considering add_fitted_draws:
Based on the documentation add_fitted_draw internally uses posterior_epred or its equivalent in brms and the results exactly match.
Now when I move on to do exactly the same between add_predicted_draws and posterior_predict, the results don't match
I am pretty sure there is a gap in my understanding, please advice.