paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0
1.27k stars 183 forks source link

non-matching draw subsets from `posterior_linpred` #1461

Closed jsocolar closed 1 year ago

jsocolar commented 1 year ago

Hi @paul-buerkner , not sure if there is a good reason for the following behavior. If so, feel free to close this issue without comment.

posterior_linpred.brmsfit, if passed an ndraws argument, eventually calls validate_draw_ids, which runs the following snippet: draw_ids <- sample(seq_len(ndraws_total), ndraws).

What this means is that in a distributional regression, if I do

a <- posterior_linpred(brmsfit, dpar = "mu", ndraws = 100)
b <- posterior_linpred(brmsfit, dpar = "sigma", ndraws = 100)

I cannot do standard sample arithmetic with a and b because they contain different draws in different orders.

The current workaround, as far as I am aware, is to specify a consistent seed ahead of each call to posterior_linpred, but I think this isn't ideal primarily because the need to do so is non-obvious.

A possible alternative might be to replace draw_ids <- sample(seq_len(ndraws_total), ndraws) with draw_ids <- floor(seq_len(ndraws) * ndraws_total / ndraws). This should cause ndraws to induce deterministic thinning of the chains. This would fool-proof the process, but I don't know what other behaviors might rely on stochastic resampling of the draw ids on successive calls to posterior_linpred.

Another alternative might be to allow the user to pass a character vector of dpar names such that draw_ids <- sample(seq_len(ndraws_total), ndraws) runs only once outside of the calls to prepare_predictions, thereby returning predictions for a consistent set of draws.

A third alternative might be to provide a seed argument to posterior_linpred, which could clarify to the user that there's an element of randomness in the way the summary is constructed, and could enable consistency without requiring the user to tinker with seeds themselves. If fool-proofness is desired, the seed could even default to some non NULL value.

I guess this issue isn't restricted to posterior_linpred and probably surfaces in a variety of post-processing functions.

paul-buerkner commented 1 year ago

use draw_ids.

Jacob B. Socolar @.***> schrieb am Do., 16. Feb. 2023, 04:53:

Hi @paul-buerkner https://github.com/paul-buerkner , not sure if there is a good reason for the following behavior. If so, feel free to close this issue without comment.

posterior_linpred.brmsfit, if passed an ndraws argument, eventually calls validate_draw_ids, which runs the following snippet: draw_ids <- sample(seq_len(ndraws_total), ndraws).

What this means is that in a distributional regression, if I do

a <- posterior_linpred(brmsfit, dpar = "mu", ndraws = 100) b <- posterior_linpred(brmsfit, dpar = "sigma", ndraws = 100)

I cannot do standard sample arithmetic with a and b because they contain different draws in different orders.

The current workaround, as far as I am aware, is to specify a consistent seed ahead of each call to posterior_linpred, but I think this isn't ideal primarily because the need to do so is non-obvious.

A possible alternative might be to replace draw_ids <- sample(seq_len(ndraws_total), ndraws) with draw_ids <- floor(seq_len(ndraws) * ndraws_total / ndraws). This should cause ndraws to induce deterministic thinning of the chains. This would fool-proof the process, but I don't know what other behaviors might rely on stochastic resampling of the draw ids on successive calls to posterior_linpred.

Another alternative might be to allow the user to pass a character vector of dpar names such that draw_ids <- sample(seq_len(ndraws_total), ndraws) runs only once outside of the calls to prepare_predictions, thereby returning predictions for a consistent set of draws.

A third alternative might be to provide a seed argument to posterior_linpred, which could clarify to the user that there's an element of randomness in the way the summary is constructed, and could enable consistency without requiring the user to tinker with seeds themselves. If fool-proofness is desired, the seed could even default to some non NULL value.

I guess this issue isn't restricted to posterior_linpred and probably surfaces in a variety of post-processing functions.

— Reply to this email directly, view it on GitHub https://github.com/paul-buerkner/brms/issues/1461, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADCW2ADCHWRUCVKNI2NH2WDWXWQBZANCNFSM6AAAAAAU5URY3I . You are receiving this because you were mentioned.Message ID: @.***>