Open njtierney opened 2 years ago
Hi @njtierney , on a related note is that calculate
returns different object class depending on whether we're doing prior or posterior predictive checks.
library(greta)
#>
#> Attaching package: 'greta'
#> The following objects are masked from 'package:stats':
#>
#> binomial, cov2cor, poisson
#> The following objects are masked from 'package:base':
#>
#> %*%, apply, backsolve, beta, chol2inv, colMeans, colSums, diag,
#> eigen, forwardsolve, gamma, identity, rowMeans, rowSums, sweep,
#> tapply
# a simple Bayesian regression model for the iris data
# priors
int <- normal(0, 5)
#> ℹ Initialising python and checking dependencies, this may take a moment.
#> ✓ Initialising python and checking dependencies ... done!
#>
coef <- normal(0, 3)
sd <- lognormal(0, 3)
# likelihood
mean <- int + coef * iris$Petal.Length
distribution(iris$Sepal.Length) <- normal(mean, sd)
# build and sample
m <- model(int, coef, sd)
draws <- mcmc(m)
# calculate values
coef_posterior <- calculate(coef, values = draws)
class(coef_posterior)
#> [1] “greta_mcmc_list” “mcmc.list”
coef_posterior_sim <- calculate(coef, values = draws, nsim = 1000)
class(coef_posterior_sim)
#> [1] “list”
coef_prior_sim <- calculate(coef, nsim = 1000)
class(coef_prior_sim)
#> [1] “list”
Created on 2022-03-28 by the reprex package (v2.0.1)
Since the different object also arrange the MCMC samples in different dimensions, it is not easy to apply the same code (e.g., involving tidybayes
) to them downstream. Specifically, I like that when nsim
is specified, calculate
returns a list of arrays with the same dimensions as the original greta_array
. I.e.,
> dim(coef_posterior_sim$coef) # first dim is number of MCMC samples
[1] 1000 1 1
> dim(coef)
[1] 1 1
But it isn't easy to recover the dimension with the posterior stored in mcmc.list
:
> dim(as.matrix(coef_posterior))
[1] 4000 1
This becomes more different when coef
has more than one row or more than one column, because they are vectorised before being saved as a mcmc.list
(right?) That said, the format of mcmc.list
makes it readily integrated into the tidybayes
syntax/grammar, e.g, the latter extract parameters as coef[row, col]
. If we decide to favour a greta_calculate
object having the same dimensions as the input greta_array
, then we need some auxillary function that turns the former into a tidybayes
and mcmc
compatible thing...
Hi @hrlai !
Thanks so much for this - great points, I think ideally want to be able to use the same/similar workflow for prior and posterior predictive checks, right? RE the list of arrays having the same dimension as the original greta array, I feel like I would prefer to have it return mcmc.list in both cases and then have ways to transform that output into other common formats? What do you think?
If you've got some examples of workflow where you run into these problems I'm happy to help test out these new features with you, at the moment the priority for the next release is getting everything to work with M1 and/or TF2.0, so we might not come back to this issue until we solve that first, just so you've got an idea of timelines/order or operations.
Standardising mcmc output is something I thought about a bit with https://github.com/njtierney/mmcc - which might still get some more development, but I see that tidybayes is a lot more mature now. Just thought I'd plug mmcc
in case the tidying functions there are useful, it deliberately does less than tidybayes
, the idea being that one you've got data in a standard format you can use data.table
or dplyr
to do the munging.
have it return mcmc.list in both cases and then have ways to transform that output into other common formats? What do you think?
Ah, I just remembered that there are plot
and other methods for mcmc.list
, so yeah keeping it in this format will be better than returning greta_array
-like outputs. People who have been using coda
should be more familiar, making greta
a more welcoming change from JAGS/BUGS and the likes... I also just realised the bayesplot
functions take in mcmc.list
too, D'oh!
By the way, I like how lightweight mmcc
is and might give it a try as an alternative to tidybayes
:)
This might make it easier to do tidying on the output, so we can avoid steps like this:
Perhaps integrating with either
mmcc
ortidybayes