mjskay / tidybayes

Bayesian analysis + tidy data + geoms (R package)
http://mjskay.github.io/tidybayes
GNU General Public License v3.0
712 stars 59 forks source link

`cmdstanr` Support: Working code included #267

Closed vincentarelbundock closed 4 years ago

vincentarelbundock commented 4 years ago

cmdstanr seems to be gearing up for a 1.0.0 release (and possible CRAN release?), and I think it would be quite easy to add support for it. To kick things off, here's a simple tidy_draws method with an example to show that this method is sufficient to make spread_draws work with cmdstanr objects.

I don't know your codebase well enough to know: What else is usually needed to add full support for a new object type?

library(tidyverse)
library(tidybayes)
library(cmdstanr)
library(coda)

tidy_draws.CmdStanMCMC = function(model, ...) {
  # parameter draws
  sample_matrix = model$draws() #[iteration, chain, variable]
  class(sample_matrix) = 'array'
  n_chain = dim(sample_matrix)[[2]]
  mcmc_list = as.mcmc.list(lapply(seq_len(n_chain), 
                                  function(chain) as.mcmc(sample_matrix[, chain, ]))) # nolint
  parameter_draws = tidy_draws(mcmc_list, ...)

  # diagnostic draws
  diagnostic_matrix = model$sampler_diagnostics()
  class(diagnostic_matrix) = 'array'
  mcmc_list = as.mcmc.list(lapply(seq_len(n_chain), 
                                  function(chain) as.mcmc(diagnostic_matrix[, chain, ]))) # nolint
  diagnostic_draws = tidy_draws(mcmc_list)

  # combine parameter and diagnostic draws
  draws = full_join(parameter_draws, diagnostic_draws, by = c('.chain', '.iteration', '.draw'))

  # keep the constructors around in case they were set on the original model
  attr(draws, "tidybayes_constructors") = attr(model, "tidybayes_constructors")
  draws
}

stan_program <- '
data {
  vector[1000] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y ~ normal(mu, sigma);
  mu ~ normal(0, 1);
  sigma ~ exponential(1);
}
'
stan_data <- list('y' = rnorm(1000, 10, 2))

model <- write_stan_tempfile(stan_program)
model <- cmdstan_model(model)
model <- model$sample(stan_data)

spread_draws(model, mu, sigma)
#> # A tibble: 4,000 x 5
#>    .chain .iteration .draw    mu sigma
#>     <int>      <int> <int> <dbl> <dbl>
#>  1      1          1     1  9.91  1.99
#>  2      1          2     2  9.98  1.94
#>  3      1          3     3  9.86  2.01
#>  4      1          4     4  9.86  2.01
#>  5      1          5     5  9.97  2.00
#>  6      1          6     6  9.91  1.92
#>  7      1          7     7  9.79  1.94
#>  8      1          8     8 10.0   1.98
#>  9      1          9     9 10.0   1.96
#> 10      1         10    10  9.95  2.01
#> # … with 3,990 more rows

Created on 2020-07-28 by the reprex package (v0.3.0)

vincentarelbundock commented 4 years ago

Should this be in the context of a broader refactor as suggested here https://github.com/mjskay/tidybayes/issues/261?

mjskay commented 4 years ago

Thanks for the poke! Yes, I've been meaning to do this as part of #261, and your issue gave me the impetus to actually get it done :).

This is made a bit easier by #261 since you can use the posterior functions directly --- doesn't require as much manual munging. You can see it here:

https://github.com/mjskay/tidybayes/blob/71b66256bcfb1f7dabf5e38a4e700c86c47aab5c/R/tidy_draws.R#L241-L259

Or try it out by installing the "posterior" branch of tidybayes (it isn't on the dev or master branches yet because it requires non-CRAN packages; once posterior and cmdstanr hit CRAN that will change):

devtools::install_github("mjskay/tidybayes@posterior")

In that branch tidy_draws should work directly on cmdstanr fits. Let me know if you have any problems with it!

vincentarelbundock commented 4 years ago

Fantastic! Thanks for all your work on this package. Sooo useful.

yonicd commented 3 years ago

looks good a suggestion from poking around rstan

you can leverage an internal function to query the colnames that will be generated in the output df wo building it using rstan:::read_csv_header and then pass it into the ... in the method that is currently not used to inform the cmdstanr::model$draws to return only the columns the user requests. This will make the tidy_draws function more efficient

mjskay commented 3 years ago

Good idea! Somewhere on my todo list is making the tidy_draws generic allow variable subsetting so this kind of optimization can be done. See this issue: #121

yonicd commented 3 years ago

i'm thinking of using that logic for shredder, but have also noticed brms uses [] in the names and dont preserve that for indexing, which through me in a loop to refactor my grepping of cols.

elbamos commented 3 years ago

Which version of tidybayes supports this? I'm getting Error in mcmc.list(x) : Arguments must be mcmc objects when I try to pass cmdstanr fit objects into gather_draws().

mjskay commented 3 years ago

It's not in the CRAN version yet -- have you installed the latest version from github?

elbamos commented 3 years ago

I’m trying that now, thank you!

On Jul 13, 2021, at 1:43 PM, Matthew Kay @.***> wrote:

 It's not in the CRAN version yet -- have you installed the latest version from github?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.