mjskay / tidybayes

Bayesian analysis + tidy data + geoms (R package)
http://mjskay.github.io/tidybayes
GNU General Public License v3.0
712 stars 59 forks source link

spread/gather_draws fails for stanfit from gqs #244

Closed wjakethompson closed 3 years ago

wjakethompson commented 4 years ago

First, thank you for making such an incredible package! It has really streamlined many of my workflows, and I appreciate all the work you have and are putting into it.

I recently came across a problem I haven't been able to figure out. Occasionally, I will fit a model in {rstan} and then then later use that fitted model to draw sample of generated quantities using rstan::gqs(). However, the spread/gather_draws functions fail when I try to tidy samples from rstan::gqs(), even though the object returned is still a stanfit. I've included a reprex below.

library(rstan)
library(tidybayes)

m <- stan_model(model_code = 'parameters {real y;}
                              model {y ~ normal(0,1);}')
f <- sampling(m, iter = 300)
class(f)
#> [1] "stanfit"
#> attr(,"package")
#> [1] "rstan"

spread_draws(f, y)
#> # A tibble: 600 x 4
#>    .chain .iteration .draw      y
#>     <int>      <int> <int>  <dbl>
#>  1      1          1     1 -1.05 
#>  2      1          2     2 -0.226
#>  3      1          3     3 -1.67 
#>  4      1          4     4 -0.979
#>  5      1          5     5 -0.979
#>  6      1          6     6 -1.63 
#>  7      1          7     7 -2.23 
#>  8      1          8     8 -0.509
#>  9      1          9     9 -0.301
#> 10      1         10    10  0.573
#> # … with 590 more rows

# now use the fitted model, f, to make new generated quantities
m2 <- stan_model(model_code = 'parameters {real y;}
                               generated quantities {real y_rep = normal_rng(y,1);}')
f2 <- gqs(m2, draws = as.matrix(f))
class(f2)
#> [1] "stanfit"
#> attr(,"package")
#> [1] "rstan"

spread_draws(f2, y_rep)
#> Error in do.call(cbind, attr(x, "sampler_params")): second argument must be a list

# if generated quantities are part of the full model, then no issue
m3 <- stan_model(model_code = 'parameters {real y;}
                               model {y ~ normal(0,1);}
                               generated quantities {real y_rep = normal_rng(y,1);}')
f3 <- sampling(m3, iter = 300)
class(f3)
#> [1] "stanfit"
#> attr(,"package")
#> [1] "rstan"

spread_draws(f3, y_rep)
#> # A tibble: 600 x 4
#>    .chain .iteration .draw   y_rep
#>     <int>      <int> <int>   <dbl>
#>  1      1          1     1 -0.767 
#>  2      1          2     2  0.395 
#>  3      1          3     3  0.287 
#>  4      1          4     4 -0.0888
#>  5      1          5     5 -0.782 
#>  6      1          6     6  0.812 
#>  7      1          7     7 -1.27  
#>  8      1          8     8 -1.19  
#>  9      1          9     9 -1.56  
#> 10      1         10    10 -1.47  
#> # … with 590 more rows

Created on 2020-04-10 by the reprex package (v0.3.0)

What I think is happening

It looks like the issue is here:

https://github.com/mjskay/tidybayes/blob/a830130342a55815e553baa77007598b3e240aa3/R/tidy_draws.R#L172

Because rstan::gqs() isn't doing any sampling, there are no sampler parameters, so this line is throwing an error. Would it be possible to only include the sampler parameters if they exist? That is, rather than error they just wouldn't be returned when they weren't found. So, for example, if you tried to do spread_draws(f2, treedepth__) (where f2 is the object of generated quantities from rstan::gqs() above), you'd get the same error you normally do when a specified parameter isn't available:

Error in spread_draws_long_(draws, variable_names, dimension_names, regex = regex,  : 
  No variables found matching spec: treedepth__

To be more clear, I'd expect something like this:

tidybayes::get_variables(f)
#> [1] "y"             "lp__"          "accept_stat__" "stepsize__"   
#> [5] "treedepth__"   "n_leapfrog__"  "divergent__"   "energy__"

tidybayes::get_variables(f2)
#> [1] "y_rep"
mjskay commented 4 years ago

Glad the package is helpful!

Yeah that's definitely a bug, thanks for spotting it. The fix you suggest makes sense.

lejon commented 3 years ago

Agree on a wonderful package, thanks so much for your great work! Any status on this, I'm having the same problem, and I really don't want to leave the wonderful tidybayes world for a workaround! :)

mjskay commented 3 years ago

ah hrm. Yeah, this should be a relatively easy fix, unfortunately tidybayes stuff is pretty far down my stack at the moment, sorry :/ (unless someone wanted to submit a PR with a test case and fix?). There are going to be bigger changes to tidybayes internals once {posterior} is released and I've been focusing energy there.

In the meantime a simple workaround should be to use posterior::as_draws_df() on the model to get a tidy data frame of draws and then use spread_draws() or gather_draws() on the result. If you try that and it doesn't work please let me know.

mjskay commented 3 years ago

should be fixed now