stan-dev / posterior

The posterior R package
https://mc-stan.org/posterior/
Other
167 stars 23 forks source link

Implement as_draws generics in other packages? #183

Open mjskay opened 3 years ago

mjskay commented 3 years ago

(Note: I'm opening this here for discussion first rather than opening a bunch of essentially duplicate issues in several packages)

Now that posterior is on CRAN, would it make sense to have as_draws() / as_draws_XXX() implementations for objects in brms, rstanarm, rstan, cmdstanr, etc? I think it would help fulfill part of the promise of {posterior}; that is, users don't have to remember what particular function for a given model type gets them draws in what format, they can just chuck the object into as_draws_XXX() for whatever format they want.

E.g. with cmdstanr, rather than doing this:

model$draws(format = "XXX")

People could (optionally) do:

model %>% as_draws_XXX()

The advantage of the latter code is that as more packages also implement these generics it becomes easier for users to move between packages. @paul-buerkner @jgabry thoughts? I dunno if this was already the plan anyway.

paul-buerkner commented 3 years ago

I agree. At least for brms, this is the plan: https://github.com/paul-buerkner/brms/issues/1204

jgabry commented 3 years ago

Yeah I think this is a good plan.

mjskay commented 3 years ago

Awesome! I'll close this and open issues on rstanarm and cmdstanr. as_draws_XXX() appears to already work on rstan::stanfit from my quick testing, as stanfit objects appear to mimic arrays so it gets routed appropriately through as_draws_array()

paul-buerkner commented 3 years ago

I think there may be some more general things to discuss so I am reopening this isuse. I hope that is ok.

1) Shall we add additional arguments to as_draws_XXX.somefit to mimic existing applications people are used to from as.array and friends. Specifically:

- Shall we add a `variable` argument to preselect variables to be extract instead of extracing all, effectively mimicing the `pars` argument of `as.array.stanfit`?
- How to handle warmup samples? Shall we add an `inc_warmup` (or similarily named) argument to `as_draws_XXX.somefit`?
- What should the default format via `as_draws` be for brms and rstanarm models be? The apparent behavior of `stanfit` suggest we should use `draws_array`. Is that a choice we like?

2) What shall we do with the existing as.array.somefit (and friends) of brms or rstanarm for example? Shall we keep them as is? Or shall we deprecate them and say that they will eventually become aliases of the corresponding as_draws_xxx methods?

3) What shall niterations and ndraws count? I assume the post-warmup iterations / draws, that is, ignoring warmup draws?

4) Shall we add some generics for nwarmup and nthin (or nthinning?) to posterior?

mjskay commented 3 years ago
  1. Shall we add additional arguments to as_draws_XXX.somefit to mimic existing applications people are used to from as.array and friends. Specifically:
    • Shall we add a variable argument to preselect variables to be extract instead of extracing all, effectively mimicing the pars argument of as.array.stanfit?
    • How to handle warmup samples? Shall we add an inc_warmup (or similarily named) argument to as_draws_XXX.somefit?

These both sound like good ideas. Are you suggesting they should be in the generic or just the specific implementations? There's some benefit to standardization enforced by putting them in the generic but also they wouldn't apply in all cases.

  • What should the default format via as_draws be for brms and rstanarm models be? The apparent behavior of stanfit suggest we should use draws_array. Is that a choice we like?

I can only say from a user perspective for those packages, but my suggestion would be whatever format is fastest to convert the internal format to. I think the stanfit behavior might just be an accident of the fact that is.array() on a stanfit returns TRUE so as.array() gets call inside posterior to do the conversion?

  1. What shall we do with the existing as.array.somefit (and friends) of brms or rstanarm for example? Shall we keep them as is? Or shall we deprecate them and say that they will eventually become aliases of the corresponding as_draws_xxx methods?

Again speaking as a user, I think as.array() is nice since it gives interoperability with packages that know how to do things with arrays via standard coercion functions. So I wouldn't remove it, but maybe change examples to use/advertise as_draws_XXX instead?

  1. What shall niterations and ndraws count? I assume the post-warmup iterations / draws, that is, ignoring warmup draws?

Do you mean niterations / ndraws applied to fit objects? Seems reasonable.

  1. Shall we add some generics for nwarmup and nthin (or nthinning?) to posterior?

Again I'm assuming you mean for fit objects... would this be metadata that gets included with a draws object that we keep around, or functions that are only applied to fit objects? I suppose we could keep track of this metadata if it were valuable to do so, though I'm not sure its worth it.

paul-buerkner commented 3 years ago

Thank you for your feedback.

These both sound like good ideas. Are you suggesting they should be in the generic or just the specific implementations? There's some benefit to standardization enforced by putting them in the generic but also they wouldn't apply in all cases.

I would not want to add these arguments to the generics themselves, but perhaps add to their docs that we would recommend arguments being named in a certain way, if a method wants to support them.

I can only say from a user perspective for those packages, but my suggestion would be whatever format is fastest to convert the internal format to. I think the stanfit behavior might just be an accident of the fact that is.array() on a stanfit returns TRUE so as.array() gets call inside posterior to do the conversion?

Then draws_list it is I think since rstan stores draws in this list like format. In fact, the draws_list format was added keeping specifically rstan in mind.

Again speaking as a user, I think as.array() is nice since it gives interoperability with packages that know how to do things with arrays via standard coercion functions. So I wouldn't remove it, but maybe change examples to use/advertise as_draws_XXX instead?

I agree. I will leave as.array.brmsfit as is from the user side but let posterior handle the backend. The returned object will still be an array (not a draws_array) as to work consistently with methods expecting the default dropping behavior of arrays.

Do you mean niterations / ndraws applied to fit objects? Seems reasonable.

Yes, exactly.

Again I'm assuming you mean for fit objects... would this be metadata that gets included with a draws object that we keep around, or functions that are only applied to fit objects? I suppose we could keep track of this metadata if it were valuable to do so, though I'm not sure its worth it.

I simply mean generics to be usable on fit objects, otherwise returning 0 warmup and 1 thinning. The goal here would only be to align the function names, not to support warmup draws in the posterior draws formats.

paul-buerkner commented 3 years ago

While working on supporting posterior in brms, I stumbled upon another question which warrants some discussion I think. In brms, I allow to extract subsets of draws (via a badely named argument called subset). Of course, we can just chain one as_draws and then an subset_draws but I was wondering if it could be sensible to allow the subsetting arguments such as draw, iteration and chain already in as_draws.brmsfit similar to allowing argument variable as mentioned above?

For brms, the advantage would be that as.data.frame and friends can work with one single call. Since they remove the posterior classes for consistent behavior, doing a subset_draws after extraction won't work. So we either have to do it in a single call, build some other workaround, or simply require staying in the posterior-verse of functions and objects. The latter may not be ideal since even brms itself requires (for the time being at least) usual data.frames etc. and cannot stay in the posterior-verse consistently yet.

jgabry commented 2 years ago

Just commenting here to mention that we now (finally, sorry for the delay!) have as_draws() support for CmdStanR models:

https://github.com/stan-dev/cmdstanr/pull/581

Because as_draws(...) is just a wrapper around fit$draws(...) I didn't explicitly add any other arguments to as_draws() because ... can be used to pass any supported arguments (e.g. inc_warmup) to fit$draws(). I think this makes sense for CmdStanR, but I could also add all the supported arguments to the signature of as_draws() method if everyone else feels strongly about that.

paul-buerkner commented 2 years ago

Just a note that as_draws and friends are supported in brms since some time now.

avehtari commented 1 year ago

It seems like the other packages now have as_draws, so should this issue have a new title to describe the remaining tasks?

paul-buerkner commented 1 year ago

I guess so. From my side, we can also close this issue as I think we have made enough progress in the relevant downstream packages(?) @jgabry how far is rstanarm with the posterior incorporation?

jgabry commented 1 year ago

I added as_draws() methods to rstanarm but that’s about it. There are other places in rstanarm where posterior could be used, but nothing urgent I think, it would just be internal stuff unless I’m forgetting something.

paul-buerkner commented 1 year ago

I agree. I would vote to close this issue. Any objections?