Open n-kall opened 1 year ago
Perhaps, we can actually make method for draws objects that automatically compute these arguments for each variable. The default would continue to work as now but would no longer be used for draws objects.
Making a sensible method for draws would indeed be a better option.
While working to add these methods, I thought a sensible option would be for new convergence methods such as rhat.draws(x)
to simply call summarise_draws(x)
. I think this would work fine, except that summarise_draws
passes draws_array
objects to the summary functions, because of the line draws <- drop_dims_or_classes(x[, , v], dims = 3, reset_class = FALSE)
in create_summary_list
. From my understanding, reset_class = TRUE
could be used to fix this (and pass matrix
objects to the summary functions), except that variance.draws_array
is expecting a draws object (otherwise it returns the variance of each chain separately, I believe).
I'm not sure what the best approach would be to get around this, but it seems that variance
is the only summary function for which a draws_array
method is defined, and which is then used in summarise_draws, but I'm not sure defining a new variance.matrix
method (which already exists in distributional) would be sensible.
You could try to use summarize_draws but have to check how much overhead this function creates if your sole purpose is to compute a single diagnostic (say, rhat) per variable.
You could try to use summarize_draws but have to check how much overhead this function creates if your sole purpose is to compute a single diagnostic (say, rhat) per variable.
Good point. However, even if new *.draws
methods do not use summarise_draws
internally, there is still the issue that currently summarise_draws
passes single-variable draws_array
objects to the generics (and then *.default
methods are used except for variance
, which has a .draws_array
method). Introducing new *.draws
methods would interfere with this, as they would then be used by summarise_draws
rather than the default methods as currently implemented.
I see your point. @mjskay do you have an idea by any chance how to deal with this case?
Hmmm... What if summarise_draws passed a single-variable rvar down to convergence functions instead of a draws_array? draws_of(x, with_chains = TRUE)
could be used in convergence functions to basically get the same format as a single variable draws_array (and that function has little overhead). Would that solve the dispatch problems?
Perhaps. It would have to be tried out, I think.
I'll try it out and make a PR if it seems to work
Thank you for trying it out. If it works, before you spend time on a PR, we perhaps need to think about if this has other consequences, we currently do not see.
The suggestion of passing rvars to the summary functions seems to work, especially as there are already .rvar
methods for the summary functions implemented. However, I'm not sure that this is the best approach from a user perspective, as custom functions can be passed to summarise_draws
, and currently they would expect a draws_array
, so changing this could potentially break some downstream things / workflows.
Ah yeah. I think this kind of relates to other questions about modifying summarise_draws to support things like weighted draws. See this comment: https://github.com/stan-dev/posterior/issues/273#issuecomment-1440350034
That is, if we're going to change the contract that summarise_draws has with summary functions, we should probably do it in a way that also allows us to solve the issues mentioned in the comment above. Otherwise we'll just have to do it again later.
Alternatively, summarise_draws could be the "simple" (possibly even "legacy") summary function, with few expectations on the implementation of summary functions (basically: accepts an array?) and we could create a new summary function that solves the above issues without breaking existing code.
Another solution might be to check for an rvar implementation on the summary function generic and pass that format if present, and otherwise use an array. That way rvar support is opt-in.
Actually, this approach (essentially the summary function "declaring" its input format via a generic function implementation) could be used in the future to solve the other issues without making a new summarise_draws function. e.g. if rvar gets support for carrying weights around (which it really should), that would give us a way for summary functions to declare that they accept weights.
Another solution might be to check for an rvar implementation on the summary function generic and pass that format if present, and otherwise use an array. That way rvar support is opt-in.
I like this suggestion. When I get some more time for this I'll try it out
Thanks! if it helps, we had to do something similar in as_draws
, so there's already a utility function for checking for S3 method definitions:
https://github.com/stan-dev/posterior/blob/e23467b87aa5d4b5df2f7f5bc545c201186ea2c5/R/as_draws.R#L51
I've played around with detecting and using rvar methods here. It seems to work ok (need to fix the issue of functions that return a named vector like quantile2 losing names). It is limited to detecting methods from functions and strings (not formula syntax) but that's probably ok. For example in summarise_draws(x, mean, "mean", ~mean(.x))
the first two mean functions will use the mean.rvar
method and the third will pass an array in place of .x
(matching the present behaviour).
Yeah, good point... That could lead to counterintuitive behavior for users (and hard to track down bugs) if they start with a summary function then add arguments to it by changing to a formula or anonymous function. Not sure how I feel about that...
That could lead to counterintuitive behavior for users (and hard to track down bugs) if they start with a summary function then add arguments to it by changing to a formula or anonymous function
Yes, this inconsistency could be an issue, e.g. if weights are suddenly not taken into account because of a change in syntax. Perhaps this change would really need to make rvar methods the default (if they exist) and pass array as fallback, maybe with a warning message that weights (or other attributes) are not necessarily considered. I suppose such a change would make sense only after rvars have weighting functionality, to justify a change in defaults for added functionality. But this discussion is now probably more related to issue #184.
I currently don't see an easy fix for adding .draws
methods for convergence functions without messing things up, so I'll probably leave it for now and revisit when there are related developments that might make it simpler
Yeah good point. This lends some priority to #184 then. I can take a stab at that when I next have time...
Currently the diagnostic functions
rhat
,ess_tail
andess_bulk
are intended to be used on a single variable matrix or rvar, e.g. as part ofsummarise_draws
. However, they can be applied (incorrectly) to adraws
object (all work ondraws_array
, some work ondraws_df
too), and then return a single numeric value which is not meaningful.In contrast
rstar
takes a multivariatedraws
object as input and returns an interpretable value. Asrhat
andrstar
are somewhat similar in interpretation, the different expected inputs could be confusing.Although it is specified in the documentation that these functions expect certain input, maybe it is worthwhile doing some type checking for these functions and returning a warning / error when used incorrectly (especially when used on a
draws
object) and pointing tosummarise_draws
.Example: