Open jgabry opened 5 years ago
One possibility is something like what I put below (I think @andrewgelman and @avehtari and I discussed something like this one time when Aki was visiting). Not sure which columns we include, but a concise table like this would be a really nice option for models with many parameters:
variable min_mean max_mean max_rhat min_ess_bulk min_ess_tail
alpha[1] 4.5 4.5 1.0 881 300
tau[1] 3.8 3.8 1.1 386 311
omega[9] 1.5 7.4 1.0 552 272
Sigma[7,7] -1.7 5.1 1.0 421 202
Here the numbers inside brackets indicate the dimensions (we could also omit if a scalar instead of putting [1]
). At first people could confuse this with just showing the 9th element of omega
, for example, but I think we could make it clear in the doc (and this also wouldn't be the default output so they would be intentionally selecting this). The columns min_mean
and max_mean
here are the min and max of the "mean" columns if the summary was computed for the individual elements (to give a sense of how much the means can differ) but I'm not sure if we want to include that or not.
@andrewgelman Did we end up coming up with a better idea than putting the dimensions in brackets?
Anyone else have any thoughts on this?
Since brackets are used for indexing, this can cause confusion for R users. Would it be too much to have a column dim
as the second column?
variable dim min_mean max_mean max_rhat min_ess_bulk min_ess_tail
alpha 1 4.5 4.5 1.0 881 300
tau 1 3.8 3.8 1.1 386 311
omega 9 1.5 7.4 1.0 552 272
Sigma 7,7 -1.7 5.1 1.0 421 202
dim separately may be a little ugly and not not too helpful for filtering I would say as the structure would vary across variables, that is, sometimes 1 number sometimes 2 or more separated with commas.
What if we just replace the brackets to a different symbol to refer to dimension? There are lots of possibilities. Here are just a few:
variable min_mean max_mean max_rhat min_ess_bulk min_ess_tail
alpha (1) 4.5 4.5 1.0 881 300
tau (1) 3.8 3.8 1.1 386 311
omega (9) 1.5 7.4 1.0 552 272
Sigma (7,7) -1.7 5.1 1.0 421 202
variable min_mean max_mean max_rhat min_ess_bulk min_ess_tail
alpha {1} 4.5 4.5 1.0 881 300
tau {1} 3.8 3.8 1.1 386 311
omega {9} 1.5 7.4 1.0 552 272
Sigma {7,7} -1.7 5.1 1.0 421 202
variable min_mean max_mean max_rhat min_ess_bulk min_ess_tail
alpha <1> 4.5 4.5 1.0 881 300
tau <1> 3.8 3.8 1.1 386 311
omega <9> 1.5 7.4 1.0 552 272
Sigma <7,7> -1.7 5.1 1.0 421 202
@paul-buerkner I'm not sure I understand the filtering issue with dim in a separate column --- it seems to me it would be easier to filter if dims were not a part of the variable
column? E.g. filter(., variable == "Sigma")
?
@mjskay you have a good point there. Not sure how to best represent the dimension, but you are right that filtering after variable should be easily possible.
If we're gonna do one of these, I'd prefer the parentheses, as the braces look weird to me and the angle brackets make me thing of upper and lower. But if we're going to do it this way, what about indicating this in the header, thus:
variable (dims) min_mean max_mean max_rhat min_ess_bulk min_ess_tail
A
On Nov 13, 2019, at 11:34 AM, Jonah Gabry notifications@github.com wrote:
What if we just replace the brackets to a different symbol to refer to dimension? There are lots of possibilities. Here are just a few:
variable min_mean max_mean max_rhat min_ess_bulk min_ess_tail alpha (1) 4.5 4.5 1.0 881 300 tau (1) 3.8 3.8 1.1 386 311 omega (9) 1.5 7.4 1.0 552 272 Sigma (7,7) -1.7 5.1 1.0 421 202 variable min_mean max_mean max_rhat min_ess_bulk min_ess_tail alpha {1} 4.5 4.5 1.0 881 300 tau {1} 3.8 3.8 1.1 386 311 omega {9} 1.5 7.4 1.0 552 272 Sigma {7,7} -1.7 5.1 1.0 421 202 variable min_mean max_mean max_rhat min_ess_bulk min_ess_tail alpha <1> 4.5 4.5 1.0 881 300 tau <1> 3.8 3.8 1.1 386 311 omega <9> 1.5 7.4 1.0 552 272 Sigma <7,7> -1.7 5.1 1.0 421 202 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jgabry/posterior/issues/43?email_source=notifications&email_token=AAZYCUKLYHJ4CTB45ZWQV4TQTQUBFA5CNFSM4JKQOFF2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOED6XVOA#issuecomment-553482936, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZYCUPJIO2MOH7YPDFUPHLQTQUBFANCNFSM4JKQOFFQ.
Hey guys, I'm still playing with ways to summarize the huge draws array that I'm working with, and realized that this issue is pretty relevant as well. I've approached the problem as below. If the approach looks not-crazy then I could try to clean it up and turn it into a PR. This would involve better error handling, inferring the dimensions of the nonscalar parameters, and wrangling the output into whatever format you want.
The default (which is tailored to Stan) is to "roll up" the summaries for any parameter with a [
in its name. There's also an option to pass the parameter names for rolling up, again with the assumption that they'll show up in the draws array as ^parameter_name\\[.*
. The logic for passing parameter names is that nonscalar parameters come in (at least) two different flavors. One is big random effects and covariance matrices that ought to be rolled up. The other is vectors of regression coefficients probably should not be rolled up.
Functions:
summary_rollup <- function (draws_summary, rollup_vars = NULL,
min_only = c("ess_bulk", "ess_tail"),
max_only = "rhat") {
vars <- draws_summary$variable
if (is.null(rollup_vars)) {
vars_nonscalar <- grepl("\\[", vars)
} else {
vars_nonscalar <- as.logical(colSums(do.call(rbind, lapply(paste0("^", rollup_vars, "\\["), function(x){grepl(x, vars)}))))
}
ds_scalar <- draws_summary[!vars_nonscalar, ]
ds_nonscalar <- draws_summary[vars_nonscalar, ]
varnames_nonscalar <- gsub("\\[(.*)", "", ds_nonscalar$variable)
summary_names <- names(draws_summary)[-1]
names_minmax <- summary_names[!(summary_names %in% c(min_only, max_only))]
split_nonscalar <- split(ds_nonscalar, varnames_nonscalar)[unique(varnames_nonscalar)] # [unique(varnames_nonscalar)] preserves the order of the names
min_max <- do.call(rbind, lapply(split_nonscalar, rollup_helper_minmax, names = names_minmax))
min_only <- do.call(rbind, lapply(split_nonscalar, rollup_helper_min, names = min_only))
max_only <- do.call(rbind, lapply(split_nonscalar, rollup_helper_max, names = max_only))
nonscalar_out <- cbind(min_max, max_only, min_only)
out <- list(unrolled = ds_scalar, rolled = nonscalar_out)
out
}
rollup_helper_minmax <- function(x, names){
x <- x[, names]
mm <- c(apply(x, 2, function(x) {c(min(x), max(x))}))
names(mm) <- paste0(rep(names(x), each = 2), c("_min", "_max"))
mm
}
rollup_helper_min <- function(x, names){
x <- x[, names]
min_only <- apply(x, 2, min)
names(min_only) <- paste0(names(x), "_min")
min_only
}
rollup_helper_max <- function(x, names){
x <- x[, names]
max_only <- apply(x, 2, max)
names(max_only) <- paste0(names(x), "_max")
max_only
}
Example:
ds <- summarise_draws(example_draws())
ds2 <- summarise_draws(2 * example_draws())
ds2$variable <- paste0("new_", ds2$variable)
draws_summary <- rbind(ds, ds2)
summary_rollup(draws_summary)
summary_rollup(draws_summary, rollup_vars = c("theta", "new_theta"))
summary_rollup(draws_summary, rollup_vars = "theta")
This looks like it could be very useful to me! Two quick thoughts if we go down this road:
rollup_summary()
instead of summary_rollup()
as_draws_rvars.draws_matrix()
, which already handles a number of weird cases (like missing elements, dimensions that don't start at 1, and non-numeric dimensions). Discussion of this aspect specifically should probably happen on #61
The discussion in #32 reminded me that one thing @andrewgelman and others have been wanting for a while (and that I think would also be useful as an alternative to the standard summary output) is a summary where each vector/matrix/array parameter only occupies a single line just like scalars. That is, each line corresponds to a named parameter rather than a parameter element.
(This could be its own thing or just an option to a custom print method for our existing summary objects.)
The quantities to display are debatable (e.g. maybe min of all the individual elements’ ESS, max of the Rhats, etc), so it would be good to get input from Andrew and anyone else interested.