xfim / ggmcmc

Graphical tools for analyzing Markov Chain Monte Carlo simulations from Bayesian inference

Multiple matches with family argument in ggs() #73

Open mdodrill-usgs opened 3 years ago

mdodrill-usgs commented 3 years ago

Hi,

When using the family argument to pull only a set of named parameters from a fitted model object, the function ggs() returns multiple matches when the parameter names are similar (see example below). This isn't the behavior I expected when looking at the documentation for ggs() (i.e., A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc).)

library(rstan)
library(ggmcmc)

# toy model
ex_model_code <- '
  parameters {
    real alpha[2,3];
    real alpha_2[2]; 
  } 
  model {
    for (i in 1:2) for (j in 1:3) 
      alpha[i, j] ~ normal(0, 1); 
    for (i in 1:2) 
      alpha_2[i] ~ normal(0, 2); 
  } '
fit <- stan(model_code = ex_model_code, chains = 4) 

f1 <- ggs(fit, family = "alpha")

# both alpha & alpha_2 are returned
unique(f1$Parameter)

Maybe this is the desired behavior, but a nice feature would be to return only the parameters whose name before the square brackets matches the string provided as the family argument, rather than every parameter that contains it ("alpha" and "alpha_2" in the example).

Thanks

xfim commented 3 years ago

Hi @mdodrill-usgs. Thank you very much for using ggmcmc and for reporting issues.

In this case the behaviour is the one that I had in mind. Maybe the documentation is misleading and certainly I would have to change it depending on how this conversation goes.

The idea with family is to empower the user through the use of regular expressions. So basically anything in the family is a plain regular expression in R. This means that it is very easy to do things like:
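A minimal sketch of the idea, using base R's `grepl()` on a made-up vector of parameter names instead of a fitted model. The names are hypothetical, and the snippet only illustrates how the regular expressions behave, not the internals of ggs():

```r
# Hypothetical parameter names, as they might appear in the Parameter column
params <- c("alpha[1,1]", "alpha[1,2]", "alpha[2,1]", "alpha_2[1]", "alpha_2[2]")

# A plain "alpha" matches every name that contains that string
loose <- params[grepl("alpha", params)]

# Anchoring at the start and escaping the bracket keeps only alpha[...]
strict <- params[grepl("^alpha\\[", params)]

# Only the elements of alpha whose first index is 1
first_row <- params[grepl("^alpha\\[1,", params)]
```
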

So unless you have a strong preference for changing its behaviour to return only the parameters whose name before the brackets matches the family, I would prefer to keep the current situation. Another option, if you think it really adds something, would be a new argument specifying whether the family should work as a regular expression (the current situation, and the default) or as a strict match on the name before the brackets.
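To make the second option concrete, here is a rough sketch of how such a switch could behave. `select_family()` and its `strict` argument are hypothetical names used for illustration only; they are not part of ggmcmc, and the sketch assumes that matching is done on a character vector of parameter names:

```r
# Hypothetical helper, not part of ggmcmc: select parameter names by family.
# strict = FALSE treats family as a regular expression;
# strict = TRUE keeps only names equal to family or of the form family[...].
select_family <- function(params, family, strict = FALSE) {
  if (!strict) {
    return(params[grepl(family, params)])
  }
  keep <- params == family | startsWith(params, paste0(family, "["))
  params[keep]
}

# Made-up parameter names for the example
params <- c("alpha[1,1]", "alpha[2,3]", "alpha_2[1]", "alpha_2[2]")

regex_match  <- select_family(params, "alpha")                 # all four names
strict_match <- select_family(params, "alpha", strict = TRUE)  # only alpha[...]
```

Using `startsWith()` for the strict case avoids having to escape regular-expression metacharacters in the family name.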

What do you think?

mdodrill-usgs commented 3 years ago

Hi @xfim,

Thank you for the prompt reply and your work on ggmcmc.

I think some small changes to the documentation would help to clarify the behavior when family is supplied as a character vector. Maybe something like, "When family is given as a character vector, any parameters containing the string supplied are returned (e.g., with family = "beta", both beta[1] and beta.alpha[1] are returned)." could be added (if that is the correct logic of how the matching works with a character vector).

Also, for those not as familiar with building regular expressions (like me), maybe a condensed version of your response above could be added to the examples in the documentation of ggs(). This would help to guide users towards some handy expressions (and give an example of the differences between supplying family as a plain string and as a regular expression).

Building off the little Stan model above, maybe something like this:

#' @examples 
#' \dontrun{
#' library(rstan)
#' 
#' # toy model
#' ex_model_code <- '
#' parameters {
#' real alpha[2,3];
#' real alpha_sigma[2];
#' } 
#' model {
#' for (i in 1:2) for (j in 1:3) 
#' alpha[i, j] ~ normal(0, 1); 
#' for (i in 1:2) 
#' alpha_sigma[i] ~ normal(0, 2); 
#' } '
#' fit <- stan(model_code = ex_model_code, chains = 4)
#' 
#' # family as character vector, both alpha and alpha_sigma:
#' f1 <- ggs(fit, family = "alpha")
#' 
#' # only strict alpha and the brackets:
#' f1 <- ggs(fit, family = "^alpha\\[")
#' 
#' # Only the first element of the first dimension in alpha: 
#' f1 <- ggs(fit, family = "^alpha\\[1,")
#' 
#' # etc.
#' 
#' }