xfim / ggmcmc

Graphical tools for analyzing Markov Chain Monte Carlo simulations from Bayesian inference
111 stars 31 forks source link

Extend get_family for row-col parameter indices #58

Closed statnmap closed 5 years ago

statnmap commented 7 years ago

Function get_family_mcmc can be used to extract some family parameters and add information about parameter indices for advanced figures.
This works for matrix Parameters: Param[i,j] returns column with i indices and one with j indices. This also works for vector Parameters (Param[i]).
You can also add a name like col1, col2, ...

xfim commented 5 years ago

I am not sure of how this is used or how to integrate it fully. Could you please provide examples of its potential uses or ideas that you have in your mind? Thank you!

statnmap commented 5 years ago

Sure. This allows to extract values for parameters of the same family. In case you stored them as vectors or matrix in JAGS.

Here are two examples with your example dataset:

library(ggmcmc)
library(dplyr)
library(ggplot2)

data(linear)

# One dimension parameters ====
S_orig <- ggs(s)        # s is a coda object
unique(S$Parameter)

# Extract value for "beta" family
beta_family <- get_family_mcmc(S, "beta", row = "beta")
> beta_family
# A tibble: 4,000 x 5
   Iteration Chain Parameter value row  
       <int> <int> <fct>     <dbl> <chr>
 1         1     1 beta[1]    3.04 beta1
 2         2     1 beta[1]    2.85 beta1
 3         3     1 beta[1]    2.85 beta1
 4         4     1 beta[1]    2.58 beta1
 5         5     1 beta[1]    2.46 beta1
# Plot
nThin <- attr(beta_family, "nThin")  
nBurnin <- attr(beta_family, "nBurnin")

ggplot() + 
  geom_line(data = beta_family, 
            aes(x = nBurnin + Iteration * nThin,
                y = value, colour = as.factor(Chain))) +
  guides(colour = FALSE) +
  scale_x_continuous(expand = c(0,0)) +
  facet_wrap(vars(row))

image

# Create a two dimension parameters ====
S_orig <- ggs(s) 
S1 <- S_orig %>% 
  mutate(Parameter = case_when(
    Parameter == "beta[1]" ~ "beta[1,1]",
    Parameter == "beta[2]" ~ "beta[1,2]"
    ))

S2 <- S_orig %>% 
  mutate(Parameter = case_when(
    Parameter == "beta[1]" ~ "beta[2,1]",
    Parameter == "beta[2]" ~ "beta[2,2]"
  ),
  value = value - 10)

S <- rbind(S1, S2)
attr(S, "nChains") <- attributes(S_orig)$nChains
attr(S, "nParameters") <- length(unique(S$Parameter))
attr(S, "nIterations") <- attributes(S_orig)$nIterations
attr(S, "nBurnin") <- attributes(S_orig)$nBurnin
attr(S, "nThin") <- attributes(S_orig)$nThin
attr(S, "description") <- attributes(S_orig)$description

# Extract value for "beta" family
beta_family <- get_family_mcmc(S, "beta")
> beta_family
# A tibble: 8,000 x 6
   Iteration Chain Parameter value col   row  
       <int> <int> <chr>     <dbl> <chr> <chr>
 1         1     1 beta[1,1]  3.04 col1  row1 
 2         2     1 beta[1,1]  2.85 col1  row1 
 3         3     1 beta[1,1]  2.85 col1  row1 
 4         4     1 beta[1,1]  2.58 col1  row1 
# Plot
nThin <- attr(beta_family, "nThin")  
nBurnin <- attr(beta_family, "nBurnin")

ggplot() + 
  geom_line(data = beta_family, 
            aes(x = nBurnin + Iteration * nThin,
                y = value, colour = as.factor(Chain))) +
  guides(colour = FALSE) +
  scale_x_continuous(expand = c(0,0)) +
  facet_grid(rows = vars(row), cols = vars(col)) +
  ggtitle("Beta family")

image

xfim commented 5 years ago

Ok, thank you very much for the explanation, @statnmap. Now it is clear.

In fact, everything that you describe here it is managed by ggmcmc in, I believe, a more general way. The idea is to use the combination produced by "family" and "par_labels" to generate not only one or two-dimensional objects (like the one that the function you propose does), but to generalize it to as many dimensions as desired. As it is in the pull request right now, the `get_family_mcmc()' would only include two dimensions, and this is in fact handled in a more generalizable way using the "par_labels" argument. Plus, it gives you the possibility to not only identify the columns/rows, but also to assign them meaningful labels and combine them more easily with other tidy datasets that you may have.

I have written a tutorial on how to do it "a la ggmcmc", or at least how do I envision it, replicating your example: http://blog.xavier-fim.net/2019/02/families-and-batches-of-parameters-in-bayesian-inference-how-to-treat-them-using-ggmcmc/

I thank you very much for your contribution, but I think that handling it through the current functions is a more flexible and generalizable way. I will be happy to provide other examples, if you think that more detailed explanations are needed.

statnmap commented 5 years ago

I think this answers the needs addressed in this PR. Thanks.

xfim commented 5 years ago

Perfect. Then close it is. Again, thank you very much for your contribution and feel free to contact again requesting examples, better documentation, or whathever that you can think of that can improve ggmcmc.