quentingronau / bridgesampling

R package for bridge sampling
32 stars 16 forks source link

simplex vs R's object typing problems when supplying matrices #30

Open FBartos opened 3 years ago

FBartos commented 3 years ago

Hi, I encountered a few issues when trying to supply simplex type parameters to bridgesampling, mostly due to R converting column matrices to vectors.

A) passing a 2 column matrix with simplex leads to transformation into a vector when omiting the last column in samples <- samples[, -last_sim] and error in .transform2Real when trying to call ncol() on supposed to be a matrix, now a vector theta(passed as samples) down the line: transTypes <- character(ncol(theta))

code for reproducing the bug

# simulate posterior
K         <- 2
par_names <- paste0("pi[",1:K,"]")
post      <- LaplacesDemon::rdirichlet(1000, rep(1, K))
colnames(post) <- par_names
post      <- coda::as.mcmc(post)

# lb, ub, and types
lb <- rep(0, K)
up <- rep(1, K)
pt <- rep("simplex", K)
names(lb) <- names(up) <- names(pt) <- par_names

# log function
log_func <- function(samples.row, data){

  pi <- samples.row[ paste0("pi[",1:(data$K-1),"]") ]
  pi <- c(pi, 1 - sum(pi))
  return(LaplacesDemon::ddirichlet(pi, rep(1, data$K), log = TRUE))
}

bridgesampling::bridge_sampler(
  samples          = post,
  log_posterior    = log_func,
  data             = list(K = K),
  lb               = lb,
  ub               = up,
  param_types      = pt)

B) passing a 2 parameter simplex in general (e.g., adding another parameter to the previous example) crashes the code a bit further down the line in .transform2Real because cs has [parameters, samples] instead of [samples, parameters] dimensions in this case, crashing on z_k <- (simplex_theta / (1L - cs))

I'm actually a bit confused by the purpose of these lines here:

    simdim <- ncol(simplex_theta)
    cs     <- cbind(0L, t(apply(simplex_theta, 1L, cumsum))[, -simdim, drop = FALSE])

because the simplex parameters are already passed without the last one, thus they do not sum to one. Because the last parameter has been already removed previously in bridge_sampler.matrix:

    # Remove the last simplex variable because it is superfluous.
    last_sim <- which(is_simplex_param)[sum(is_simplex_param)]
    samples <- samples[, -last_sim, drop = FALSE]
    param_types <- param_types[-last_sim]
    lb <- lb[-last_sim]
    ub <- ub[-last_sim]

code for reproducing the bug:

K         <- 2
par_names <- c(paste0("pi[",1:K,"]"), "x")
post      <- cbind(LaplacesDemon::rdirichlet(1000, rep(1, K)), rnorm(1000))
colnames(post) <- par_names
post      <- coda::as.mcmc(post)

# lb, ub, and types
lb <- c(rep(0, K), -Inf)
up <- c(rep(1, K),  Inf)
pt <- c(rep("simplex", K), "real")
names(lb) <- names(up) <- names(pt) <- par_names

# log function
log_func <- function(samples.row, data){

  pi <- samples.row[ paste0("pi[",1:(data$K-1),"]") ]
  pi <- c(pi, 1 - sum(pi))
  return(LaplacesDemon::ddirichlet(pi, rep(1, data$K), log = TRUE))
}

bridgesampling::bridge_sampler(
  samples          = post,
  log_posterior    = log_func,
  data             = list(K = K),
  lb               = lb,
  ub               = up,
  param_types      = pt,
  silent           = F)
FBartos commented 3 years ago

I already checked that the A) issue can be fixed by changing samples <- samples[, -last_sim] into samples <- samples[, -last_sim, drop = FALSE] on line 411 in bridge_sampler.R. I didn't add a PR because I'm not sure that the column removal is supposed to be there given the B) issue