stan-dev / cmdstanr

CmdStanR: the R interface to CmdStan
https://mc-stan.org/cmdstanr/
Other
144 stars 63 forks source link

error in efbmi calculation blocks saving of sample object #652

Closed bob-carpenter closed 2 years ago

bob-carpenter commented 2 years ago

Describe the bug

Ran sampling and then when it was done, got this:

Chain 1 Iteration: 798 / 800 [ 99%]  (Sampling) 
Chain 1 Iteration: 799 / 800 [ 99%]  (Sampling) 
Chain 1 Iteration: 800 / 800 [100%]  (Sampling) 
Chain 1 finished in 4752.9 seconds.
 Error in if (any(efbmi_per_chain < threshold)) { : 
missing value where TRUE/FALSE needed

and the value wasn't saved.

To Reproduce

Here's the R simulation code I was using.


library('cmdstanr')
model <- cmdstan_model('kmers.stan')
sample <- function(data) {
    model$sample(data = data, chains=1, iter_warmup=400,
                 init = 0,
                 iter_sampling=400, refresh=1)
}

optimize <- function(data) {
    model$optimize(data = data)
}

library('gtools')
sim <- function(T = 1024, K = 8) {
  M <- 4^K
  x <- matrix(NA, M, T)
  alphaX <- rep(0.1, M)
  for (t in 1:T)
    x[ , t] <- rdirichlet(0.1, alphaX)

  alphaPhi <- rep(0.5, T)
  phi <- rdirichlet(1, alphaPhi)

  y <- rmultinom(1, 10e6, x %*% t(phi))[ , 1]
  list(T = T, K = K, M = M, x = x, y = y, phi = phi[1, ])
}

##### MAIN

cat("Starting simulation.\n")
data <- sim()
cat("    Finished simulation.\n")

cat("Starting optimization.\n")
mle <- optimize(data = data)$mle()
cat("    Finished optimization.\n")

plot(mle, data$phi)

cat("Starting sampling.\n")
fit <- sample(data = data)
cat("    Finished sampling.\n")

And here's the Stan program, which goes in kmers.stan to run the script.

data {
  int<lower=0> T;                  // isoforms
  int<lower=0> K;                  // k-mer size
  int<lower=0> M;                  // distinct k-mers
  matrix[M, T] x;                  // fragment to k-mer comp. (left stoch.)
  array[M] int<lower=0> y;         // observed k-mer counts
}
transformed data {
  vector<lower=0>[T] alpha
      = rep_vector(1, T);
}
parameters {
  simplex[T] phi;                  // fragment composition
}
model {
  phi ~ dirichlet(alpha);          // prior
  y ~ multinomial(x * phi);        // likelihood
}

Expected behavior

Output from sampling. No warning about error from ebfmi in terms not intended for a user.

Operating system

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gtools_3.9.2   cmdstanr_0.5.2

loaded via a namespace (and not attached):
 [1] magrittr_2.0.3       munsell_0.5.0        colorspace_2.0-3     R6_2.5.1             rlang_1.0.2          fansi_1.0.3         
 [7] tools_4.2.0          grid_4.2.0           data.table_1.14.2    checkmate_2.1.0      gtable_0.3.0         utf8_1.2.2          
[13] cli_3.3.0            withr_2.5.0          posterior_1.2.1      matrixStats_0.62.0   ellipsis_0.3.2       remotes_2.4.2       
[19] abind_1.4-5          tibble_3.1.7         lifecycle_1.0.1      crayon_1.5.1         processx_3.5.3       tensorA_0.36.2      
[25] farver_2.1.0         ggplot2_3.3.6        ps_1.7.0             vctrs_0.4.1          glue_1.6.2           compiler_4.2.0      
[31] pillar_1.7.0         generics_0.1.2       scales_1.2.0         backports_1.4.1      distributional_0.3.0 jsonlite_1.8.0      
[37] pkgconfig_2.0.3     
> 

CmdStanR version number

0.5.2

rok-cesnovar commented 2 years ago

Hi Bob,

I was, unfortunately, unable to replicate the issue locally. Until I find a way to replicate it, you should be able to circumvent it by disabling the automatic EBFMI check after sampling, by adding diagnostics = NULL to the $sample() call.

Apologize for the troubles.

bob-carpenter commented 2 years ago

Does that turn off everything but sampling? I don't want my calls to sample to do any posterior analysis.

This is worrisome if anyone is benchmarking Stan because we're doing lots of extra work by default, which in turn comes extra memory.

rok-cesnovar commented 2 years ago

Does that turn off everything but sampling?

Yes.

I don't want my calls to sample to do any posterior analysis.

We don't do any posterior analysis even if you leave it on. What is currently run by default is it diagnoses divergences, ebfmi and max treedepth hits, so it inspects 3 columns from the CSV file and the number of parameters doesn't have an effect on the complexity of the analysis. The package we are using to read CSV files can selectively read individual columns.

I would oppose adding any parameters analysis to the default behaviour - stuff like Rhat can be very slow.

We can revisit even doing this analysis we currently default - I think this was added because rstan does these checks automatically for the user as well. We can disable it before we do the 1.0 release.

This is worrisome if anyone is benchmarking Stan because we're doing lots of extra work by default, which in turn comes extra memory.

The effect of these 3 checks is really not that big. For 4 chains with 4000 samples (which is a bit exaggerated to show that it is not that bad), so 16k samples altogether the effect of running these checks is:

Test I ran:

library(cmdstanr)

file <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan")
mod <- cmdstan_model(file)

# names correspond to the data block in the Stan program
data_list <- list(N = 10, y = c(0,1,0,0,0,0,0,0,0,1))

fit <- mod$sample(
  data = data_list, 
  seed = 123, 
  chains = 4, 
  parallel_chains = 4,
  iter_sampling = 4000,
  diagnostics = NULL
)

print(memuse::Sys.procmem())
start <- Sys.time()
diagnostics <- fit$diagnostic_summary() # this is the same thing as if you would run diagnostics after the run
print(Sys.time() - start)
print(memuse::Sys.procmem())
bob-carpenter commented 2 years ago

Those 16K draws finish in 0.1s on my machine (2 year old iMac Pro) when I fix the command, but I couldn't get the diagnostics thing to work to time it. Here's what I get:

> fit <- mod$sample(
+     data = data_list, 
+     seed = 123, 
+     chains = 4, 
+     parallel_chains = 4,
+     iter_sampling = 4000,
+     diagnostics = NULL
+ )
Error in mod$sample(data = data_list, seed = 123, chains = 4, parallel_chains = 4,  : 
  unused argument (diagnostics = NULL)

and

> diagnostics <- fit$diagnostic_summary() # this is the same thing as if you would run diagnostics after the run
Error: attempt to apply non-function
rok-cesnovar commented 2 years ago

Ah, the argument was added recently with the new diagnostics.

Can you upgrade to the most recent development version by running:

remotes::install_github("stan-dev/cmdstanr")