bob-carpenter closed this issue 2 years ago
Hi Bob,
Unfortunately, I was unable to replicate the issue locally. Until I find a way to replicate it, you should be able to work around it by disabling the automatic E-BFMI check after sampling: add diagnostics = NULL to the $sample() call.
Apologies for the trouble.
Does that turn off everything but sampling? I don't want my calls to sample to do any posterior analysis.
This is worrisome if anyone is benchmarking Stan because we're doing lots of extra work by default, which in turn costs extra memory.
> Does that turn off everything but sampling?
Yes.
> I don't want my calls to sample to do any posterior analysis.
We don't do any posterior analysis even if you leave it on. What currently runs by default is a check for divergences, E-BFMI, and max treedepth hits. It inspects only 3 columns from the CSV file, so the number of parameters has no effect on the cost of the analysis. The package we are using to read CSV files can selectively read individual columns.
I would oppose adding any parameter analysis to the default behaviour, since stuff like Rhat can be very slow.
We can revisit whether to run even the analysis we currently do by default. I think it was added because rstan also does these checks automatically for the user. We can disable it before we do the 1.0 release.
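As a rough illustration of why these checks are cheap, here is a minimal sketch of the three summaries computed from the sampler columns alone. This is not cmdstanr's actual implementation: check_sampler_diagnostics() is a hypothetical helper, the toy values are made up, and it assumes the columns follow the CmdStan CSV names divergent__, treedepth__ and energy__ with the default max treedepth of 10.

```r
# Hypothetical helper, not part of cmdstanr: summarise the three sampler
# diagnostics from a data frame holding one chain's sampler columns.
check_sampler_diagnostics <- function(draws, max_treedepth = 10) {
  energy <- draws$energy__
  list(
    # number of post-warmup transitions flagged as divergent
    num_divergent = sum(draws$divergent__ == 1),
    # number of transitions that hit the treedepth limit
    num_max_treedepth = sum(draws$treedepth__ >= max_treedepth),
    # E-BFMI: mean squared change in energy between successive draws,
    # divided by the variance of the energy
    ebfmi = (sum(diff(energy)^2) / length(energy)) / var(energy)
  )
}

# Toy input standing in for the three CSV columns (made-up values)
toy <- data.frame(
  divergent__ = c(0, 0, 1, 0, 0),
  treedepth__ = c(3, 10, 4, 2, 3),
  energy__    = c(5.1, 6.3, 4.8, 5.9, 5.5)
)
str(check_sampler_diagnostics(toy))
```

Each summary is a single pass over one column, so the work grows with the number of draws and not with the number of model parameters.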
> This is worrisome if anyone is benchmarking Stan because we're doing lots of extra work by default, which in turn costs extra memory.
The effect of these 3 checks is really not that big. For 4 chains with 4000 samples each (a bit exaggerated, to show that it is not that bad), so 16k samples altogether, the effect of running these checks is:
Test I ran:
library(cmdstanr)  # memuse must also be installed for the memory readings
file <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan")
mod <- cmdstan_model(file)
# names correspond to the data block in the Stan program
data_list <- list(N = 10, y = c(0,1,0,0,0,0,0,0,0,1))
fit <- mod$sample(
  data = data_list,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  iter_sampling = 4000,
  diagnostics = NULL  # skip the automatic checks so they can be timed separately
)
print(memuse::Sys.procmem())  # memory use before the checks
start <- Sys.time()
diagnostics <- fit$diagnostic_summary() # this is the same thing as if you would run diagnostics after the run
print(Sys.time() - start)  # time taken by the checks
print(memuse::Sys.procmem())  # memory use after the checks
Those 16K draws finish in 0.1s on my machine (2 year old iMac Pro) when I fix the command, but I couldn't get the diagnostics thing to work to time it. Here's what I get:
> fit <- mod$sample(
+   data = data_list,
+   seed = 123,
+   chains = 4,
+   parallel_chains = 4,
+   iter_sampling = 4000,
+   diagnostics = NULL
+ )
Error in mod$sample(data = data_list, seed = 123, chains = 4, parallel_chains = 4, :
unused argument (diagnostics = NULL)
and
> diagnostics <- fit$diagnostic_summary() # this is the same thing as if you would run diagnostics after the run
Error: attempt to apply non-function
Ah, the argument was added recently with the new diagnostics.
Can you upgrade to the most recent development version by running:
remotes::install_github("stan-dev/cmdstanr")
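After upgrading, one quick way to confirm that the installed version accepts the new argument is to inspect the function's formal arguments. The sketch below uses a hypothetical helper, has_argument(), and a toy stand-in function so that it runs without a compiled model. The commented lines show how the same check might be applied to a real model object, assuming mod is the compiled model from the test script above.

```r
# Hypothetical helper: TRUE if function `f` has a formal argument named `arg`.
has_argument <- function(f, arg) {
  arg %in% names(formals(f))
}

# Toy stand-in for a sampling function that accepts `diagnostics`
toy_sample <- function(data, chains = 4, diagnostics = NULL) NULL
has_argument(toy_sample, "diagnostics")

# With cmdstanr loaded and a compiled model `mod`, the same check would be:
# has_argument(mod$sample, "diagnostics")
# packageVersion("cmdstanr")
```

If the check returns FALSE for mod$sample, the installed version predates the argument, which would explain the "unused argument" error.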
Describe the bug
Ran sampling and then when it was done, got this:
and the value wasn't saved.
To Reproduce
Here's the R simulation code I was using.
And here's the Stan program, which goes in kmers.stan to run the script.
Expected behavior
Output from sampling. No warning about error from ebfmi in terms not intended for a user.
Operating system
CmdStanR version number
0.5.2