stan-dev / stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License

stan_csv_reader should skip warmup draws #3301

Closed: mitzimorris closed this issue 1 month ago

mitzimorris commented 4 months ago

Summary:

As currently written, when stan::io::stan_csv_reader encounters a file with saved warmup draws, the parser will return both warmup and sample draws, but no adaptation information.

Instead, the reader should check the parsed metadata to determine whether the file contains saved warmup draws. If it does, these can be discarded, because the consumer of the resulting stan_csv object, stan::mcmc::chains, doesn't need them.
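A minimal sketch of the metadata check, assuming the reader already has the save_warmup, num_warmup and thin settings from the config comment block. The `WarmupMetadata` struct and `num_saved_warmup_rows` function are hypothetical names for illustration, not the existing stan_csv_metadata API. The row count `ceil(num_warmup / thin)` assumes that a draw is written on every iteration whose index is a multiple of thin.

```cpp
#include <cassert>

// Hypothetical stand-in for the metadata fields the reader would consult;
// the real stan_csv_metadata struct may name or type these differently.
struct WarmupMetadata {
  bool save_warmup;
  int num_warmup;
  int thin;
};

// Number of warmup rows expected in the CSV body. Assumes iterations
// 0, thin, 2*thin, ... are saved, which gives ceil(num_warmup / thin) rows.
int num_saved_warmup_rows(const WarmupMetadata& m) {
  assert(m.thin > 0);  // thin must be positive for the division below
  if (!m.save_warmup || m.num_warmup <= 0) return 0;
  return (m.num_warmup + m.thin - 1) / m.thin;  // integer ceiling
}
```

With the default 1000 warmup iterations and thin=1 this yields 1000 rows to discard when save_warmup=true and 0 otherwise, so the same code path handles both files from the reproducible steps.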

Description:

The parser expects to find the following elements in the following order:

The function read_adaptation is called after the header line has been consumed. If the next line does not begin a comment block, it returns, and an error is written to the supplied stream out. The parser then calls read_samples, which reads both the warmup and post-warmup draws.
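One way to reorder this logic, sketched under stated assumptions: the stream is positioned just after the header line, `num_warmup_rows` has already been derived from the parsed metadata, and the adaptation block contains a line of the form `# Step size = <value>`. `ParsedDraws` and `read_after_header` are hypothetical names, not part of stan_csv_reader; the real stan_csv object holds draws in a matrix (`samples`) and also records the metric, both omitted here.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified result type; not the real stan_csv struct.
struct ParsedDraws {
  double step_size = 0;
  std::vector<std::string> draw_rows;  // post-warmup rows only
};

// Saved warmup rows sit between the header and the adaptation comments,
// so they are skipped first; otherwise the adaptation block is never seen.
ParsedDraws read_after_header(std::istream& in, int num_warmup_rows) {
  ParsedDraws result;
  std::string line;
  // 1. Discard saved warmup draws (zero rows when save_warmup=false).
  for (int skipped = 0; skipped < num_warmup_rows; ++skipped) {
    if (!std::getline(in, line)) break;
  }
  // 2. Consume the adaptation comment block, keeping the step size.
  const std::string key = "# Step size = ";
  while (in.peek() == '#' && std::getline(in, line)) {
    if (line.compare(0, key.size(), key) == 0) {
      result.step_size = std::stod(line.substr(key.size()));
    }
  }
  // 3. Remaining data lines are post-warmup draws; comment lines such as
  //    the trailing timing block and blank lines are skipped.
  while (std::getline(in, line)) {
    if (line.empty() || line[0] == '#') continue;
    result.draw_rows.push_back(line);
  }
  return result;
}
```

The function accepts any `std::istream`, so it can be exercised on an in-memory `std::istringstream` as well as on a file stream.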

The returned object contains both warmup and post-warmup draws, but no information about the step size or metric.

Reproducible Steps:

  1. Get the output CSV files from two runs of a Stan model, with and without save_warmup=true.
  2. Parse both with stan_csv_reader and examine the returned stan_csv objects, e.g.:
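```cpp
// bernoulli_default and bernoulli_warmup are stan::io::stan_csv members of
// the McmcChains test fixture, parsed from the two CSV files above.
TEST_F(McmcChains, stan_csv) {
  // Run without saved warmup: reported draws, rows read, adapted step size.
  std::cout << "bernoulli_default " << bernoulli_default.metadata.num_samples;
  std::cout << " sample size " << bernoulli_default.samples.rows();
  std::cout << " step size " << bernoulli_default.adaptation.step_size << std::endl;
  // Run with save_warmup=true: the same values are expected.
  std::cout << " bernoulli_warmup " << bernoulli_warmup.metadata.num_samples;
  std::cout << " sample size " << bernoulli_warmup.samples.rows();
  std::cout << " step size " << bernoulli_warmup.adaptation.step_size << std::endl;
}
```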

Current Output:

bernoulli_default 1000 sample size 1000 step size 0.932037
 bernoulli_warmup 1000 sample size 2000 step size 0

Expected Output:

Both should report a sample size of 1000 and a step size close to 0.93.


Current Version:

v2.35.0

mitzimorris commented 1 month ago

Additional problems found with stan_csv_reader.hpp: it should be extended to properly parse ADVI samples, per CmdStan https://github.com/stan-dev/cmdstan/pull/972