stan-dev / pystan

PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io
ISC License
342 stars 59 forks source link

Error in formula for `num_samples_save` in `fit.py` #366

Closed ahartikainen closed 2 years ago

ahartikainen commented 2 years ago

Describe the bug

Currently defined formula is right only on even number of samples.

num_samples_saved = (self.num_samples + self.num_warmup * self.save_warmup) // self.num_thin

In stan C++ the samples are created with a for -loop in stan/services/util/generate_transitions.hpp for both num_warmup and num_samples as num_iterations

  for (int m = 0; m < num_iterations; ++m) {
    ...

    if (save && ((m % num_thin) == 0)) {
      ...
    }
  }

the correct formula is

from math import ceil
...
num_samples_saved = ceil(self.num_samples / self.num_thin) + ceil((self.num_warmup * self.save_warmup) / self.num_thin)

Steps/Code to Reproduce

import stan

schools_code = """
data {
  int<lower=0> J;         // number of schools
  real y[J];              // estimated treatment effects
  real<lower=0> sigma[J]; // standard error of effect estimates
}
parameters {
  real mu;                // population treatment effect
  real<lower=0> tau;      // standard deviation in treatment effects
  vector[J] eta;          // unscaled deviation from mu by school
}
transformed parameters {
  vector[J] theta = mu + tau * eta;        // school treatment effects
}
model {
  target += normal_lpdf(eta | 0, 1);       // prior log-density
  target += normal_lpdf(y | theta, sigma); // log-likelihood
}
"""

schools_data = {"J": 8,
                "y": [28,  8, -3,  7, -1,  1, 18, 12],
                "sigma": [15, 10, 16, 11,  9, 11, 10, 18]}

posterior = stan.build(schools_code, data=schools_data)
fit = posterior.sample(num_chains=1, num_samples=100, num_warmup=100, num_thin=3, save_warmup=True)
eta = fit["eta"]  # array with shape (8, 68) <- original formula gives (8, 66)  and raises IndexError