bob-carpenter closed 1 month ago
Investigating now. I suspect that neither is actually "incorrect" and that the difference lies in how ties are broken. The doc for the method argument of np.quantile is pretty dense.
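To make the effect of the method argument concrete, here is a small sketch on a toy array (the array and the two probabilities are illustrative choices, not taken from the issue). It assumes NumPy 1.22 or later, where the keyword is called method. Whenever the requested quantile falls between two sorted values, each method picks or interpolates differently:

```python
import numpy as np

# Toy data (an assumption for illustration): the sorted values 1.0 .. 10.0
a = np.arange(1.0, 11.0)

methods = ("linear", "lower", "higher", "nearest", "midpoint")
results = {}
for p in (0.25, 0.75):
    for method in methods:
        results[(method, p)] = float(np.quantile(a, p, method=method))
        print(f"p={p:4.2f}  {method:8s} {results[(method, p)]:5.2f}")

# With 10 points the fractional position is (10 - 1) * p, i.e. 2.25 and 6.75:
#   p=0.25: linear 3.25, lower 3.00, higher 4.00, nearest 3.00, midpoint 3.50
#   p=0.75: linear 7.75, lower 7.00, higher 8.00, nearest 8.00, midpoint 7.50
```

Note that 'lower' and 'nearest' agree at p=0.25 but differ at p=0.75, so the two can only be told apart when the fractional position is closer to the upper neighbor.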
My instinct was right, but it is still a bit odd.
The Stan behavior changes at p=0.5, so no single argument to method will work, but this does:
import cmdstanpy as csp
import numpy as np
import logging
csp.utils.get_logger().setLevel(logging.ERROR)
def stan_like_quantiles(a, q, axis=None):
    out = []
    for p in q:
        if p < 0.5:
            out.append(np.quantile(a, p, axis=axis, method='lower'))
        else:
            out.append(np.quantile(a, p, axis=axis, method='nearest'))
    return np.array(out)
model = csp.CmdStanModel(stan_file='funnel.stan')
init = {'double_log_scale': 0, 'alpha': np.zeros(9)}
mass_matrix = {'inv_metric': np.ones(10)}
epsilon = 0.0025
print(f"\n\n epsilon={epsilon:6.4f}")
fit = model.sample(inits=init, chains=1, step_size=epsilon, iter_warmup=0, adapt_engaged=False, iter_sampling=2_000, metric=mass_matrix, show_progress=False)
print(fit.summary(percentiles=(2.5, 50, 97.5)))
metadata_param_draws = fit.draws(concat_chains=True)
print(f"{metadata_param_draws.shape=}")
draws = metadata_param_draws[:, 7:]
print(f"{draws.shape=}")
quantiles = stan_like_quantiles(draws, [0.025,0.5,0.975], axis=0)
for n in range(10):
    print(f" theta[{n}] : ({quantiles[0, n]:7.5f}, {quantiles[1, n]:7.5f} {quantiles[2, n]:7.5f})")
prints
                       Mean      MCSE    StdDev       2.5%        50%     97.5%     N_Eff   N_Eff/s     R_hat
lp__               1.845180  2.485300  10.53630  -17.95740   2.236310  21.90610   17.9728   10.3949  1.002140
double_log_scale  -1.488500  0.567439   2.41004   -6.39507  -1.487180   3.04691   18.0390   10.4332  1.002080
alpha[1]          -0.122363  0.063339   1.10582   -3.16798  -0.016352   2.04890  304.8030  176.2890  1.004150
alpha[2]           0.076506  0.178563   1.63905   -3.63104   0.005829   4.66082   84.2563   48.7312  1.005280
alpha[3]           0.039439  0.115609   1.37654   -2.92992   0.006639   3.48889  141.7750   81.9981  1.006530
alpha[4]           0.161731  0.145244   1.50736   -2.23265   0.007861   4.41500  107.7050   62.2930  1.013910
alpha[5]          -0.017582  0.079393   1.21976   -2.73845   0.007058   2.45055  236.0390  136.5180  1.018140
alpha[6]          -0.090614  0.126625   1.44340   -4.65026  -0.001944   2.47281  129.9370   75.1514  0.999589
alpha[7]          -0.187405  0.121376   1.42942   -3.91546  -0.010271   2.35533  138.6920   80.2152  1.018600
alpha[8]           0.167995  0.132302   1.47003   -2.54510   0.009134   4.60312  123.4580   71.4042  1.006310
alpha[9]          -0.073094  0.069720   1.12135   -2.79059  -0.006392   2.29814  258.6800  149.6130  1.003620
metadata_param_draws.shape=(2000, 17)
draws.shape=(2000, 10)
theta[0] : (-6.39507, -1.48718 3.04691)
theta[1] : (-3.16798, -0.01635 2.04890)
theta[2] : (-3.63104, 0.00583 4.66082)
theta[3] : (-2.92992, 0.00664 3.48889)
theta[4] : (-2.23265, 0.00786 4.41500)
theta[5] : (-2.73845, 0.00706 2.45055)
theta[6] : (-4.65026, -0.00194 2.47281)
theta[7] : (-3.91546, -0.01027 2.35533)
theta[8] : (-2.54510, 0.00913 4.60312)
theta[9] : (-2.79059, -0.00639 2.29814)
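The piecewise rule can also be tried without a Stan model. The sketch below applies the same 'lower' below 0.5, 'nearest' otherwise rule to synthetic normal draws (a stand-in chosen here for illustration, not output from funnel.stan) and compares it with NumPy's default linear interpolation. It only illustrates why the two sets of numbers are close but not identical; it is not a claim about how Stan computes quantiles internally.

```python
import numpy as np

def stan_like_quantiles(a, q, axis=None):
    # Rule from the script above: 'lower' for p < 0.5, 'nearest' otherwise
    return np.array([
        np.quantile(a, p, axis=axis, method="lower" if p < 0.5 else "nearest")
        for p in q
    ])

# Synthetic stand-in for posterior draws (assumption): 2000 draws, 3 parameters
rng = np.random.default_rng(0)
draws = rng.normal(size=(2000, 3))
probs = [0.025, 0.5, 0.975]

piecewise = stan_like_quantiles(draws, probs, axis=0)
default = np.quantile(draws, probs, axis=0)  # NumPy default: method='linear'

# 'lower' and 'nearest' return actual draws; 'linear' interpolates between
# two neighboring sorted draws, so the gap is at most one spacing.
for j in range(draws.shape[1]):
    print(f"param {j}: piecewise={piecewise[:, j]}  default={default[:, j]}")
print("max abs difference:", np.abs(piecewise - default).max())
```

With 2,000 draws the two results differ by at most the spacing between adjacent sorted draws, which is small near the median and larger in the tails, where draws are sparse.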
I think that if the above behavior is something we should change, the right place to open an issue is Stan. It appears that this code has been essentially unchanged since 2013.
Summary:
I fit a model and calculated quantiles using CmdStan (through cmdstanpy) and using Python. They give close but different answers. I suspect the quantiles are broken in CmdStan, but I thought I'd file here first and then the issue can be moved to CmdStan if necessary.
Description:
See above. Here's a minimal-ish working example, which I put in sim.py:
Here's the Stan program in funnel.stan:
And here's what I get:
You can see that the quantiles are close, but not quite spot on. At first I thought this might be rounding of the 2.5 and 97.5, but specifying 2.5 also doesn't match the 0.02 or 0.03 quantiles.
Current Version: