Closed mitzimorris closed 1 month ago
Name | Old Result | New Result | Ratio | Performance change( 1 - new / old ) |
---|---|---|---|---|
arma/arma.stan | 0.35 | 0.34 | 1.01 | 0.77% faster |
low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 1.07 | 6.59% faster |
gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.01 | 0.99% faster |
gp_regr/gp_regr.stan | 0.09 | 0.09 | 1.0 | -0.24% slower |
sir/sir.stan | 70.35 | 71.01 | 0.99 | -0.94% slower |
irt_2pl/irt_2pl.stan | 4.12 | 4.33 | 0.95 | -5.01% slower |
eight_schools/eight_schools.stan | 0.06 | 0.05 | 1.04 | 4.12% faster |
pkpd/sim_one_comp_mm_elim_abs.stan | 0.25 | 0.24 | 1.04 | 4.17% faster |
pkpd/one_comp_mm_elim_abs.stan | 19.49 | 18.79 | 1.04 | 3.59% faster |
garch/garch.stan | 0.44 | 0.4 | 1.09 | 7.89% faster |
low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.7 | 2.58 | 1.05 | 4.51% faster |
arK/arK.stan | 1.78 | 1.71 | 1.05 | 4.36% faster |
gp_pois_regr/gp_pois_regr.stan | 2.83 | 2.69 | 1.05 | 4.99% faster |
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.9 | 8.35 | 1.07 | 6.18% faster |
performance.compilation | 183.57 | 180.29 | 1.02 | 1.79% faster |
Mean result: 1.0311739250234235
Jenkins Console Log Blue Ocean Commit hash: eb2b7af7527e4daaa5788c458dff4c118a19ee55
made test more strict, but differences between R and C++ are to be expected.
I am not convinced that they are expected (I recently wrote a JavaScript version of ESS and RHat that matches Stan's current C++ out to ~10 digits). I would want to see some explanation of exactly where and why they differ to this degree, if we're claiming to implement the same things.
But, if these differences can be justified, then we shouldn't be testing against those values at all. The presence of those values in the tests visually implies we do match the R values (when we don't), and the resulting tests have very little power to catch issues. For example:
I pick this example not just to be a pain, but because replacing the in-line standard deviation calculation in the `mcse_mean` function is something a future editor of the code is likely to do (if we don't already change it during review for this!). Accidentally picking the wrong library function is not hard to imagine, so failing the tests when that mistake gets made would be very helpful. Similar arguments can be made for giving basically all the other tests tighter tolerances.
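As a hedged illustration of this argument (a standalone sketch, not the code under review): the helpers `sd`, `mcse_mean`, and `near` below are hypothetical, and the effective sample size is passed in as an assumed value rather than computed. Swapping the `n - 1` denominator for `n` changes the result by a relative factor of roughly 1/(2n), which a loose tolerance will not notice.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helpers for illustration only; these are not Stan's API.

// Standard deviation with a selectable denominator:
// ddof = 1 gives the sample SD (n - 1), ddof = 0 the population SD (n).
double sd(const std::vector<double>& x, int ddof) {
  assert(x.size() > 1);
  const double n = static_cast<double>(x.size());
  double sum = 0.0;
  for (double v : x) sum += v;
  const double mean = sum / n;
  double ss = 0.0;
  for (double v : x) ss += (v - mean) * (v - mean);
  return std::sqrt(ss / (n - ddof));
}

// Monte Carlo standard error of the mean: sd / sqrt(ESS).
// `ess` is an assumed input here, not computed from the draws.
double mcse_mean(const std::vector<double>& x, double ess, int ddof) {
  return sd(x, ddof) / std::sqrt(ess);
}

// Absolute-tolerance comparison, in the spirit of a unit-test "near" check.
bool near(double a, double b, double tol) { return std::fabs(a - b) <= tol; }
```

With 1000 draws the two denominators differ by about 0.05% in relative terms, so for a quantity of order 1 or smaller a check such as `near(expected, actual, 1e-2)` passes with either denominator, while `near(expected, actual, 1e-8)` fails as soon as the wrong one is used.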
> But, if these differences can be justified, then we shouldn't be testing against those values at all.

I would be happy to find another way to test these. What would you suggest?
What previous implementations of diagnostics seem to do is just run the code once to get values and then "freeze" those in the test. I think #3312 deleted some tests in that style.
Doing this requires us to be pretty certain the implementation is correct, since we're making it the gold standard for all future versions of the code, which is why it would be important to understand what parts of the calculation differ from R.
> which is why it would be important to understand what parts of the calculation are different than R

Perhaps we should create an artificial dataset to convince ourselves of the goodness of ESS and Rhat. What did you use for JavaScript?
For testing I ended up using the same blocker1.csv and blocker2.csv as the Stan tests, but for initially getting them to agree I was just using an arbitrary CSV file, I think from the Bernoulli model.
Name | Old Result | New Result | Ratio | Performance change( 1 - new / old ) |
---|---|---|---|---|
arma/arma.stan | 0.35 | 0.34 | 1.04 | 3.66% faster |
low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 1.06 | 5.68% faster |
gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.1 | 9.3% faster |
gp_regr/gp_regr.stan | 0.09 | 0.09 | 1.06 | 5.98% faster |
sir/sir.stan | 70.01 | 71.77 | 0.98 | -2.52% slower |
irt_2pl/irt_2pl.stan | 4.15 | 4.49 | 0.93 | -8.1% slower |
eight_schools/eight_schools.stan | 0.06 | 0.06 | 1.0 | -0.25% slower |
pkpd/sim_one_comp_mm_elim_abs.stan | 0.25 | 0.26 | 0.95 | -5.63% slower |
pkpd/one_comp_mm_elim_abs.stan | 19.4 | 19.46 | 1.0 | -0.28% slower |
garch/garch.stan | 0.44 | 0.44 | 1.0 | -0.08% slower |
low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.72 | 2.71 | 1.01 | 0.51% faster |
arK/arK.stan | 1.8 | 1.8 | 1.0 | 0.13% faster |
gp_pois_regr/gp_pois_regr.stan | 2.88 | 3.07 | 0.94 | -6.68% slower |
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.83 | 8.62 | 1.02 | 2.31% faster |
performance.compilation | 182.25 | 178.41 | 1.02 | 2.11% faster |
Mean result: 1.0063032409056276
Jenkins Console Log Blue Ocean Commit hash: eb2b7af7527e4daaa5788c458dff4c118a19ee55
Name | Old Result | New Result | Ratio | Performance change( 1 - new / old ) |
---|---|---|---|---|
arma/arma.stan | 0.37 | 0.34 | 1.09 | 7.87% faster |
low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 1.2 | 16.62% faster |
gp_regr/gen_gp_data.stan | 0.03 | 0.02 | 1.14 | 12.06% faster |
gp_regr/gp_regr.stan | 0.1 | 0.09 | 1.07 | 6.87% faster |
sir/sir.stan | 70.05 | 72.58 | 0.97 | -3.62% slower |
irt_2pl/irt_2pl.stan | 4.15 | 4.02 | 1.03 | 3.22% faster |
eight_schools/eight_schools.stan | 0.06 | 0.06 | 1.02 | 2.07% faster |
pkpd/sim_one_comp_mm_elim_abs.stan | 0.25 | 0.25 | 1.02 | 1.55% faster |
pkpd/one_comp_mm_elim_abs.stan | 19.49 | 19.54 | 1.0 | -0.29% slower |
garch/garch.stan | 0.44 | 0.41 | 1.07 | 6.27% faster |
low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.7 | 2.72 | 0.99 | -0.81% slower |
arK/arK.stan | 1.8 | 1.71 | 1.05 | 4.91% faster |
gp_pois_regr/gp_pois_regr.stan | 2.85 | 2.68 | 1.06 | 5.97% faster |
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.78 | 8.37 | 1.05 | 4.72% faster |
performance.compilation | 178.22 | 179.53 | 0.99 | -0.73% slower |
Mean result: 1.0496172795519458
Jenkins Console Log Blue Ocean Commit hash: eb2b7af7527e4daaa5788c458dff4c118a19ee55
Name | Old Result | New Result | Ratio | Performance change( 1 - new / old ) |
---|---|---|---|---|
arma/arma.stan | 0.37 | 0.35 | 1.06 | 5.3% faster |
low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 0.96 | -4.22% slower |
gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.0 | -0.46% slower |
gp_regr/gp_regr.stan | 0.09 | 0.1 | 0.99 | -0.88% slower |
sir/sir.stan | 70.45 | 67.98 | 1.04 | 3.5% faster |
irt_2pl/irt_2pl.stan | 4.13 | 3.96 | 1.04 | 4.06% faster |
eight_schools/eight_schools.stan | 0.06 | 0.05 | 1.04 | 3.44% faster |
pkpd/sim_one_comp_mm_elim_abs.stan | 0.25 | 0.24 | 1.04 | 4.27% faster |
pkpd/one_comp_mm_elim_abs.stan | 19.38 | 18.6 | 1.04 | 4.02% faster |
garch/garch.stan | 0.44 | 0.41 | 1.07 | 6.77% faster |
low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.73 | 2.56 | 1.07 | 6.2% faster |
arK/arK.stan | 1.77 | 1.7 | 1.04 | 4.02% faster |
gp_pois_regr/gp_pois_regr.stan | 2.77 | 2.66 | 1.04 | 3.71% faster |
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.69 | 8.35 | 1.04 | 3.92% faster |
performance.compilation | 180.6 | 184.2 | 0.98 | -1.99% slower |
Mean result: 1.0295481519172558
Jenkins Console Log Blue Ocean Commit hash: eb2b7af7527e4daaa5788c458dff4c118a19ee55
Name | Old Result | New Result | Ratio | Performance change( 1 - new / old ) |
---|---|---|---|---|
arma/arma.stan | 0.35 | 0.36 | 0.99 | -1.51% slower |
low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 0.94 | -6.11% slower |
gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.02 | 2.22% faster |
gp_regr/gp_regr.stan | 0.1 | 0.09 | 1.04 | 4.03% faster |
sir/sir.stan | 70.72 | 68.48 | 1.03 | 3.16% faster |
irt_2pl/irt_2pl.stan | 4.19 | 3.97 | 1.05 | 5.19% faster |
eight_schools/eight_schools.stan | 0.06 | 0.06 | 1.05 | 4.53% faster |
pkpd/sim_one_comp_mm_elim_abs.stan | 0.25 | 0.24 | 1.04 | 3.56% faster |
pkpd/one_comp_mm_elim_abs.stan | 19.43 | 18.85 | 1.03 | 2.99% faster |
garch/garch.stan | 0.45 | 0.41 | 1.12 | 10.49% faster |
low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.71 | 2.65 | 1.02 | 2.26% faster |
arK/arK.stan | 1.79 | 1.7 | 1.05 | 5.16% faster |
gp_pois_regr/gp_pois_regr.stan | 2.8 | 2.69 | 1.04 | 3.75% faster |
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.83 | 8.37 | 1.06 | 5.29% faster |
performance.compilation | 180.41 | 179.96 | 1.0 | 0.25% faster |
Mean result: 1.0324525440756125
Jenkins Console Log Blue Ocean Commit hash: eb2b7af7527e4daaa5788c458dff4c118a19ee55
Name | Old Result | New Result | Ratio | Performance change( 1 - new / old ) |
---|---|---|---|---|
arma/arma.stan | 0.33 | 0.38 | 0.87 | -14.98% slower |
low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 0.98 | -1.95% slower |
gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 0.99 | -0.64% slower |
gp_regr/gp_regr.stan | 0.09 | 0.09 | 1.02 | 2.21% faster |
sir/sir.stan | 69.87 | 68.0 | 1.03 | 2.67% faster |
irt_2pl/irt_2pl.stan | 4.23 | 3.95 | 1.07 | 6.67% faster |
eight_schools/eight_schools.stan | 0.06 | 0.05 | 1.05 | 4.44% faster |
pkpd/sim_one_comp_mm_elim_abs.stan | 0.25 | 0.24 | 1.06 | 5.22% faster |
pkpd/one_comp_mm_elim_abs.stan | 19.37 | 19.33 | 1.0 | 0.2% faster |
garch/garch.stan | 0.42 | 0.41 | 1.03 | 3.2% faster |
low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.67 | 2.57 | 1.04 | 4.05% faster |
arK/arK.stan | 1.77 | 1.71 | 1.04 | 3.71% faster |
gp_pois_regr/gp_pois_regr.stan | 2.79 | 2.67 | 1.04 | 4.1% faster |
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.68 | 8.32 | 1.04 | 4.16% faster |
performance.compilation | 180.54 | 179.95 | 1.0 | 0.33% faster |
Mean result: 1.018155431705192
Jenkins Console Log Blue Ocean Commit hash: eb2b7af7527e4daaa5788c458dff4c118a19ee55
@WardBrian this is ready for re-review.
The discrepancies between this code and the implementations in the posterior package were due to:
- `stan::math::covariance` and `stan::analyze::covariance`. Now using the latter.
- `bernoulli.stan` model everywhere.

Changes per discussion here and in https://github.com/stan-dev/stan/pull/3312#issuecomment-2420180157:
- `src/analyze/mcmc` into file `ess_basic` and `rhat_basic`, with corresponding unit tests
- `ess_basic` and `rhat_basic` tests against both posterior pkg and old implementations in `src/analyze/mcmc/compute*.hpp`
- `chainset_test.cpp` to unit tests for `src/analyze/mcmc`.

At some point we should deprecate `chains.hpp` and the corresponding `src/analyze/mcmc/compute*.hpp`.
Name | Old Result | New Result | Ratio | Performance change( 1 - new / old ) |
---|---|---|---|---|
arma/arma.stan | 0.35 | 0.34 | 1.02 | 1.75% faster |
low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 0.96 | -4.0% slower |
gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.0 | -0.43% slower |
gp_regr/gp_regr.stan | 0.09 | 0.09 | 1.0 | 0.24% faster |
sir/sir.stan | 69.97 | 68.96 | 1.01 | 1.45% faster |
irt_2pl/irt_2pl.stan | 4.14 | 4.21 | 0.98 | -1.68% slower |
eight_schools/eight_schools.stan | 0.06 | 0.06 | 1.01 | 1.14% faster |
pkpd/sim_one_comp_mm_elim_abs.stan | 0.25 | 0.25 | 1.01 | 1.35% faster |
pkpd/one_comp_mm_elim_abs.stan | 19.47 | 19.22 | 1.01 | 1.31% faster |
garch/garch.stan | 0.43 | 0.41 | 1.06 | 6.0% faster |
low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.7 | 2.59 | 1.05 | 4.33% faster |
arK/arK.stan | 1.83 | 1.7 | 1.07 | 6.74% faster |
gp_pois_regr/gp_pois_regr.stan | 2.9 | 2.68 | 1.08 | 7.63% faster |
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.79 | 8.37 | 1.05 | 4.77% faster |
performance.compilation | 182.31 | 177.81 | 1.03 | 2.47% faster |
Mean result: 1.023584481152946
Jenkins Console Log Blue Ocean Commit hash: eb2b7af7527e4daaa5788c458dff4c118a19ee55
Name | Old Result | New Result | Ratio | Performance change( 1 - new / old ) |
---|---|---|---|---|
arma/arma.stan | 0.37 | 0.35 | 1.04 | 3.89% faster |
low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 1.04 | 3.64% faster |
gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.19 | 15.76% faster |
gp_regr/gp_regr.stan | 0.1 | 0.09 | 1.03 | 3.31% faster |
sir/sir.stan | 69.95 | 69.81 | 1.0 | 0.2% faster |
irt_2pl/irt_2pl.stan | 4.07 | 4.02 | 1.01 | 1.19% faster |
eight_schools/eight_schools.stan | 0.06 | 0.06 | 1.02 | 2.21% faster |
pkpd/sim_one_comp_mm_elim_abs.stan | 0.25 | 0.25 | 0.99 | -1.01% slower |
pkpd/one_comp_mm_elim_abs.stan | 19.15 | 18.84 | 1.02 | 1.63% faster |
garch/garch.stan | 0.45 | 0.41 | 1.1 | 8.81% faster |
low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.78 | 2.6 | 1.07 | 6.59% faster |
arK/arK.stan | 1.81 | 2.58 | 0.7 | -42.36% slower |
gp_pois_regr/gp_pois_regr.stan | 2.83 | 2.71 | 1.04 | 4.07% faster |
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.85 | 8.43 | 1.05 | 4.73% faster |
performance.compilation | 178.71 | 183.63 | 0.97 | -2.75% slower |
Mean result: 1.0185136446700325
Jenkins Console Log Blue Ocean Commit hash: eb2b7af7527e4daaa5788c458dff4c118a19ee55
Name | Old Result | New Result | Ratio | Performance change( 1 - new / old ) |
---|---|---|---|---|
arma/arma.stan | 0.36 | 0.36 | 0.99 | -0.67% slower |
low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 0.94 | -6.54% slower |
gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.0 | 0.42% faster |
gp_regr/gp_regr.stan | 0.1 | 0.09 | 1.03 | 3.21% faster |
sir/sir.stan | 70.33 | 72.61 | 0.97 | -3.24% slower |
irt_2pl/irt_2pl.stan | 4.17 | 4.28 | 0.97 | -2.74% slower |
eight_schools/eight_schools.stan | 0.06 | 0.06 | 0.98 | -2.4% slower |
pkpd/sim_one_comp_mm_elim_abs.stan | 0.26 | 0.25 | 1.06 | 5.97% faster |
pkpd/one_comp_mm_elim_abs.stan | 19.49 | 19.02 | 1.02 | 2.42% faster |
garch/garch.stan | 0.44 | 0.41 | 1.07 | 6.21% faster |
low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.7 | 2.57 | 1.05 | 4.66% faster |
arK/arK.stan | 1.79 | 1.7 | 1.05 | 4.93% faster |
gp_pois_regr/gp_pois_regr.stan | 2.82 | 2.67 | 1.06 | 5.33% faster |
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.78 | 8.44 | 1.04 | 3.83% faster |
performance.compilation | 178.74 | 177.93 | 1.0 | 0.45% faster |
Mean result: 1.0162582955033252
Jenkins Console Log Blue Ocean Commit hash: eb2b7af7527e4daaa5788c458dff4c118a19ee55
Name | Old Result | New Result | Ratio | Performance change( 1 - new / old ) |
---|---|---|---|---|
arma/arma.stan | 0.43 | 0.38 | 1.13 | 11.79% faster |
low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 1.11 | 9.84% faster |
gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.06 | 5.72% faster |
gp_regr/gp_regr.stan | 0.11 | 0.1 | 1.15 | 12.69% faster |
sir/sir.stan | 80.7 | 74.12 | 1.09 | 8.16% faster |
irt_2pl/irt_2pl.stan | 5.11 | 4.5 | 1.13 | 11.89% faster |
eight_schools/eight_schools.stan | 0.06 | 0.06 | 1.07 | 6.12% faster |
pkpd/sim_one_comp_mm_elim_abs.stan | 0.27 | 0.26 | 1.03 | 2.9% faster |
pkpd/one_comp_mm_elim_abs.stan | 22.08 | 20.37 | 1.08 | 7.74% faster |
garch/garch.stan | 0.65 | 0.46 | 1.42 | 29.71% faster |
low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.99 | 2.81 | 1.06 | 6.04% faster |
arK/arK.stan | 2.0 | 1.85 | 1.08 | 7.82% faster |
gp_pois_regr/gp_pois_regr.stan | 3.44 | 2.94 | 1.17 | 14.51% faster |
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 9.72 | 9.39 | 1.04 | 3.45% faster |
performance.compilation | 199.44 | 204.54 | 0.98 | -2.55% slower |
Mean result: 1.1069147395963945
Jenkins Console Log Blue Ocean Commit hash: eb2b7af7527e4daaa5788c458dff4c118a19ee55
The unit tests now test basic ESS and Rhat against the values computed by CmdStan 2.35.0's bin/stansummary and the saved summary CSV file has been checked into the folder src/test/unit/analyze/mcmc/test_csv_files.
I have added unit tests for rank-normalization and for splitting the chains. The latter uncovered a bug: splitting a chain with an odd number of draws omitted the last draw instead of the middle draw, i.e. draw (N+1)/2. This is now fixed. None of the existing tests would have triggered this bug, since all CSV test sets had an even number of draws.
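The intended splitting behavior can be sketched as follows. This is an illustrative, hypothetical `split_chain` helper over a plain `std::vector<double>`, not the `chainset` code: each half receives floor(N/2) draws, so for odd N the middle draw is the one left out.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper for illustration; not the chainset implementation.
// Splits one chain into two halves of equal length. When the number of
// draws N is odd, the middle draw (1-based position (N + 1) / 2) is dropped.
std::pair<std::vector<double>, std::vector<double>> split_chain(
    const std::vector<double>& draws) {
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(draws.size());
  const std::ptrdiff_t half = n / 2;          // floor(N / 2) draws per half
  const std::ptrdiff_t tail_start = n - half; // skips index `half` when N is odd

  std::vector<double> first(draws.begin(), draws.begin() + half);
  std::vector<double> second(draws.begin() + tail_start, draws.end());

  // Both halves must have the same length for split-Rhat and split-ESS.
  assert(first.size() == second.size());
  return {first, second};
}
```

A test set with an odd number of draws, for example five draws split into the first two and the last two, exercises the branch that the even-length CSV files never reach.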
Name | Old Result | New Result | Ratio | Performance change( 1 - new / old ) |
---|---|---|---|---|
arma/arma.stan | 0.37 | 0.38 | 0.98 | -1.81% slower |
low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.01 | 0.01 | 0.71 | -41.17% slower |
gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.1 | 8.85% faster |
gp_regr/gp_regr.stan | 0.1 | 0.1 | 0.94 | -5.86% slower |
sir/sir.stan | 76.62 | 74.8 | 1.02 | 2.37% faster |
irt_2pl/irt_2pl.stan | 4.31 | 4.17 | 1.03 | 3.27% faster |
eight_schools/eight_schools.stan | 0.06 | 0.06 | 0.97 | -2.85% slower |
pkpd/sim_one_comp_mm_elim_abs.stan | 0.26 | 0.26 | 0.98 | -1.88% slower |
pkpd/one_comp_mm_elim_abs.stan | 19.74 | 20.48 | 0.96 | -3.78% slower |
garch/garch.stan | 0.46 | 0.44 | 1.05 | 5.17% faster |
low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.82 | 2.76 | 1.02 | 1.96% faster |
arK/arK.stan | 1.8 | 1.84 | 0.98 | -1.86% slower |
gp_pois_regr/gp_pois_regr.stan | 2.84 | 2.89 | 0.98 | -1.7% slower |
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 9.06 | 9.04 | 1.0 | 0.13% faster |
performance.compilation | 187.23 | 194.85 | 0.96 | -4.07% slower |
Mean result: 0.9806370865532955
Jenkins Console Log Blue Ocean Commit hash: eb2b7af7527e4daaa5788c458dff4c118a19ee55
Submission Checklist
./runTests.py src/test/unit
make cpplint
Summary
This is a do-over of PR https://github.com/stan-dev/stan/pull/3310. The overall goal is to make the improved R-hat diagnostics (https://github.com/stan-dev/stan/pull/3266) and the rank-normalized bulk and tail ESS (https://github.com/stan-dev/stan/pull/3312) available to CmdStan, following the definitions in https://arxiv.org/abs/1903.08008.
This PR adds a new class `stan::mcmc::chainset`. A `chainset` object requires that all chains are the same size on construction, unlike the `chains<>` object. This simplifies computation of split-Rhat and split-ESS diagnostics. The `chainset` object provides rank-normalized split Rhat, rank-normalized split ESS which returns bulk and tail ESS, and functions `mcse` and `mcse_sd` which calculate the mean Monte Carlo standard error and its variance.
How to Verify
Unit tests for the chainset test performance against Stan CSV outputs. The CmdStan branch https://github.com/stan-dev/cmdstan/tree/feature/1263-new-rhat-summary further checks that `chainset` provides the necessary functions to output the same set of summary statistics as the R interfaces.
Side Effects
N/A
Documentation
Documentation will be added to the Stan docset in a separate PR.
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Columbia University
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: