stan-dev / stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License
2.55k stars 365 forks source link

Too loose ASSERT_NEAR and wrong test values in compute_potential_scale_reduction_test.cpp #3273

Open aleksgorica opened 4 months ago

aleksgorica commented 4 months ago

Description:

I have tried reducing the accepted error in ASSERT_NEAR from 1 to 1e-4, but then the tests failed for ComputeRhat.compute_potential_scale_reduction and ComputeRhat.compute_potential_scale_reduction_convenience. Upon inspection, the test values are the same as for compute_split_potential_scale_reduction and compute_split_potential_scale_reduction_convenience, for which the corresponding functions output different values.

Current Version:

v2.34.1

bob-carpenter commented 4 months ago

Any suggestion on how we get values with which to test? Are there tests in the R or Python packages that compute these quantities?

aleksgorica commented 4 months ago

I used Arviz to generate test values for split-rank, no-split-rank, split-no-rank, no-split-no-rank. Here is the code I used:


iimport arviz
import numpy as np
from arviz.stats.diagnostics import _rhat, _backtransform_ranks, _z_scale, _split_chains, _rhat_rank
from arviz.stats.stats_utils import wrap_xarray_ufunc

posterior_glob = "blocker.*.csv"
cmdstan_data = arviz.from_cmdstan(posterior=posterior_glob)

def _rhat_rank_no_split(ary):

    ary = np.asarray(ary)
    rhat_bulk = _rhat(_z_scale(ary))

    split_ary_folded = abs(ary - np.median(ary))
    rhat_tail = _rhat(_z_scale(split_ary_folded))

    rhat_rank = max(rhat_bulk, rhat_tail)
    return rhat_rank

rhat_no_split_rank = wrap_xarray_ufunc(
        _rhat_rank_no_split,
        cmdstan_data.posterior,
        ufunc_kwargs={"ravel": False},
        func_kwargs={},
        dask_kwargs=None
    )
rhat_split_rank = arviz.rhat(cmdstan_data.posterior)
rhat_split_no_rank = arviz.rhat(cmdstan_data.posterior, method="split")
rhat_no_split_no_rank = arviz.rhat(cmdstan_data.posterior, method="identity")

def print_rhat(rhat_results):
    rhat_values = {var_name: rhat_results[var_name].values for var_name in rhat_results.data_vars}
    all_rhat_values = np.concatenate([values.ravel() for values in rhat_values.values()])
    print(",".join([str(el) for el in np.round(all_rhat_values, decimals=5)]))
print_rhat(rhat_split_rank)
print_rhat(rhat_split_no_rank)
print_rhat(rhat_no_split_no_rank)
print_rhat(rhat_no_split_rank)
bob-carpenter commented 4 months ago

Great. I'm curious to see whether tests pass. Getting the details on this kind of thing exactly matching can be painful.