add test for rstate - Githubissues

avivajpeyi commented 2 years ago

Hey @joshspeagle and @segasai ,

Some devs and I at Monash Uni are having some trouble getting reproducible results with the same rstate. This PR includes a small test to demonstrate this issue, and hopefully check for it in the future.

Here is also a script that plots N sampling results from the linear regression example, both using the same data and same sampling seed.

from __future__ import division, print_function
from six.moves import range
from functools import partial

import numpy as np

import matplotlib
from matplotlib import pyplot as plt
from matplotlib import rcParams

import dynesty
from dynesty import plotting as dyplot

TRUTHS = dict(
    m = -0.9594,
    b = 4.294,
    f = 0.534
)

def set_rcparams():
  rcParams.update({'xtick.major.pad': '7.0'})
  rcParams.update({'xtick.major.size': '7.5'})
  rcParams.update({'xtick.major.width': '1.5'})
  rcParams.update({'xtick.minor.pad': '7.0'})
  rcParams.update({'xtick.minor.size': '3.5'})
  rcParams.update({'xtick.minor.width': '1.0'})
  rcParams.update({'ytick.major.pad': '7.0'})
  rcParams.update({'ytick.major.size': '7.5'})
  rcParams.update({'ytick.major.width': '1.5'})
  rcParams.update({'ytick.minor.pad': '7.0'})
  rcParams.update({'ytick.minor.size': '3.5'})
  rcParams.update({'ytick.minor.width': '1.0'})
  rcParams.update({'font.size': 30})

def line_eq(m, x, b):
    return  m * x + b

def generate_data(theta, num = 50, generation_seed=0):
    np.random.seed(generation_seed)
    x = np.sort(10 * np.random.rand(num))
    yerr = 0.1 + 0.5 * np.random.rand(num)
    y_true = line_eq(theta['m'], x, theta['b'])
    y = y_true + np.abs(theta['f'] * y_true) * np.random.randn(num)
    y += yerr * np.random.randn(num)
    return x, y, yerr

# log-likelihood
def loglike_given_data(theta, x, y, yerr):
    m, b, lnf = theta
    model = line_eq(m, x, b)
    inv_sigma2 = 1.0 / (yerr**2 + model**2 * np.exp(2 * lnf))
    return -0.5 * (np.sum((y-model)**2 * inv_sigma2 - np.log(inv_sigma2)))

# prior transform
def prior_transform(utheta):
    um, ub, ulf = utheta
    m = 5.5 * um - 5.
    b = 10. * ub
    lnf = 11. * ulf - 10.
    return m, b, lnf

def run_sampler(x, y, yerr, rstate):
  loglike = partial(loglike_given_data, x=x, y=y, yerr=yerr)
  sampler = dynesty.NestedSampler(
      loglike, prior_transform, ndim=3,
      bound='multi', sample='rwalk', rstate=rstate
  )
  sampler.run_nested()
  return sampler.results

def repeat_sampler_run_with_fixed_seed(nruns=2, seed=0):
  x, y, yerr = generate_data(TRUTHS)
  results = []
  for i in range(nruns):
    print(f">>> Run {i}:")
    result = run_sampler(x, y, yerr, rstate=np.random.RandomState(seed=seed))
    results.append(result)
    print("------------")
  return results

def plot_results(results, fname="results.png"):
  set_rcparams()
  fig, axes = plt.subplots(4, 1, figsize=(12, 18))
  for i, r in enumerate(results):
      fig, axes = dyplot.runplot(r, color=f"C{i}", fig=(fig, axes))
  fig.tight_layout()
  fig.savefig(fname)

def main():
  results = repeat_sampler_run_with_fixed_seed()
  plot_results(results)

if __name__ == "__main__":
  main()

Note that the runplots for the two runs look rather different:

results

Is this a known feature? How can I set my rstate such that I can reproduce a run identically?

segasai commented 2 years ago

Hi @avivajpeyi ,

This and many other things have been fixed here https://github.com/joshspeagle/dynesty/issues/254

Relevant commits are here https://github.com/joshspeagle/dynesty/pull/292

You just need to use the new style numpy random generators. (i.e. np.random.default_rng(xx))

I'm pretty certain that now dynesty is 100% deterministic.

avivajpeyi commented 2 years ago

Hmmm changing np.random.RandomState(seed=0) to np.random.default_rng(seed=0) results in some errors:

            for attempt in range(100):
>               live_u = rstate.rand(nlive, npdim)  # positions in unit cube
E               AttributeError: 'numpy.random._generator.Generator' object has no attribute 'rand'

However, running the above script (of the linear regression example) does indeed provide reproducible results!

results ,

segasai commented 2 years ago

You have to use dynesty master branch (as 1.2 is still not released... )

segasai commented 2 years ago

I think, you'd have to test that the answer are EXACTLY equal, not approximately. And given that, there is no need imo for more than two runs. Also the problem with this kind of test (and the reason i didn't commit something like that before) is that there are many different code paths that involve randomness (different samplers etc) and it's a bit too time-consuming to test them all.

In fact you test is incorrect. you need to have get_rstate(SOME_NUMBER) inside the loop.

segasai commented 2 years ago

I've added a test that verifies that all the results attributes are the exactly the same when run with the same rstate https://github.com/joshspeagle/dynesty/commit/162d66813f85b830059f1722359364d5598b6b50 Thanks for the PR @avivajpeyi

joshspeagle / dynesty

add test for rstate #353