joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

extremely large logzerr #360

Closed segasai closed 2 years ago

segasai commented 2 years ago

While rerunning the notebooks I noticed that the exponential wave model occasionally produces an extremely large logzerr, of order 10^51.

This error has been there since at least d16e716e1791efab859349d0363aff9fd8af88fa (so it's not caused by the logz calculation refactorings)

The code is below

import numpy as np
import dynesty

rstate = np.random.default_rng(916301 + 2)
x = rstate.uniform(0, 2 * np.pi, size=100)
x.sort()

# define model
fa = 4.2 / 4
fb = 42 / 10
na = 0.8
nb = 0.3
pa = 0.1
pb = 2.4
sigma = 0.2

# generate noisy observations
ypred = np.exp(na * np.sin(x * fa + pa) + nb * np.sin(x * fb + pb))
y = rstate.normal(ypred, sigma)

def prior_transform(u):
    v = u * 100  # allocate the output array; every entry is overwritten below
    v[0] = u[0] * 4 - 2              # log-amplitude a
    v[1] = u[1] * 4 - 2              # log-frequency a
    v[2] = (u[2] % 1.) * 2 * np.pi   # phase a (periodic)
    v[3] = u[3] * 4 - 2              # log-amplitude b
    v[4] = u[4] * 4 - 2              # log-frequency b
    v[5] = (u[5] % 1.) * 2 * np.pi   # phase b (periodic)
    v[6] = u[6] * 2 - 2              # log-sigma
    return v

def loglike(v):
    logna, logfa, pa, lognb, logfb, pb, logsigma = v
    na, fa, pa, nb, fb, pb, sigma = (10**logna, 10**logfa, pa, 10**lognb,
                                     10**logfb, pb, 10**logsigma)
    ypred = np.exp(na * np.sin(x * fa + pa) + nb * np.sin(x * fb + pb))
    residsq = (ypred - y)**2 / sigma**2
    loglike = -0.5 * np.sum(residsq + np.log(2 * np.pi * sigma**2))

    # guard against overflow in np.exp above: replace non-finite
    # log-likelihoods with a very negative finite floor
    if not np.isfinite(loglike):
        loglike = -1e300

    return loglike

# sample (ndim = 7; parameters 2 and 5, the phases, are periodic)
sampler = dynesty.NestedSampler(loglike,
                                prior_transform,
                                7,
                                periodic=[2, 5],
                                sample='rslice',
                                nlive=2000,
                                rstate=rstate)
sampler.run_nested(print_progress=True)
res = sampler.results

print(res.logz, res.logzerr[-1])

I'm investigating the cause of this, but I thought I'd create an issue to make sure it's not forgotten.
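As an aside, the `-1e300` floor in `loglike` above matters here: with the wide priors on the log-amplitudes, `np.exp` in the model can overflow, making the log-likelihood non-finite. A minimal illustration of that failure mode (standalone sketch, not part of the reproduction script):

```python
import numpy as np

# Illustration of why the script floors the log-likelihood at -1e300:
# a large enough exponent overflows float64 to inf, and the resulting
# log-likelihood is no longer finite.
with np.errstate(over='ignore'):
    ypred = np.exp(np.float64(800.0))   # exceeds float64 range -> inf
resid_sq = (ypred - 1.0) ** 2           # inf
loglike = -0.5 * resid_sq               # -inf

# the guard in the reproduction script replaces it with a finite floor,
# so the sampler sees an (extremely) bad but usable value
if not np.isfinite(loglike):
    loglike = -1e300
```

This floor is what later injects `-1e300` values among the dead-point log-likelihoods.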

segasai commented 2 years ago

Commit 84ba36f3a698c267fdebccaecead8c83953a0533 provides a fix (the commit message has the details). Basically, in static nested runs the logz errors were not estimated with a full calculation of the quadratic estimator. Since the logz errors depend on the information H, which can only be computed from the full run (not incrementally), the errors could be incorrect. This is why previously, e.g. in https://github.com/joshspeagle/dynesty/blob/4c6a9ce67cb9e72d41edae2ed4994d879a2c946f/demos/Examples%20--%20LogGamma.ipynb , the uncertainties on logz were not correct and had to be estimated using resample_run.
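For reference, the "full" (batch) estimator recomputes logz, H, and logzerr from the complete list of dead points after the run. A hypothetical sketch (assumed function name, simplified rectangle-rule volume weights, and the standard Skilling estimate var(ln Z) ~ H / nlive; not dynesty's actual implementation):

```python
import numpy as np

def full_run_logzerr(logl, nlive):
    """Batch recomputation of logz, information H, and logzerr
    from a completed static run's sorted dead-point log-likelihoods."""
    n = len(logl)
    logvol = -np.arange(1, n + 1) / nlive      # E[ln X_i] = -i / nlive
    logwt = logl + logvol - np.log(nlive)      # ln(L_i dX_i), rectangle rule
    logz = np.logaddexp.reduce(logwt)          # ln of the evidence sum
    p = np.exp(logwt - logz)                   # posterior weights of dead points
    h = np.sum(p * (logl - logz))              # information H = E_p[ln(L/Z)]
    logzerr = np.sqrt(h / nlive)               # Skilling: var(ln Z) ~ H / nlive
    return logz, h, logzerr

# toy run: dead points with monotonically increasing log-likelihood
logl = np.linspace(-50.0, -5.0, 1000)
logz, h, logzerr = full_run_logzerr(logl, nlive=500)
```

Because H appears inside the error estimate, any corruption of H propagates directly into logzerr, which is why the incremental shortcut mattered.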

segasai commented 2 years ago

The issue was caused by the presence of -1e300 among the log-likelihood values, which broke the default incremental calculation. Commit 4352be9e1c27b9893ba48509aea1882565817108 applies the fix to the dynamic sampler, and 8e00b304f8c3ddce4d155b3ad9b0daeaf505daf1 adds a test to prevent the error from recurring.
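A toy demonstration of the mechanism (a simplified, dynesty-like running update, not the library's actual code): the -1e300 sentinels make intermediate terms of the incremental H update swing through ±1e300, and the catastrophic cancellation leaks into the accumulated logz variance, whereas a batch recomputation is unaffected because the sentinel weights underflow to exactly zero.

```python
import numpy as np

nlive = 50
# dead points in ascending order: 3 sentinel values first, then finite ones
logl = np.concatenate([np.full(3, -1e300), np.linspace(-20.0, -1.0, 300)])
logvol = -np.arange(1, len(logl) + 1) / nlive   # E[ln X_i] = -i / nlive
logwt = logl + logvol

# --- batch: exp(logwt - logz) underflows to exactly 0 for the sentinels,
# so they contribute nothing and H stays finite
logz_batch = np.logaddexp.reduce(logwt)
h_batch = np.sum(np.exp(logwt - logz_batch) * (logl - logz_batch))

# --- incremental: a running update seeded with the same -1e300 sentinel
# that marks "no evidence accumulated yet"
h, logz, logzvar = 0.0, -1e300, 0.0
history = []
for lw, ll in zip(logwt, logl):
    logz_new = np.logaddexp(logz, lw)
    h_new = (np.exp(lw - logz_new) * ll
             + np.exp(logz - logz_new) * (h + logz)
             - logz_new)
    logzvar += 2.0 * (h_new - h) / nlive   # naive variance accumulation
    history.append(logzvar)
    h, logz = h_new, logz_new

# While the sentinels are processed, the running "variance" reaches values
# around -1e299 -- a real variance can never be hugely negative.
print(min(history))   # astronomically negative
print(h_batch)        # small and finite
```

Once the intermediate variance has been contaminated at this scale, the final logzerr is garbage regardless of how well the rest of the run behaves, matching the 10^51-sized errors seen above.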