joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

resample_equal weights tolerance problem #147

Closed · mattpitkin closed this issue 5 years ago

mattpitkin commented 5 years ago

I have situations where my unnormalised log weights (logwts) and evidence (logz) values are large (e.g., around 2e8). When I normalise and exponentiate these, e.g., weights = np.exp(logwts - logz[-1]), and pass them to resample_equal, it fails with:

raise ValueError("Weights do not sum to 1.")

because the abs(np.sum(weights) - 1.) value (which is, in one case, 5.5e-8) is above the tolerance of SQRTEPS = 1.4901161193847656e-08.
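
For reference, this is roughly the guard being hit (a minimal sketch assuming the sqrt(machine-epsilon) tolerance quoted above; the wrapper function name here is illustrative):

import math
import numpy as np

SQRTEPS = math.sqrt(float(np.finfo(np.float64).eps))  # 1.4901161193847656e-08

def check_normalised(weights):
    # resample_equal-style guard: reject weights whose sum strays from 1
    # by more than sqrt(eps); this is the check that raises for me
    if abs(np.sum(weights) - 1.) > SQRTEPS:
        raise ValueError("Weights do not sum to 1.")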

I think this is a dynamic range issue due to my large log weights.
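
To illustrate the scale of the issue (a rough numerical check, not dynesty code): at log values around 2e8 the spacing of representable double-precision numbers is already larger than SQRTEPS, so logwts - logz[-1] carries rounding error comparable to the tolerance:

import numpy as np

# spacing between adjacent float64 values near 2e8
print(np.spacing(2e8))  # ~2.98e-08, i.e. about twice SQRTEPS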

Would it be possible to increase the tolerance by a factor of 10, say?

joshspeagle commented 5 years ago

This should be easy to do. I'm getting around to all the issues this upcoming week, so I'll fix it up as I patch up the other stuff.

mattpitkin commented 5 years ago

Great, thanks very much

mattpitkin commented 5 years ago

Just to demonstrate this, I've mocked up an example that reproduces the problem:

from dynesty.utils import resample_equal
from dynesty import NestedSampler
from scipy.special import ndtri
import numpy as np

# example sampler settings for this 1D test problem
ndims = 1
nlive = 500
bound = 'multi'
sample = 'unif'

def prior_transform(theta):
    # map the unit hypercube to a standard normal prior
    return (ndtri(theta[0]),)

# loop over different likelihood "normalisations"
for norm in [1e5, 1e6, 1e7, 1e8, 1e9]:
    def loglikelihood_dynesty(theta):
        # unit-width Gaussian log-likelihood offset by the large constant norm
        return norm - 0.5*theta[0]**2

    sampler = NestedSampler(loglikelihood_dynesty, prior_transform, ndims,
                            bound=bound, sample=sample, nlive=nlive)
    sampler.run_nested(dlogz=0.1)
    res = sampler.results
    weights = np.exp(res['logwt'] - res['logz'][-1])

    print('Normalisation {}: {}'.format(norm, abs(np.sum(weights) - 1)))

which gives, for example:

iter: 195 | bound: 0 | nc: 2 | ncall: 749 | eff(%): 26.035 | loglstar:   -inf < 999999.530 <    inf | logz: 999997.831 +/-  0.071 | dlogz:  1.943 >  0.100                                            
Normalisation 100000.0: 3.153588501447757e-11
iter: 218 | bound: 0 | nc: 1 | ncall: 778 | eff(%): 28.021 | loglstar:   -inf < 9999999.618 <    inf | logz: 9999998.090 +/-  0.068 | dlogz:  1.687 >  0.100                                          
Normalisation 1000000.0: 9.982867865687695e-10
iter: 275 | bound: 0 | nc: 1 | ncall: 871 | eff(%): 31.573 | loglstar:   -inf < 99999999.713 <    inf | logz: 99999998.351 +/-    nan | dlogz:  1.392 >  0.100                                        
Normalisation 10000000.0: 7.1441901283719744e-09
iter: 233 | bound: 0 | nc: 3 | ncall: 804 | eff(%): 28.980 | loglstar:   -inf < 999999999.587 <    inf | logz: 999999998.094 +/-  0.643 | dlogz:  1.659 >  0.100                                      
Normalisation 100000000.0: 7.037656624131472e-08
iter: 1349 | +500 | bound: 0 | nc: 1 | ncall: 7503 | eff(%): 24.643 | loglstar:   -inf < 1000000000.000 <    inf | logz: 999999999.658 +/-    nan | dlogz:  0.000 >  0.100                            
Normalisation 1000000000.0: 6.879805974913111e-07

which shows that the deviation of the summed weights from 1 exceeds the SQRTEPS tolerance once the log likelihoods are on the order of 1e8.

Changing SQRTEPS to be:

SQRTEPS = math.sqrt(float(np.finfo(np.float32).eps))

would work for log likelihoods up to about 1e11 (which I checked by extending the above example). I don't know whether this is overkill, though.
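
For context, the two tolerances differ by roughly four orders of magnitude (a quick check, not dynesty code):

import math
import numpy as np

# current tolerance: sqrt of double-precision machine epsilon
print(math.sqrt(float(np.finfo(np.float64).eps)))  # ~1.49e-08
# proposed tolerance: sqrt of single-precision machine epsilon
print(math.sqrt(float(np.finfo(np.float32).eps)))  # ~3.45e-04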