joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

Parallel uniform sampling #471

Open segasai opened 6 months ago

segasai commented 6 months ago

This is a preliminary version of a patch that enables parallel uniform sampling. The idea is to move away from the propose/evolve split for uniform distributions, where evolve() does nothing other than evaluate logl on a single point. Instead, propose becomes a no-op for uniform sampling, and evolve() does the actual sampling inside the boundary until a satisfactory point is found.
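A minimal sketch of that restructuring (hypothetical function names, not the actual dynesty internals): propose is now trivial, and the whole rejection loop runs inside evolve on the worker, which is what makes it parallelizable.

def propose_point_unif(bound):
    # No-op under the new scheme: the bound itself is the "proposal";
    # all the work happens inside evolve_point_unif on the worker.
    return bound


def evolve_point_unif(bound, loglstar, prior_transform, loglikelihood,
                      rstate):
    # Sample inside the boundary until a satisfactory point is found,
    # i.e. one whose log-likelihood exceeds the current threshold.
    ncall = 0
    while True:
        u = bound.sample(rstate=rstate)  # draw a point inside the bound
        v = prior_transform(u)           # map to the parameter space
        logl = loglikelihood(v)
        ncall += 1
        if logl > loglstar:
            return u, v, logl, ncall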

While implementing this, I realized that after some changes, all the different samplers in nestedsamplers.py may no longer need to exist, since they all do something very similar (but that's TBD).

At the moment some tests fail, but I think that's fixable.

Also, some class members had to be renamed. For example, the .bound attribute was really misleading, as it was storing the list of all the historic bounds, so I renamed it to bound_list.

coveralls commented 6 months ago

Pull Request Test Coverage Report for Build 7952108689

Changes missing coverage:

File                     Covered Lines   Changed/Added Lines   %
py/dynesty/sampling.py   28              30                    93.33%
py/dynesty/bounding.py   30              34                    88.24%
py/dynesty/sampler.py    136             142                   95.77%
Total                    213             225                   94.67%

Files with coverage reduction:

File                     New Missed Lines   %
py/dynesty/bounding.py   1                  92.24%

Totals:
Change from base Build 7180723466: 0.2%
Covered Lines: 4053
Relevant Lines: 4412

💛 - Coveralls
joshspeagle commented 6 months ago

This is an excellent proposed change! Very interested to see how the other internal refactoring might also work.

segasai commented 6 months ago

Thanks @joshspeagle. Out of curiosity, I've actually refactored nestedsamplers.py. It was surprisingly easy to get rid of those classes, which removed a few hundred lines (of code + comments). I am not 100% certain, but 95% certain, that it's a good change, simply because the code is easier to understand now (and there was so little difference between those samplers).

segasai commented 6 months ago

A demonstration of this patch. The following uniform sampling of the Eggbox problem with a single bound takes 36 minutes in current dynesty:

9528it [36:29, 4.35it/s, +1000 | bound: 8600 | nc: 1 | ncall: 12936194 | eff(%): 0.081 | loglstar: -inf < 243.000 < inf | logz: 235.824 +/- 0.079 | dlogz: 0.000 > 0.100]

With the patch it takes 3 minutes:

9490it [03:16, 48.25it/s, +1000 | bound: 999 | nc: 1 | ncall: 12943702 | eff(%): 0.081 | loglstar: -inf < 243.000 < inf | logz: 235.863 +/- 0.078 | dlogz: 0.000 > 0.100]

And if I bump the number of threads to 36, it takes 50 seconds. In other words, it now scales properly, whereas previously the uniform sampler didn't really scale with multiple threads.

import numpy as np
import dynesty
import multiprocessing as mp

nlive = 1000


def loglike_egg(x):
    # Eggbox log-likelihood.
    logl = ((2 + np.cos(x[0] / 2) * np.cos(x[1] / 2))**5)
    return logl


def prior_transform_egg(x):
    # Map the unit cube to [0, 10 * pi] in each dimension.
    return x * 10 * np.pi


LOGZ_TRUTH = 235.856


def test_bounds():
    bound, sample = 'single', 'unif'
    # Stress test uniform sampling with a single bounding ellipsoid.
    ndim = 2
    rstate = np.random.default_rng(444)
    with mp.Pool(4) as pool:
        sampler = dynesty.NestedSampler(loglike_egg,
                                        prior_transform_egg,
                                        ndim,
                                        nlive=nlive,
                                        bound=bound,
                                        sample=sample,
                                        rstate=rstate,
                                        pool=pool,
                                        queue_size=4)
        sampler.run_nested(dlogz=0.1, print_progress=True)
        # Check the recovered evidence against the known truth.
        assert (abs(LOGZ_TRUTH - sampler.results.logz[-1])
                < 5. * sampler.results.logzerr[-1])


if __name__ == '__main__':
    test_bounds()

segasai commented 6 months ago

It would be great to get some review of this patch, @joshspeagle, if you have time, before it gets bigger.

My current thinking is that the patch is mostly positive, with a few negatives listed below:

Among the things I'd like to do, but not in this patch: I think this patch makes it easier to do #391. We can now pack update_slice or update_rwalk into SliceSampler / RWalkSampler objects and make sure that things like the history https://github.com/joshspeagle/dynesty/blob/3f100253dd53fd302ee84f7f25f010b3bf70ad29/py/dynesty/nestedsamplers.py#L145 are stored inside those objects (a rough sketch of that direction follows below). But I'd prefer to do that separately, as it requires more thought.
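For illustration, a rough sketch of that direction (hypothetical classes with toy adaptation rules, not code from this patch):

import math


class RWalkSampler:
    def __init__(self, scale=1.0):
        self.scale = scale  # proposal scale, adapted between batches

    def update(self, blob):
        # Adaptation in the spirit of update_rwalk(): nudge the scale
        # towards a target acceptance fraction (toy rule).
        accept, reject = blob['accept'], blob['reject']
        facc = accept / max(accept + reject, 1)
        self.scale *= math.exp(facc - 0.5)


class SliceSampler:
    def __init__(self):
        self.scale = 1.0
        # Per-sampler state such as the history would live here rather
        # than on the parent nested sampler object.
        self.history = []

    def update(self, blob):
        # Adaptation in the spirit of update_slice(): grow or shrink
        # the slice scale from expansion/contraction counts (toy rule).
        nexpand, ncontract = blob['nexpand'], blob['ncontract']
        self.scale *= nexpand / max(ncontract, 1)
        self.history.append(blob)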

ColmTalbot commented 6 months ago

In Bilby, we rely on a custom sampler (see here), although I'd love to see this supported in dynesty. We mostly use this to allow additional data to be passed through the kwargs, specifically the current live points (the general pattern is sketched below).
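A purely hypothetical sketch of that pattern (not Bilby's or dynesty's actual code): a custom proposal function that receives extra state, such as the current live points, through kwargs.

def sample_custom(u, loglstar, prior_transform, loglikelihood, rstate,
                  kwargs):
    # Extra data threaded through kwargs, e.g. the current live points
    # (an array of unit-cube positions).
    live = kwargs['live_points']
    ncall = 0
    while True:
        # Toy proposal: perturb a randomly chosen live point.
        idx = rstate.integers(len(live))
        u_new = (live[idx] + rstate.normal(0., 0.01, size=len(u))) % 1.0
        v_new = prior_transform(u_new)
        logl = loglikelihood(v_new)
        ncall += 1
        if logl > loglstar:
            return u_new, v_new, logl, ncall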

segasai commented 6 months ago

Thanks for the link @ColmTalbot. Those interfaces will certainly break with this patch, which is unfortunate, but if we combine this change with the new sampler interface (#391), hopefully it'll be worthwhile. In particular, I'd like to implement an ensemble (a la emcee) proposal in any new sampler interface.

What I need in order to implement the new way of sampling is to formalize the exact interface for those 'inner samplers' (rwalk/slice/uniform).

It is clear that each sampler needs the sample() interface as defined in sampling.py. They also need an update() interface, as currently written in update_slice() etc. But my understanding is that some samplers also need extra arguments passed to evolve_point, so each sampler also needs some sort of generate_args(Sampler) hook; in the simplest form it would just set the kwargs to whatever you want.
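A hypothetical sketch of that interface (the method names follow the description above; this is not a settled design):

class InnerSampler:
    # Hypothetical base interface for the 'inner samplers'.

    def sample(self, args, kwargs):
        # Draw a new point above the likelihood threshold, in the
        # spirit of the functions in sampling.py. Would return the
        # unit-cube point, the parameter-space point, its logl, the
        # number of likelihood calls, and an auxiliary blob for
        # update().
        raise NotImplementedError

    def update(self, blob):
        # Adapt internal state (e.g. a proposal scale), as
        # update_rwalk()/update_slice() do now.
        raise NotImplementedError

    def generate_args(self, sampler):
        # Collect whatever extra information this sampler needs from
        # the parent Sampler and return it as the kwargs passed to
        # evolve_point. In the simplest form, just a dict.
        return {}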