joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

Need Help Understanding Parallelization in dynesty #189

Closed ruskin23 closed 4 years ago

ruskin23 commented 4 years ago

I am trying to use dynesty for my project, but I am having trouble parallelizing the sampling.

This is a simplified version of my code. The likelihood is calculated from two uncorrelated Gaussian distributions and I have defined both uniform and non-uniform priors over my parameters:

import ipyparallel as ipp

#minimal pool wrapper: dynesty only needs a map() method, and falls back to pool.size for its queue size if queue_size is not passed
class Pool(object):

    def __init__(self, dview,nprocs):
        self.dview = dview
        self.size = nprocs

    def map(self, function, tasks):
        return self.dview.map_sync(function, tasks)

#likelihood function. 
def loglikelihood(x):

    #2 uncorrelated Gaussians over x[6] (age) and x[4] (logQ), with means 1.0 and 6.0 and standard deviations 1.0 and 0.5
    L= (-0.5*(((x[6]-1.0)/1.0)**2)
        -0.5*(((x[4]-6.0)/0.5)**2)
        -numpy.log(1.0*numpy.sqrt(2*numpy.pi))
        -numpy.log(0.5*numpy.sqrt(2*numpy.pi))
        )

    return L

#prior transforms
def prior_transform(u):

    x=numpy.array(u)

    for i,s in enumerate(sampling_parameters):
        if s[-1]=='Normal':
            mean=s[1]
            sigma=s[2]
            x[i]=scipy.stats.norm.ppf(u[i],loc=mean,scale=sigma)
        elif s[-1]=='Truncated_Normal':
            mean=s[1]
            sigma=s[2]
            low=(s[3]-mean)/sigma
            high=(s[4]-mean)/sigma
            x[i]=scipy.stats.truncnorm.ppf(u[i],low,high,loc=mean,scale=sigma)
        elif s[-1]=='Uniform':
            x[i]=(s[2]-s[1])*u[i] + s[1]

    return x

#a tuple of parameters to sample from
sampling_parameters = [('Porb',8.46,0.0000230,'Normal'),
                       ('feh',-1.06,0.3,-1.014,0.537,'Truncated_Normal'),
                       ('eccentricity',0.042,0.01,0.0,0.45,'Truncated_Normal'),
                       ('Wdisk',2*numpy.pi/14,2*numpy.pi/1.4,'Uniform'),
                       ('logQ',5.0,12.0,'Uniform'),
                       ('primary_mass',0.5,1.2,'Uniform'),
                       ('age',1e-3,10.0,'Uniform')]

ndim=len(sampling_parameters)

#Starting Client
rc = ipp.Client()
nprocs = len(rc.ids)
print(rc.ids)

dview = rc[:]
dview.use_dill();

dview['sampling_parameters']=sampling_parameters

#sync imports on all engines
with dview.sync_imports():
    import numpy
    import scipy
    from scipy import stats
    import dynesty

pool=Pool(dview,nprocs)

dsampler=dynesty.NestedSampler(loglikelihood, prior_transform,
                               ndim,nlive=1500,pool=pool)

dsampler.run_nested()
dresults=dsampler.results
dresults.summary()

Starting 4 engines using ipcluster, I get the following summary:

4865it [03:17, 24.65it/s, +1500 | bound: 0 | nc: 1 | ncall: 39254 | eff(%): 16.215 | loglstar:   -inf < -1.146 <    inf | logz: -4.440 +/-  0.038 | dlogz:  0.001 >  1.509]
Summary
=======
nlive: 1500
niter: 4865
ncall: 39254
eff(%): 16.215

However, if I don't parallelize and just run the sampler:

5032it [00:06, 727.18it/s, +1500 | bound: 0 | nc: 1 | ncall: 40116 | eff(%): 16.283 | loglstar:   -inf < -1.145 <    inf | logz: -4.551 +/-  0.041 | dlogz:  0.001 >  1.509]
Summary
=======
nlive: 1500
niter: 5032
ncall: 40116
eff(%): 16.283
logz: -4.551 +/-  0.041

So my question is: how is dynesty parallelizing the calculations? The total time taken by the parallel run is much greater than the serial run. Is there something wrong with my setup?

joshspeagle commented 4 years ago

So the parallelization in dynesty is extremely simple: it just calls pool.map instead of the default Python map whenever possible when evaluating points. This works under the assumption that the log-likelihood dominates the runtime, which isn't always true. In the cases where it doesn't, the overhead involved with sending out operations to multiple cores to immediately re-collect them can be quite prohibitive, which appears to be the problem here.
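
To see why, here is a toy timing sketch (not dynesty's internals; cheap_loglike and all the constants below are made up for illustration). It evaluates a microsecond-scale log-likelihood over many small batches, once with the builtin map and once through a standard multiprocessing pool, mimicking the queue-sized batches dynesty farms out:

import time
import numpy
from multiprocessing import Pool

def cheap_loglike(x):
    #toy Gaussian log-likelihood that evaluates in microseconds
    return -0.5*numpy.sum((x - 1.0)**2)

if __name__ == '__main__':
    batches = numpy.random.rand(1000, 4, 7)    #1000 batches of 4 proposed points

    t0 = time.time()
    for batch in batches:
        list(map(cheap_loglike, batch))        #serial: no pickling or inter-process traffic
    print('builtin map:', time.time() - t0)

    with Pool(4) as pool:
        t0 = time.time()
        for batch in batches:
            pool.map(cheap_loglike, batch)     #parallel: every batch pays pickling + IPC overhead
        print('pool.map   :', time.time() - t0)

With a likelihood this cheap, the pooled loop is usually slower by a large factor, which is the same effect you are seeing.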

If you want to control some of this behavior, you can do so using the use_pool argument. This allows you to control exactly what operations are done using the pool and what can just be done on the main core. I've copy-pasted the documentation below to highlight what options you have to play with:

use_pool : dict, optional
    A dictionary containing flags indicating where a pool should be used to execute operations in parallel. These govern whether prior_transform is executed in parallel during initialization ('prior_transform'), loglikelihood is executed in parallel during initialization ('loglikelihood'), live points are proposed in parallel during a run ('propose_point'), and bounding distributions are updated in parallel during a run ('update_bound'). Default is True for all options.
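
For instance, something along these lines (a sketch; pick the flags based on where your runtime actually goes) keeps the pool for the initial likelihood evaluations but does point proposals and bound updates on the main process:

dsampler=dynesty.NestedSampler(loglikelihood, prior_transform, ndim,
                               nlive=1500, pool=pool,
                               use_pool={'prior_transform': False,
                                         'loglikelihood': True,
                                         'propose_point': False,
                                         'update_bound': False})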

Let me know if this helps.

joshspeagle commented 4 years ago

Closing this for now.

ruskin23 commented 3 years ago

Hello again

I was trying to understand where parallelization helps when proposing a new point. As I understand it, you fill a queue using pool.map (here) with live points proposed by a sampling function. I can see that, apart from the uniform sampling function, all the other sampling functions are already conditioned on logl_prop > loglstar. So when the queue is filled, do all the points in the queue have log-likelihoods better than the current worst live point (loglstar), except in the case of uniform sampling?
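
To make sure I am describing the right pattern, here is a rough self-contained toy of what I mean (not the actual dynesty source; propose_above and the numbers are made up): each parallel task keeps drawing until its proposal beats loglstar, so every entry placed in the queue already satisfies the constraint.

import numpy
from multiprocessing import Pool

def loglike(x):
    return -0.5*numpy.sum(x**2)

def propose_above(args):
    #keep drawing until the proposal beats the current worst live point
    loglstar, seed = args
    rng = numpy.random.default_rng(seed)
    while True:
        x = rng.uniform(-5.0, 5.0, size=2)
        logl = loglike(x)
        if logl > loglstar:
            return x, logl

if __name__ == '__main__':
    loglstar = -3.0
    with Pool(4) as pool:
        #fill the queue; every returned point has logl > loglstar by construction
        queue = pool.map(propose_above, [(loglstar, seed) for seed in range(8)])
    print([round(float(logl), 3) for _, logl in queue])

Is that roughly what happens for the non-uniform samplers?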