joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License
357 stars 77 forks

Reinitializing pool with multiprocessing #371

Closed · natashabatalha closed this issue 2 years ago

natashabatalha commented 2 years ago

Bug/Question

I would like to add more samples to an existing set of runs, but I do not know how to do so when a multiprocessing pool is involved. I am using dynesty version 1.2.2.

Setup

Following the example in test_pool.py, I create a pool and pass it to NestedSampler, e.g.:

# test_pool.py
import multiprocessing as mp
import dill
from dynesty import NestedSampler

pool = mp.Pool(30)
with pool:
    sampler = NestedSampler(loglike, ptform, ndim,
                            pool=pool, queue_size=30)
    sampler.run_nested(maxcall=1000)
dill.dump(sampler, open('test.dill', 'wb'))

After this initial run, I would like to build on it by reopening the pool. If I simply run:

sampler = dill.load(open('test.dill','rb'))
sampler.run_nested()

it works, but without parallelization. Is there a way to add more samples in parallel?

For example, something like this:

saved_sample = dill.load(open('test.dill','rb'))
newpool = mp.Pool(30)
with newpool:
    saved_sample.pool = newpool
    saved_sample.run_nested()

Thanks for your help!

segasai commented 2 years ago

This should work:

with newpool:
    sampler.pool = newpool
    sampler.loglikelihood.pool = newpool
    sampler.M = newpool.map
    sampler.run_nested()

It is somewhat unwieldy, as there is no official interface for this, but it will work. I can't promise that this interface will stay as it is in the future.
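Since the three attributes above must all be reset together, one way to avoid forgetting one is a small helper. This is only a sketch: `reattach_pool` is a made-up name, and it relies on the undocumented sampler internals named in the comment above, which may change in future dynesty versions.

```python
def reattach_pool(sampler, pool):
    """Point a restored dynesty sampler at a fresh pool.

    Uses undocumented attributes (see the maintainer's comment above);
    these names may change in future dynesty versions.
    """
    sampler.pool = pool                # pool the sampler dispatches work to
    sampler.loglikelihood.pool = pool  # pool used for likelihood evaluations
    sampler.M = pool.map               # map function used for parallel calls
    return sampler
```

A caller could then do, e.g., `with mp.Pool(30) as newpool: reattach_pool(sampler, newpool); sampler.run_nested()`.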

natashabatalha commented 2 years ago

Nice!! Thanks for the quick response. Just for completeness, I find that this works:

pool = mp.Pool(30)
with pool:
    sampler = NestedSampler(loglike, ptform, ndim,
                            pool=pool, queue_size=30)
    sampler.run_nested(maxcall=500)
    dill.dump(sampler, open('test.dill','wb'))

# start a new pool, reusing the in-memory sampler (no reload from disk)
newpool = mp.Pool(30)
with newpool:
    sampler.pool = newpool
    sampler.loglikelihood.pool = newpool
    sampler.M = newpool.map
    sampler.run_nested(maxcall=500)

But this yields a pickling error:

# start a new pool from the saved dill dump
saved_samp = dill.load(open('test.dill','rb'))
newpool = mp.Pool(30)
with newpool:
    saved_samp.pool = newpool
    saved_samp.loglikelihood.pool = newpool
    saved_samp.M = newpool.map
    saved_samp.run_nested(maxcall=500)

Error output:

~/anaconda3/envs/picaso38/lib/python3.8/multiprocessing/connection.py in send(self, obj)
    204         self._check_closed()
    205         self._check_writable()
--> 206         self._send_bytes(_ForkingPickler.dumps(obj))
    207 
    208     def recv_bytes(self, maxlength=None):

~/anaconda3/envs/picaso38/lib/python3.8/multiprocessing/reduction.py in dumps(cls, obj, protocol)
     49     def dumps(cls, obj, protocol=None):
     50         buf = io.BytesIO()
---> 51         cls(buf, protocol).dump(obj)
     52         return buf.getbuffer()
     53 

PicklingError: Can't pickle <function ptform at 0x7f9949c90ee0>: it's not the same object as __main__.ptform

Probably not something that can be solved. I should add that I am running long (>10 hour) jobs that sometimes get stuck, are subject to power outages, or hit other unforeseen circumstances. I wanted to use a combination of while and maxcall to save intermediate outputs before convergence, with the option to restart from one of those outputs if something went awry. So this is not required functionality, just wishlist functionality.
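The checkpoint loop described here can be sketched generically. This is a sketch, not dynesty API: `run_in_batches` is a hypothetical helper that only assumes the sampler exposes `run_nested(maxcall=...)` and a cumulative `ncall` counter (which dynesty's samplers do), treats "the batch did not use its full call budget" as a heuristic convergence signal, and uses stdlib pickle where the thread uses dill. Separately, to avoid the PicklingError above, loglike and ptform should be defined in an importable module rather than in `__main__`, so the workers can resolve them to the same objects.

```python
import pickle  # the thread uses dill, which can serialize more object types


def run_in_batches(sampler, batch, checkpoint_path, dump=pickle.dump):
    """Alternate short runs with on-disk checkpoints until the sampler
    stops consuming its full per-batch call budget (heuristic for 'done')."""
    while True:
        before = sampler.ncall
        sampler.run_nested(maxcall=batch)
        with open(checkpoint_path, 'wb') as f:
            dump(sampler, f)  # restartable snapshot after every batch
        if sampler.ncall - before < batch:
            # run_nested returned before exhausting maxcall: assume finished
            return sampler
```

On an unexpected restart, one would reload the last checkpoint, reattach a fresh pool as in the comments above, and call `run_in_batches` again.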

segasai commented 2 years ago

I'll close this issue, but I have created issue #374 to track development of an interface for easier restarts.