joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

Guidance on using DynestyStatic random walk sampling with parallel pool #241

Closed Jammy2211 closed 3 years ago

Jammy2211 commented 3 years ago

For our use case, we have found that the DynestyStatic sampler with random walk mode is INCREDIBLE. It is a complete game-changer for our science, so thank you!

Up to now, we have always used the sampler in serial mode, as our use case is such that we parallelize at a higher level and therefore spawn many serial jobs. However, we now have a use case where the likelihood evaluation times are so long (30+ seconds) that the only way to make progress is to parallelize at the level of the non-linear search (hopefully, Dynesty). We have a lot of CPU time to throw at this!

For context, we typically apply the sampler with ~50 live points, rwalks=5 and Gaussian priors that are 'initialized' to overlap with the high likelihood regions of parameter space (we can determine this efficiently via fast non-linear searches).
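
For concreteness, our serial setup looks roughly like the following (the dimensionality, likelihood, and prior widths here are toy placeholders, not our real model):

```python
import numpy as np
import dynesty
from scipy.stats import norm

ndim = 5  # placeholder dimensionality

def loglike(theta):
    # Stand-in for our (much more expensive, ~30s) likelihood.
    return -0.5 * np.sum(theta ** 2)

def prior_transform(u):
    # Gaussian priors 'initialized' to overlap the high-likelihood
    # region; the means/widths here are placeholders.
    return norm.ppf(u, loc=0.0, scale=1.0)

sampler = dynesty.NestedSampler(
    loglike, prior_transform, ndim,
    nlive=50, sample='rwalk', walks=5)
sampler.run_nested()
```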

I am looking for guidance on whether you think that parallelizing this job is simply a matter of using the pool feature in Dynesty with the same settings as before (see the sketch at the end of this comment). There are a few specifics it would be good to have clarity on:

1) In MultiNest, when you parallelize the job, all parallel samples drawn after an accepted sample are discarded, meaning that parallelization above 4 or so cores is pointless. Am I right in thinking this would not be the case for rwalk sampling in Dynesty?

2) Has the issue discussed here (https://github.com/joshspeagle/dynesty/issues/164) been addressed? Is there any update on this I should be aware of?

3) Are there any other bottlenecks I should be aware of that mean that using 30-60 cores in parallel simply will not scale up in the way I am hoping?

Any general advice or guidance would be appreciated, if you think there is anything worth me knowing! My plan is to just go ahead and try it out for myself, but I suspect you can point me in the right direction as to what I should be aware of :D.
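
For reference, the parallel version I have in mind is just the following (a sketch reusing loglike/prior_transform/ndim from above; the core count is arbitrary):

```python
from multiprocessing import Pool

# Same settings as the serial run, but with dynesty's pool interface.
# queue_size tells dynesty how many proposals to farm out at once.
if __name__ == '__main__':
    with Pool(processes=60) as pool:
        sampler = dynesty.NestedSampler(
            loglike, prior_transform, ndim,
            nlive=50, sample='rwalk', walks=5,
            pool=pool, queue_size=60)
        sampler.run_nested()
```

(My understanding is that queue_size should be set to roughly the number of processes, so dynesty keeps one proposal in flight per core.)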

joshspeagle commented 3 years ago

Sorry about the delay in responding -- the last week has been crazy on my end. Glad to hear the code has been working well for you! 🙂

To describe the parallelization scheme in dynesty in a bit more detail, in response to your questions:

  1. dynesty uses a scheme following PolyChord, which proposes points independently in parallel and then accepts them off the queue in order. So if you have L_0 and propose L_1, L_2, ..., L_50, you step down the line, check whether L_1 > L_0, accept the point and set L_1 -> L_0, and repeat for L_2. The probability of any subsequent point being accepted decreases since the threshold is larger, but the scaling isn't awful (see Fig. 5 of the PolyChord paper; there's a toy sketch of this in-order acceptance after this list).
  2. No, it has not. The basic issue still stands in that the parallelization happens synchronously, rather than asynchronously, and only the final sample is returned rather than any intermediate ones. This means you'll be limited by the slowest sampling core at every iteration.
  3. The key quantity is really n_cores / n_live. You should be okay as n_cores ~ n_live, but the performance will almost certainly be strictly worse than the ideal case due to the issue described above plus the fact that sampling isn't perfectly uncorrelated.
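
To make point 1 concrete, here is a toy version of the in-order queue acceptance (schematic only, not dynesty's actual implementation):

```python
import numpy as np

def consume_queue(threshold, proposals):
    # All proposals were generated in parallel against the same starting
    # threshold; each acceptance raises the bar for the points behind it,
    # so later queue positions are accepted with decreasing probability.
    accepted = []
    for logl in proposals:
        if logl > threshold:
            accepted.append(logl)
            threshold = logl
    return accepted, threshold

# e.g. a queue of 50 proposed log-likelihoods against a threshold of 0
rng = np.random.default_rng(0)
accepted, final = consume_queue(0.0, rng.normal(size=50))
print(f"{len(accepted)} of 50 accepted")
```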

Hope this helps!

Jammy2211 commented 3 years ago

Follow-up question - my log-likelihood function has a lot of stuff in it (data, functions, etc.). It is probably > 1 GB in memory. Am I right in thinking that Dynesty's use of multiprocessing.Pool will essentially be passing this log-likelihood function (with all the stuff it has in it) to every CPU every time a likelihood evaluation is made?

I'm getting extremely slow performance using multiprocessing, and I think this is the explanation. Just looking for confirmation!

joshspeagle commented 3 years ago

> Am I right in thinking that Dynesty's use of multiprocessing.Pool will essentially be passing this log-likelihood function (with all the stuff it has in it) to every CPU every time a likelihood evaluation is made?

Yes, this is correct. You can get around this by instantiating some of the large objects separately in each member of the pool and then calling them from the likelihood function, but it can get pretty hack-y.
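
As a rough sketch of that pattern (everything here is a toy placeholder for your own data and model): a pool initializer builds the heavy objects once per worker process, and the likelihood reads them from a worker-local global, so they are never pickled per evaluation.

```python
import multiprocessing as mp
import numpy as np
import dynesty

ndim = 5
_data = None  # worker-local heavy state

def init_worker():
    # Runs once per worker process: build the large objects here so they
    # are never pickled and shipped with each likelihood evaluation.
    global _data
    _data = np.ones(ndim)  # stand-in for the ~1 GB of data/model objects

def loglike(theta):
    # Reads the worker-local global instead of closing over the big
    # state, so the function pickled per call stays tiny.
    return -0.5 * np.sum((theta - _data) ** 2)

def prior_transform(u):
    return 10.0 * u - 5.0  # flat priors, just for the toy

if __name__ == '__main__':
    with mp.Pool(processes=30, initializer=init_worker) as pool:
        sampler = dynesty.NestedSampler(
            loglike, prior_transform, ndim,
            nlive=50, sample='rwalk', walks=5,
            pool=pool, queue_size=30)
        sampler.run_nested()
```

On Linux, the fork start method will also let workers inherit anything created at module level before the pool is made, but the initializer route works across platforms.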

Jammy2211 commented 3 years ago

Great, thanks! Sounds like I've got a fun task for tomorrow!