joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

Dynesty using only 100% of CPU across multiple cores when passing a pool #303

wenhaoxuan closed this issue 3 years ago

wenhaoxuan commented 3 years ago

Hi there,

I've been trying to speed up my fits by using the pool and queue_size parameters in the NestedSampler class. When I check my CPU usage, I find that no matter how many processes I give to the pool (e.g. 4 or 8), the total CPU usage is capped at 100%. That is, each process uses ~10-20% of a single core, and the usage from all processes adds up to 100%. I tried many suggestions from StackOverflow, including using the pathos package instead of multiprocessing for the pool, and the same behavior persists. The machine I'm using definitely has enough cores (32), and it isn't occupied by other jobs.

As another test, I ran a fit with 8 processes and found that the total run time is the same as using only 1 process (no multiprocessing at all).

Any thoughts on what may be causing this behavior?

Cheers, Jerry
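
For reference, the setup described in this post might look roughly like the sketch below. It is not the original poster's code: the toy `loglike` and `prior_transform`, the value of `NDIM`, the helper name `run_fit`, and the choice of 8 processes are all assumptions made for illustration. Only `pool` and `queue_size` come from the post itself.

```python
import multiprocessing

NDIM = 3  # hypothetical dimensionality of the toy problem


def loglike(x):
    """Toy log-likelihood: unnormalized standard normal in each dimension."""
    return -0.5 * sum(xi * xi for xi in x)


def prior_transform(u):
    """Map the unit cube to a uniform prior on [-10, 10] per dimension.

    dynesty passes `u` as a NumPy array, so this arithmetic is elementwise.
    """
    return 20.0 * u - 10.0


def run_fit(n_procs=8):
    """Run a nested-sampling fit using a multiprocessing pool (hypothetical helper)."""
    import dynesty  # imported here so the toy functions above need no extra packages

    with multiprocessing.Pool(n_procs) as pool:
        sampler = dynesty.NestedSampler(
            loglike,
            prior_transform,
            NDIM,
            pool=pool,
            queue_size=n_procs,  # how many proposals are handed to the pool at once
        )
        sampler.run_nested()
        return sampler.results


# Call run_fit() from under an `if __name__ == "__main__":` guard so that
# worker processes can import this module without re-running the fit.
```

The likelihood and prior transform are defined at module level because a multiprocessing pool has to pickle them to send work to its workers.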

bjnorfolk commented 3 years ago

@joshspeagle might have a better solution for this, but I've found MPI to be more reliable on a large number of cores. The emcee website has a fairly decent tutorial on implementing it, and the steps are similar for dynesty (https://emcee.readthedocs.io/en/stable/tutorials/parallel/).
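
The emcee tutorial linked above builds its MPI pool with the `schwimmbad` package, and a comparable sketch for dynesty is given below. It assumes `schwimmbad` and `mpi4py` are installed. The toy `loglike` and `prior_transform`, the value of `NDIM`, and the helper name `run_fit_mpi` are hypothetical, and using `pool.size` for `queue_size` is an assumption about a sensible default rather than something stated in this thread.

```python
import sys

NDIM = 3  # hypothetical dimensionality of the toy problem


def loglike(x):
    """Toy log-likelihood: unnormalized standard normal in each dimension."""
    return -0.5 * sum(xi * xi for xi in x)


def prior_transform(u):
    """Map the unit cube (a NumPy array from dynesty) to [-10, 10] per dimension."""
    return 20.0 * u - 10.0


def run_fit_mpi():
    """Run a nested-sampling fit over MPI (hypothetical helper)."""
    import dynesty
    from schwimmbad import MPIPool

    pool = MPIPool()
    if not pool.is_master():
        # Worker ranks wait for tasks from the master rank, then exit.
        pool.wait()
        sys.exit(0)

    sampler = dynesty.NestedSampler(
        loglike,
        prior_transform,
        NDIM,
        pool=pool,
        queue_size=pool.size,  # assumption: one queued proposal per worker rank
    )
    sampler.run_nested()
    pool.close()
    return sampler.results


# Call run_fit_mpi() under an `if __name__ == "__main__":` guard and launch the
# script through an MPI launcher such as mpiexec, with one rank per core.
```

One rank acts as the master and only distributes work, so the number of ranks doing likelihood evaluations is one fewer than the number launched.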

wenhaoxuan commented 3 years ago

@bjnorfolk Thanks for pointing me to the emcee page! I tried using MPI as instructed there, and found it to correctly use all cores and speed up the sampling! It would still be good to know why the Python multiprocessing pool doesn't work in my case, but this is really helpful in the meantime.

joshspeagle commented 3 years ago

Thanks @bjnorfolk for jumping in here! Yes, dynesty uses a really simple parallelization scheme that essentially just calls pool.map, so its performance depends a lot on what the underlying pool is doing. The multiprocessing pool can have weird behaviours like this depending on how things are set up, but I'm definitely not an expert here.
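
To illustrate why a scheme built on `pool.map` depends so much on the pool, here is a standard-library-only sketch. It is not dynesty's actual code. The function `slow_loglike`, its artificial `work` loop, and the helpers `evaluate_batch` and `time_serial_vs_pool` are hypothetical stand-ins. The sketch evaluates the same batch of points serially and through a `multiprocessing.Pool` so the two timings can be compared.

```python
import math
import multiprocessing
import time


def slow_loglike(x, work=20000):
    """Toy log-likelihood preceded by an artificial CPU-bound loop."""
    for i in range(work):
        math.sin(i)  # busy work standing in for a real model evaluation
    return -0.5 * sum(xi * xi for xi in x)


def evaluate_batch(points, map_fn=map):
    """Evaluate the likelihood over a batch with any map-like callable."""
    return list(map_fn(slow_loglike, points))


def time_serial_vs_pool(points, n_procs=4):
    """Return (serial_seconds, pool_seconds) for the same batch of points."""
    t0 = time.perf_counter()
    serial = evaluate_batch(points)
    t1 = time.perf_counter()

    with multiprocessing.Pool(n_procs) as pool:
        t2 = time.perf_counter()
        parallel = evaluate_batch(points, pool.map)
        t3 = time.perf_counter()

    assert serial == parallel  # both paths compute identical values
    return t1 - t0, t3 - t2


# Call time_serial_vs_pool(...) from under an `if __name__ == "__main__":`
# guard, for example with a few hundred short lists of floats as `points`.
```

If the pool timing is no better than the serial one even for an expensive likelihood, the problem most likely lies in how the pool or the environment is set up rather than in the sampler. For a very cheap likelihood, pickling and inter-process communication can outweigh any gain from extra processes.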