Closed: marberi closed this 4 months ago
| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| nautilus/pool.py | 17 | 18 | 94.44% |
| Total: | 25 | 26 | 96.15% |

| Totals | |
|---|---|
| Change from base Build 9114932167: | 0.4% |
| Covered Lines: | 1222 |
| Relevant Lines: | 1249 |
Thanks @marberi! This looks very neat. Let me read through this in detail in the coming days and maybe also add unit tests, if possible.
@marberi I'm very sorry for this taking longer than expected. I made a few changes to the code and also added unit tests specifically for Dask. Please have a look and let me know whether this works for you. I would then merge this and probably release this as version 1.0.4.
Thanks. I'm on leave, so not the most responsive these days. I just downloaded and tested the version after merging, and it started to sample just fine.
Basic support for parallelizing Nautilus on a Dask cluster. Our high-throughput data center is not optimized or well configured for running MPI jobs. We were discussing running cosmological inference using a code which apparently relies on Nautilus. Wanting to see if we could avoid MPI, I tested parallelizing the basic example in the Nautilus documentation.
After creating a Dask cluster:

```python
from dask.distributed import LocalCluster

cluster = LocalCluster()
client = cluster.get_client()
```
the interface:

```python
sampler = Sampler(prior, likelihood, pass_dict=False, pool=client)
```

remains unchanged. While the Dask client supports `map`, the Dask version of the call is asynchronous. To avoid exposing this detail in too many places, the code uses a wrapper that mimics the blocking behaviour of the multiprocessing pools.
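For illustration, a minimal sketch of what such a wrapper could look like (the class name `DaskPoolWrapper` is hypothetical and not necessarily what this PR implements): `client.map` returns futures immediately, and `client.gather` blocks until they resolve, which together reproduce the blocking `map` of a multiprocessing pool.

```python
from dask.distributed import Client


class DaskPoolWrapper:
    """Illustrative wrapper giving a Dask client a blocking,
    multiprocessing.Pool-like map interface."""

    def __init__(self, client: Client):
        self.client = client

    def map(self, func, iterable):
        # client.map is asynchronous and returns a list of futures;
        # client.gather blocks until all results are available.
        futures = self.client.map(func, list(iterable))
        return self.client.gather(futures)
```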
In testing, all the stages could be parallelized over many machines (not using the local cluster above, but through the dask-jobqueue package).
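As a sketch of that setup, assuming a SLURM scheduler (dask-jobqueue also supports PBS, SGE, and others; the resource values below are placeholders, not the configuration used in the tests):

```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Workers are submitted as batch jobs rather than started locally.
# cores, memory, and walltime are illustrative placeholder values.
cluster = SLURMCluster(cores=4, memory="8GB", walltime="01:00:00")
cluster.scale(jobs=10)  # submit 10 worker jobs to the queue
client = Client(cluster)

# The sampler is then constructed exactly as with the local cluster.
sampler = Sampler(prior, likelihood, pass_dict=False, pool=client)
```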