Adding dynesty support (especially to retrievals)

wbalmer commented 9 months ago

Hi Tomas, I just wanted to open this issue to keep track of our discussion about dynesty support. This isn't very urgent at all, but it would go a long way to help me be more productive with the retrievals I plan to do in the near future (for reasons I describe below). I hope to start with adding dynesty support for the retrievals and then we can likely take the lessons learned there and add support for the forward modeling (I think that is less important since the nuances of nested sampling aren't really a potential limiting factor in doing the grid interpolation/model comparison).

Some hand wavy reasons for implementing dynesty:

options to do both static nested sampling (effectively reproduces MultiNest) and dynamic nested sampling (effectively reproduces UltraNest).
options to do importance nested sampling
the two above give more flexibility in allowing the user to choose how they'd like to estimate their posterior and to what fidelity
actively developed (so easier to bugfix and maintain as python versions change)
better documented
better multiprocessing support: will enable multiprocessing on a single machine without using mpi, and will enable using mpi across multiple nodes. for instance, at my facility there is only 48 cores per node, and so with mpi + multinest I can only use that many cores. but with schwimmbad and dynesty, i could allocate cores across multiple nodes that follows a single master process on one node (I think?).

The issue with a clean port, e.g. adding a run_dynesty function that is effectively the same as run_multinest except it passes the loglike and prior functions to the dynesty interface right now is that within species.fit.retrieval.run_multinest the loglike_func and prior_func are local objects, functions defined inside run_multinest and not class attributes or global objects. In this way, when attempting to set a pool with dynesty.pool.Pool(npool, loglike_func, prior_func) as pool:, you get an error like

AttributeError: Can't pickle local object 'AtmosphericRetrieval.run_dynesty.<locals>.loglike_func'

and when trying to initialize dynesty on one core dsampler = dynesty.NestedSampler(loglike_func, prior_func,len(self.parameters)), you get an error like

TypeError: AtmosphericRetrieval.run_dynesty.<locals>.prior_func() missing 2 required positional arguments: 'n_dim' and 'n_param'

I've tried to hack something together to show these errors on a dummy branch of my fork, dynesty_branch_2.

I've pseudo-coded / parallel implemented a bigger overhaul in a separate file species.fit.dretrieval.py on a branch of my fork dynesty-retrieval which is my first attempt to solve this by making loglike_func and prior_func attributes of the AtmosphericRetrieval class so that they can be interpreted by dynesty. This of course involves making the variables defined within run_multinest that loglike and prior rely on into class attributes as well, which became something of a mess - I don't intend that code to be the actual strategy of implementation, more like its a sketch book 😅 .

If there's a different way to do this that I've overlooked, however, that doesn't involve such a big change to the structure of the AtmosphericRetrieval class, I'd prefer that. Hence why I've done most of this in pseudo code and in a separate file to start tinkering with. I was able to run a full retrieval on some test data using multiprocessing locally using dynesty by calling AtmosphericRetrieval with dretrieval.py, but I haven't been able to get a mpi implementation to work.

I'm also trying to think through if there is any memory saving way to set up the retrieval across multiple cores, that doesn't involve having to load the opacity tables for each core..., that has been an issue on my cluster. That is definitely out of the scope of this issue but may be possible in the future with the schwimmbad/dynesty architecture this issue could help set up? I am thinking of the main/worker structure (like the mpi-demo.py in schwimmbad) that loads the AtmosphericRetrieval and RadTrans object once, but is able to distribute the calc_flux to workers in the pool. There is probably a reason I am not thinking of that this is not feasible.

tomasstolker commented 9 months ago

Thanks a lot William! Great that you are trying to implement this!

In FitModel, there was a similar issue from what I remember when adding run_ultranest. I had to make the embedded function for the likelihood an actual method of the class. Within run_multinest and run_ultranest, I then added a local function for the likelihood that calls the class method. This should indeed be possible here as well (I think), although I am not sure how straightforward. I can have a look so the code structure can be used for your implementation.

Would really help indeed if the opacity memory could be shared between cores, but I am not sufficiently experienced with multiprocessing to understand what is the best way to do that... Maybe Paul and Evert have put some thought into that?

tomasstolker commented 9 months ago

I have restructured the AtmosphericRetrieval class in the last commit on the main branch (see b2adbcbd623f3951cdcd06bfe0653492af661134).

In run_retrieval you can run the Dynesty sampler, which can use either prior_transform() and lnlike_func(), or lnprior_dynesty() and lnlike_dynesty(). The latter in case anything Dynesty-specific needs to be implemented with the prior/lnlike otherwise these can be removed.

Would this help with your Dynesty implementation?

I am not sure if lnlike_func/lnlike_dynesty can be pickled or if it needs to be moved outside of the class. In that last scenario it would be a bit more work to implement because the attributes should be converted into parameters. But hopefully it works like this 🤞.

tomasstolker commented 9 months ago

I have set it up a bit differently! There is now the setup_retrieval method, and then separately run_multinest/run_dynesty. The latter will have only MultiNest/Dynesty parameters, so perhaps the kwargs_multinest is not needed. Sorry about that, but we can leave it for now and decide later if this new setup works.

wbalmer commented 9 months ago

Wow! Awesome stuff. I will pull soon and try out later this week ^_^ thanks for entertaining this implementation Tomas!

wbalmer commented 9 months ago

Hi Tomas - I've implemented an initial run_dynesty on my fork's branch add_run_dynesty that is based on the backtracks implementation. That branch is up to date with commit 3d150aa.

I've tested the various permutations (dynamic bool, n_pool None/int, mpi_pool bool, resume bool) of the multiprocessing locally to confirm that they at least begin sampling and save the intermediate products. I have a few lines at the end that take the dynesty output results object and saves a retrieval_post_equal_weights.dat file that is equivalent to the output from pymultinest.

I'm going to try to test this on my cluster next, I couldn't do more than a few processes with mpi or a multiprocess Pool on my local machine due to memory constraints.

Let me know what you think, and if you'd like me to open a PR for that branch.

tomasstolker commented 9 months ago

Awesome! Good to hear that it is working already!

Would be great if you could create a PR for that 👍

tomasstolker commented 8 months ago

Thanks again for implementing the Dynesty support in AtmosphericRetrieval! Since I had been making some updates in FitModel, I have adapted run_dynesty in there, which was quite straightforward (see 1f02b5a).

wbalmer commented 8 months ago

Looks great! I had a commit I forgot to push to do something similar, though I hadn't yet documented it. I'll test on my end with some more AF Lep b fits. I think we can close the issue now :).

tomasstolker / species

Adding dynesty support (especially to retrievals) #84