joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

Returning multiple samples from rwalk proposal distribution #201

Closed rory-smith closed 2 years ago

rory-smith commented 4 years ago

Hi Josh,

Firstly, thank you for this excellent code. My work, and that of my colleagues, is very much the better for using dynesty! As you know, the LIGO/Virgo gravitational-wave community uses dynesty to infer the properties of astrophysical sources of gravitational waves. Within the bilby inference library, we use a fairly aggressive rwalk sampler based on the one in sampling.py. We generally take a very large number of random-walk steps in order to avoid getting trapped in local regions, properly explore boundaries, etc. As a result, we often end up with a large list of candidate samples from which we pick only one.

I believe this wastes a lot of valuable samples, and I suspect we could accelerate convergence if we returned multiple samples from the rwalk on each iteration. Formally, I think this should be equivalent to how dynesty returns several samples in parallel when using a parallel pool. Since we're already using a large MPI pool with dynesty, I'd now like to explore whether we can further reduce the wall-time cost of our inference analyses by increasing the number of samples returned by each parallel process.

On the surface it seems like only a few minor changes would be needed. For example, I ultimately just want each process to return arrays of logl's, u's, and v's. However, I think dynesty currently expects these quantities to be arrays of shape (1,). I'd appreciate any advice on how to cleanly make the kind of modifications I'm after.
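To make the idea concrete, here is a minimal, self-contained sketch of the change being proposed. This is a toy random walk, not dynesty's actual rwalk implementation: the function name `rwalk_multi`, the step scheme, and the toy likelihood are all illustrative assumptions. The only point is that the walker records *every* accepted point above the likelihood threshold instead of returning just the final one.

```python
import numpy as np

def rwalk_multi(u0, loglstar, loglike, nsteps=25, scale=0.1, rng=None):
    """Toy sketch (NOT dynesty's implementation): take `nsteps` Gaussian
    steps inside the unit cube and keep *every* accepted point whose
    log-likelihood exceeds the threshold `loglstar`, instead of only
    returning the last one."""
    rng = np.random.default_rng() if rng is None else rng
    u = np.asarray(u0, dtype=float)
    accepted = []  # list of (u, logl) pairs that cleared the threshold
    for _ in range(nsteps):
        u_prop = u + scale * rng.standard_normal(u.shape)
        if np.any(u_prop < 0.0) or np.any(u_prop > 1.0):
            continue  # reject steps that leave the unit cube
        logl = loglike(u_prop)
        if logl > loglstar:
            u = u_prop  # move the walker and record the sample
            accepted.append((u_prop.copy(), logl))
    return accepted

# Example: a toy Gaussian-like log-likelihood centred on the cube midpoint.
samples = rwalk_multi([0.5, 0.5], loglstar=-1.0,
                      loglike=lambda u: -np.sum((u - 0.5) ** 2))
```

Each element of `samples` is a candidate live point; the question in this issue is how to feed all of them back through dynesty's queue rather than discarding all but one.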

joshspeagle commented 4 years ago

Hm. So internally, the code just grabs points and other associated quantities from the queue (code snippet), whose size is either set by the number of cores or specified manually. The queue gets filled here, and the sampling method for rwalk is here. So I think what you'd have to do is make sure that:

1. the queue size is scaled up by the number of samples returned from each parallel process,
2. the "map" command is re-coded to flatten the outputs down from (N, M, P) to (NM, P), and
3. the relevant method returns M samples instead of just 1.
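Step (2) is just a reshape. As a sketch (the shape names N, M, and P follow the comment above: N queue slots, M samples per process, P parameters per sample; the array here is dummy data, not dynesty's actual queue contents):

```python
import numpy as np

# Hypothetical shapes: N queue slots, M samples returned per process,
# P parameters per sample. The pool's map would hand back an (N, M, P)
# array, while the queue consumer expects one sample per row, (N*M, P).
N, M, P = 4, 3, 2
results = np.arange(N * M * P, dtype=float).reshape(N, M, P)

# Flatten (N, M, P) -> (NM, P): row-major order keeps each process's
# M samples contiguous, so sample j from process i lands at row i*M + j.
flat = results.reshape(N * M, P)
```

The same flattening would need to be applied to the matching arrays of log-likelihoods and any per-sample blobs so they stay aligned row-for-row.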

Let me know if this helps.

avivajpeyi commented 3 years ago

Hi @joshspeagle!

I'm @rory-smith's PhD student, and I tried to follow your comments above to hack together something that returns multiple points from the rwalk.

The changes are here: https://github.com/joshspeagle/dynesty/compare/master...avivajpeyi:refactor_rwalk_to_return_more_points?diff=split

Do you think these changes would be sufficient? I am concerned that:

  1. the blob might need to be adjusted in a better way, and
  2. the queue isn't actually using all the new accepted points that I am passing.

Thank you for your time! :)

joshspeagle commented 2 years ago

Oh man, I completely missed this last year (yikes!). I'm not sure if you ever ended up using your branch version successfully, but I'm just going to close this for now given the recent changes to rwalk (to actually make sampling fully satisfy detailed balance).

avivajpeyi commented 2 years ago

np! I think we got it to work!