joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License
347 stars 76 forks source link

Weight function output #311

Closed segasai closed 3 years ago

segasai commented 3 years ago

While looking at this https://github.com/joshspeagle/dynesty/issues/240 issue I tried a simple code

import test_dyn
import dynesty

sampler = dynesty.DynamicNestedSampler(test_dyn.loglike_egg,
                                       test_dyn.prior_transform_egg,
                                       2,
                                       nlive=500)
sampler.run_nested(maxcall=1000, use_stop=False)
sampler.add_batch(nlive=250, maxcall=1000)  # use_stop=False)

which gave the error Traceback (most recent call last): File "xx.py", line 9, in sampler.add_batch(nlive=250, maxcall=1000) # use_stop=False) File "/home/skoposov/pyenv38/lib/python3.8/site-packages/dynesty/dynamicsampler.py", line 1810, in add_batch for cur_results in self.sample_batch(nlive_new=nlive, File "/home/skoposov/pyenv38/lib/python3.8/site-packages/dynesty/dynamicsampler.py", line 1122, in sample_batch raise RuntimeError( RuntimeError: ('Could not find live points in the required logl interval. Please report!\nDiagnostics. logl_min: 242.29591740055105 ', 'logl_bounds: (242.29591740055105, 242.29591740055105) ', 'saved_loglmax: 242.29591740055105')

Which is the same as this one https://github.com/joshspeagle/dynesty/issues/302

Which led me to think more about this. And I think what's happening here is when we get the results of the run that is 'incomplete' (i.e. stopped due to maxcall/maxiter or whatever) then it's possible that the weight will be concentrated in the top live-point, so the logl_bound will be just the top point (+ pad or so). And this is clearly an issue as it'll be hard to generate proposal points from that. So I'm wondering if we should enforce that the interval covers at least nlive points ? (I.e. ideally I'd get rid of pad parameter and just use nlive to pad the interval if needed.

segasai commented 3 years ago

Also my suggestion will automatically mean that if we just interrupted the run_nested before logz is reached, the add_batch would just resume with the top live points which is probably a good idea

joshspeagle commented 3 years ago

Hm. This seems like a reasonable explanation and suggested fix, although I'd want to see how it modifies the overall behaviour and sampling efficiency in more typical use cases. It probably would be good to keep the padding in for consistency, so that the final size of the interval is essentially max(current padding scheme, nlive points) or something.

segasai commented 3 years ago

I agree it's a significant change, and I think a better fix is something like this https://github.com/joshspeagle/dynesty/blob/18a953e4138c842b3cbee971ccf4f2dc9a81da65/py/dynesty/dynamicsampler.py#L1119

where we select the points to initialize the sampler. We just ensure that we have at least nblive points even if we have to go below the original boundary.

segasai commented 3 years ago

I meant this patch https://github.com/segasai/dynesty/commit/2f07c3f9d65e5bdb32de26a4bb2ca732df10a5b1

joshspeagle commented 3 years ago

Ah yes, that looks like the right approach.