joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License
347 stars 76 forks source link

rstagger currently is identical to rwalk #318

Closed segasai closed 3 years ago

segasai commented 3 years ago

This commit https://github.com/joshspeagle/dynesty/commit/37cc3fc6f13103c0dda57b4419ad0e7d211f4d0e broke rstagger, so it now behaves exactly as rwalk. the variable stagger is not used at all.

This also suggests that rwalk and rstagger should be combined into one function, as the code is mostly identical, only tuning is different.

joshspeagle commented 3 years ago

Good catch and good work on the quick fix.

segasai commented 3 years ago

I've been doing further refactoring of this code https://github.com/segasai/dynesty/commit/901ce5f165ed2281b2654e54162dc36c9d7f84b5 , as it's a bit of a mess, And I'm struggling to see the logic behind this https://github.com/joshspeagle/dynesty/blob/fe9654e9da4718682834cf344ad82128ef60abd0/py/dynesty/sampling.py#L406 I.e. here if the number of proposals is larger than 50*walks, we reset the number of proposals to zero for some reason as well naccept, nreject, so we'll now have to rerun the whole loop (while nc<nwalks) again. This also weirdly interacts with stagger scaling. IMO, I think this kind of scaling need to be apllied only when in not staggered mode. And then nc/naccept/nreject shouldn't be reset

segasai commented 3 years ago

Also, it looks like 'walks' parameter determines not the maximum number of random walk steps, but instead number of likelihood function evaluations, if I read the code correctly, which is probably not right.

Furthermore, it looks like the function is not guaranteed to return a value, i.e. if nc > 50 * walks ? I'm worried that not returning anything may have implications for the accuracy of sampling. I.e. slice sampling strictly preserves I(L>Lmax) distribution if input samples are from it, but if this code is allowed to essentially reject any proposals from around some points, I don't know if it is accurate.

joshspeagle commented 3 years ago

Originally, the idea was that if the random walk proposals were stuck, I substantially shrunk the size of the scale factor and then tried again. At some point that was adjusted from shrinkage by e to shrinkage by e^-1/n, IIRC. The logic behind setting the counters back to zero was so it could essentially restart without shrinking at every iteration, which would violate detailed balance. It was a hack at the time, and I agree the behavior is not great and can be improved.

IMO, I think this kind of scaling need to be applied only when in not staggered mode.

Agreed. The staggered mode should naturally shrink down to the proposal scale in question if things work as intended.

Also, it looks like 'walks' parameter determines not the maximum number of random walk steps, but instead number of likelihood function evaluations, if I read the code correctly, which is probably not right.

You're right that there's a distinction there; I had assumed they were somewhat interchangeable when coding things up.

it looks like the function is not guaranteed to return a value, i.e. if nc > 50 * walks ? ... I'm worried that not returning anything may have implications for the accuracy of sampling

Since both rwalk and rstagger shrink the proposal if points are not accepted, you should always return something if a valid point can be found (in theory). Otherwise, you'll just get stuck sampling forever (which will just cause general failure, rather than any specific statistical issues.)

segasai commented 3 years ago

Originally, the idea was that if the random walk proposals were stuck, I substantially shrunk the size of the scale factor and then tried again. At some point that was adjusted from shrinkage by e to shrinkage by e^-1/n, IIRC. The logic behind setting the counters back to zero was so it could essentially restart without shrinking at every iteration, which would violate detailed balance. It was a hack at the time, and I agree the behavior is not great and can be improved.

I guess the detailed balance of the current scheme is still not clear to me...

Anyway I have the refactored random walk in a separate branch (i didn't want to mix it with #316) https://github.com/segasai/dynesty/tree/random_walk_rewrite It'd be nice to merge #316 first before finalizing the random walk.

joshspeagle commented 3 years ago

I guess the detailed balance of the current scheme is still not clear to me...

It's nothing too complicated. The rstagger proposal is taken out of Sivia & Skilling, but the rwalk one is just "random walk with a fixed scale factor to propose a new position. Did you get stuck because no proposed positions were accepted? Decrease the scale factor and then try again."

segasai commented 3 years ago

Okay, I can see indeed that rwalk without adjustment of scale is correctly preserving the distribution. I think I can't yet see that the scale factor changes are safe, but maybe I'm just slow.

segasai commented 3 years ago

On a somewhat related note, I've been looking a the section 9.5.2 of Sivia & Skilling and the discussion of multimodal posterior of different heights. And how nested sampling correctly deals with it. However I think currently only slice sampling does it correctly, because the proposed 'starting points' for sampling are explicitly chosen from points where L> L ( https://github.com/joshspeagle/dynesty/blob/18a953e4138c842b3cbee971ccf4f2dc9a81da65/py/dynesty/sampler.py#L346 ) , while for say rwalk and rstagger, the starting points are chosen with propose_live which currently doesn not exclude the L point. So that will in turn lead to points being trapped in the lower peak of the likelihood. I think that's a bug

joshspeagle commented 3 years ago

the starting points are chosen with propose_live which currently doesn't not exclude the L* point

I hadn't realized rwalk and rstagger were left out of that commit. Those two should be added in; thanks for catching that.

segasai commented 3 years ago

They were left out because the reason for them to be there is somewhat different though, as slice sampling simply can't work if the starting point is L=L*, while rwalk/rstagger work, but with a bias which i think will be only visible with very large separation modes.

segasai commented 3 years ago

Actually for rwalk, rstagger apparently the effect of this change is minuscule, because live-points will escape the low-height mode anyway (contrary to my earlier thought). I.e. in the case where there are K points in the high mode and N points in the low mode, the difference of that change will just the probability of escaping will change from N/(N+K) to N/(N+K-1) per iteration, which is not much. But if one mode is indeed lower, the particles will escape, just because there is only one way path there (to the higher mode)