segasai closed this issue 3 years ago
Okay, now I think I understand what's going on. The weight function called here https://github.com/joshspeagle/dynesty/blob/1a3cce6f490f275919208e3bdd154139ab20b117/py/dynesty/dynamicsampler.py#L1779 uses the results() data, which is based on saved_logl (i.e. it includes all the batch runs so far). However, sample_batch() relies on base_logl when selecting live points. It is therefore entirely possible for the weight function to return an interval that sits above the highest likelihood of the base run. I think the solution is probably either to use the saved_* quantities in sample_batch() rather than the base run, or to somehow skip the intervals that sit above the base run (which in some sense means that the base run missed the tip of the posterior).
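The mismatch described above can be reproduced with a toy example. This is only a sketch: the numbers and the helper points_in_interval() are made up for illustration and are not part of dynesty; only the names base_logl, saved_logl and logl_min come from the discussion.

```python
# Toy illustration only: the values and the helper below are hypothetical
# and are not taken from dynesty.

def points_in_interval(logl_values, logl_min, logl_max):
    """Return the values that fall inside [logl_min, logl_max]."""
    return [v for v in logl_values if logl_min <= v <= logl_max]


# Log-likelihoods from the base run alone.
base_logl = [-10.0, -5.0, -2.0, -1.0]

# Log-likelihoods from the base run plus a later batch that climbed higher.
saved_logl = sorted(base_logl + [-0.5, -0.2, -0.1])

# Suppose a weight function that looks at saved_logl picks the tip of the
# posterior as the interval to refine.
logl_min, logl_max = -0.6, -0.1

in_saved = points_in_interval(saved_logl, logl_min, logl_max)
in_base = points_in_interval(base_logl, logl_min, logl_max)

print(in_saved)                   # [-0.5, -0.2, -0.1]
print(in_base)                    # []
print(logl_min > max(base_logl))  # True
```

The interval is well populated in the combined samples but empty in the base run, which is the situation in which a search restricted to the base run would find no live points.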
This is a pretty good find -- nice sleuthing. I agree the best solution would probably be to use the saved_* quantities if possible. Did you already start looking into fixes, or should I also take a look?
It'd be better if you look at this, as it'll take me more time to figure things out. (I may look at it in a few days if there is no progress.)
Okay, sounds good. Let me start looking at it tomorrow/Friday. Feel free to ping me anytime if you don't hear any updates before next Monday.
On that topic, I think it would be good to refactor the dynamicsampler with something like this https://github.com/segasai/dynesty/commit/316a2314ccdc957504f74d2f2452bea265cad18c because currently the code is full of paired calls such as
new_u.append(u)
saved_u.append(u)
and it's hard to verify correctness with so much duplication.
Plus I wasn't 100% sure that all the relevant arrays were properly kept in sync.
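One way to remove that kind of duplication is to route every sample through a single helper that appends to all parallel lists at once. The sketch below is hypothetical and does not reproduce the linked commit: the class RunRecord, the function record_sample() and the key names are assumptions chosen for illustration.

```python
class RunRecord:
    """Hypothetical container that keeps parallel per-sample lists in sync."""

    def __init__(self, keys):
        # One list per tracked quantity, e.g. "u", "v", "logl".
        self.data = {key: [] for key in keys}

    def append(self, new_values):
        """Append one sample; every registered key must be supplied."""
        if set(new_values) != set(self.data):
            raise KeyError("sample must provide exactly the registered keys")
        for key, value in new_values.items():
            self.data[key].append(value)


def record_sample(records, sample):
    """Append the same sample to every record (e.g. 'new' and 'saved')."""
    for record in records:
        record.append(sample)


keys = ["u", "v", "logl"]
new_run = RunRecord(keys)
saved_run = RunRecord(keys)

# A single call replaces the repeated new_*.append / saved_*.append pairs.
record_sample(
    [new_run, saved_run],
    {"u": [0.1, 0.2], "v": [1.0, 2.0], "logl": -3.5},
)

print(len(new_run.data["u"]), len(saved_run.data["u"]))
```

Because a sample with a missing or extra key is rejected, the lists cannot silently drift out of sync, which makes correctness easier to check by inspection.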
The original issue is still quite painful. I've tried changing the references from base_run to saved_run in sample_batch(), but the regression tests then fail. I think this is due to the mismatch between the number of live points in the batches and in the base run, which sample_batch() cannot deal with. I don't quite know how to fix that.
I tried to tackle it in 6f3587de0faaae3c506a4a316597e9540dee2b53, but I don't know if that is correct. (Two tests fail; I don't know whether that's just random noise or not.)
This should now be resolved via #248, so closing this for now.
While running the dynesty dynamic sampler thousands of times, I'm occasionally hitting the error "cannot find live points in the required logl interval". To debug it I added an extra print there. See below:
The important point is that logl_min > base_logl.max(). This tells me that sample_batch() here https://github.com/joshspeagle/dynesty/blob/1a3cce6f490f275919208e3bdd154139ab20b117/py/dynesty/dynamicsampler.py#L1780 is called with a lower logl bound that is above the logl of the base run. I can't immediately figure out why this happens (and since the error occurs sporadically on a cluster, I can't really debug it), so I'll put it here for the moment. I wonder if the weight function is incorrect somehow, since it provides the bounds...
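For a failure that only shows up sporadically on a cluster, it can help to make the error message carry the relevant bounds itself. The following is a minimal sketch of such a check, assuming the base-run log-likelihoods are available as a plain sequence; check_batch_bounds() is a hypothetical helper, not a dynesty function.

```python
def check_batch_bounds(base_logl, logl_min, logl_max):
    """Return the number of base-run points inside [logl_min, logl_max].

    Raises a descriptive RuntimeError if the interval contains no points,
    so that the bounds end up in the job log.
    """
    n_inside = sum(1 for v in base_logl if logl_min <= v <= logl_max)
    if n_inside == 0:
        raise RuntimeError(
            "no base-run points in the requested logl interval: "
            f"logl_min={logl_min}, logl_max={logl_max}, "
            f"base_logl.min()={min(base_logl)}, "
            f"base_logl.max()={max(base_logl)}"
        )
    return n_inside


base_logl = [-10.0, -5.0, -2.0, -1.0]

# An interval that overlaps the base run passes the check.
print(check_batch_bounds(base_logl, -6.0, -1.5))  # 2

# An interval above the base run's maximum triggers the descriptive error.
try:
    check_batch_bounds(base_logl, -0.6, -0.1)
except RuntimeError as err:
    print(err)
```

With the bounds in the message, a single failed job is enough to tell whether logl_min sits above base_logl.max() without needing to reproduce the run.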