joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/

add_batch runs too long #325

Closed segasai closed 3 years ago

segasai commented 3 years ago

While running the tests in a loop to check for failures I've noticed one case of test_dyn that essentially never finishes:

env DYNESTY_TEST_RANDOMSEED=35 PYTHONPATH=py:tests:$PYTHONPATH python -c 'import test_dyn as T; T.test_dyn()'

I.e.

8083it [1:23:21, 11.42s/it, batch: 1 | bound: 9313 | nc: 19654 | ncall: 7119634 | eff(%):  0.113 | loglstar: 242.769 < 243.000 < 243.000 | logz: 236.024 +/-  0.110 | stop:  3.261]
8084it [1:24:24, 26.71s/it, batch: 1 | bound: 9441 | nc: 96312 | ncall: 7215946 | eff(%):  0.112 | loglstar: 242.769 < 243.000 < 243.000 | logz: 236.024 +/-  0.110 | stop:  3.261]
....

19502it [2:11:42,  2.47it/s, batch: 10 | bound: 14604 | nc: 25 | ncall: 11249745 | eff(%):  0.173 | loglstar: 237.105 < 242.999 < 242.731 | logz: 236.043 +/-  0.053 | stop:  0.978]

It seems that in this case the batch proceeds and tries to achieve dlogz=1e-4, which basically leads to climbing the likelihood closer and closer to the peak (which has max(logl)=243). I.e. here are the current dlogz and logl:
0.0002811984900574771 242.99972179140906

I don't quite understand why 1) this is taking so long to propose a new point (although that may be a natural result of being right next to a peak of the likelihood), and 2) the dlogz is decreasing so slowly.

We may also need to revise the dlogz limit of 1e-4 for batches. I would like this value to be motivated by something (i.e. logz_err or something like that) rather than being an arbitrary constant.

joshspeagle commented 3 years ago

So the dlogz term is just motivated by the upper bound on the (fractional) remainder term for the evidence integral. What I'm guessing happens is that (1) occurs since we're close to the peak and (2) is because the dlogz is mostly just set by the maximum log-likelihood among the live points and the current logX, which updates slowly here since we're close to the peak. So essentially we get "lucky" and propose a really good point close to the peak with a logX estimate far above the actual value, which means our upper bound is much larger than the real remainder, and as a consequence we get stuck proposing tiny iterations until the remaining estimated logX value becomes small.
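For reference, here is a minimal standalone sketch (illustrative only, not the actual dynesty code; the function name and the numbers are made up) of how that remainder-based bound is typically formed, which shows why it is dominated by the maximum live-point log-likelihood and the current logX:

```python
import numpy as np

def remainder_dlogz(logz, logvol, live_logl):
    """Upper bound on the (fractional) evidence remainder.

    logz      : accumulated log-evidence so far
    logvol    : current estimated log prior volume (logX) of the live region
    live_logl : log-likelihoods of the current live points
    """
    # Bound the remaining integral by (max live likelihood) * (remaining volume).
    logz_remain = np.max(live_logl) + logvol
    # dlogz = log(1 + Z_remain / Z); it only shrinks as logvol keeps contracting.
    return np.logaddexp(logz, logz_remain) - logz

# Near a sharp peak, max(live_logl) barely changes, so dlogz falls only as
# logvol decreases by ~1/nlive per iteration -- hence the slow crawl to 1e-4.
print(remainder_dlogz(logz=236.0, logvol=-8.0, live_logl=np.array([242.9, 242.99, 243.0])))
```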

We can likely fix this by just setting the bound to be 1e-2 (1% remainder) instead of 1e-4 (0.01% remainder). I can see possibly setting a condition on this to be smaller than some fraction of the current logzerr estimate though (i.e. systematics should only contribute to some fraction of the error budget). Is that along the lines of what you had in mind?

segasai commented 3 years ago

So the dlogz term is just motivated by the upper bound on the (fractional) remainder term for the evidence integral. What I'm guessing happens is that (1) occurs since we're close to the peak and (2) is because the dlogz is mostly just set by the maximum log-likelihood among the live points and the current logX, which updates slowly here since we're close to the peak. So essentially we get "lucky" and propose a really good point close to the peak with a logX estimate far above the actual value, which means our upper bound is much larger than the real remainder, and as a consequence we get stuck proposing tiny iterations until the remaining estimated logX value becomes small.

Yes, I agree with that. I'm still a bit surprised by (1); I'll try to check what's going on there. And I agree with (2); the question is whether we can detect the situation somehow.

We can likely fix this by just setting the bound to be 1e-2 (1% remainder) instead of 1e-4 (0.01% remainder). I can see possibly setting a condition on this to be smaller than some fraction of the current logzerr estimate though (i.e. systematics should only contribute to some fraction of the error budget). Is that along the lines of what you had in mind?

Yes, I'm not excited about changing the bound to 1e-2. I think maybe 0.1* logzerr or something like that. In fact why not dlogz of the original run? I was also wondering whether the very narrow range of logl values (or the lack of updates of loglmax) should be a reason to stop.... All of that is connected, I think, to this warning that we've recently added

https://github.com/joshspeagle/dynesty/blob/c43ca8b19933557111bee9b8b338e1416605d246/py/dynesty/dynamicsampler.py#L1317

but given how frequently I see it, I don't think it's actually useful
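To make those suggestions concrete, here is a hypothetical sketch (none of these names exist in dynesty, and the thresholds are placeholders) of a batch stopping rule that ties the dlogz target to logzerr and to the base run, and optionally stops on a collapsed logl range:

```python
import numpy as np

def batch_should_stop(dlogz, logzerr, live_logl, base_dlogz,
                      logzerr_frac=0.1, logl_span_tol=1e-4):
    """Hypothetical stopping rule for an add_batch run."""
    # Stop once the remainder bound drops below the target, taken here as
    # the looser of the base run's dlogz and a fraction of the current
    # statistical error on logz.
    dlogz_target = max(base_dlogz, logzerr_frac * logzerr)
    if dlogz < dlogz_target:
        return True
    # Optionally also stop if the live points sit on an essentially flat
    # likelihood plateau (very narrow logl range near the peak).
    if np.ptp(live_logl) < logl_span_tol:
        return True
    return False
```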

joshspeagle commented 3 years ago

In fact why not dlogz of the original run?

This seems like the best option, tbh.

I think maybe 0.1* logzerr or something like that.

This also seems like a very reasonable alternate stopping criterion that could be added into the general code.

I was also wondering whether the very narrow range of logl values (or the lack of updates of loglmax) should be a reason to stop

This is something I think was implemented in nestle as more of an ad-hoc stopping criterion. Given the note you sent me, however, I think this could be formalized and made more rigorous.

segasai commented 3 years ago

Looking more at the batch code.

Why is the delta log volume defined this way: https://github.com/joshspeagle/dynesty/blob/c43ca8b19933557111bee9b8b338e1416605d246/py/dynesty/dynamicsampler.py#L1265 as opposed to 1/nlive?

Also, I'm somewhat concerned about this, where we https://github.com/joshspeagle/dynesty/blob/c43ca8b19933557111bee9b8b338e1416605d246/py/dynesty/dynamicsampler.py#L1266 overwrite the last log-volume in order for this https://github.com/joshspeagle/dynesty/blob/c43ca8b19933557111bee9b8b338e1416605d246/py/dynesty/sampler.py#L694 code to work. That creates the weird issue that logz/logzvar all correspond to the saved (i.e. base) run while logvol is new.

I think the self.sampler.saved_run object needs to be truncated for the add_batch run, so that it only initially sees the 'bottom' part of the samples? (In fact, longer term, I think the 'hacking of the sampler' for add_batch needs to be changed to an actual initialization...)

joshspeagle commented 3 years ago

Why is the delta log volume defined this way

See equations A28 and A29 from the dynesty paper. It's just the arithmetic (rather than geometric) mean.

I think the 'hacking of the sampler' for add_batch needs to be changed

This would be my dream -- hacking the sampler was never a great approach but it worked well enough at the time. I don't disagree that it leads to a lot of internal issues where a bunch of intermediate results are all more or less garbage and only make sense once things get "re-merged" to make a new run.

segasai commented 3 years ago

Why is the delta log volume defined this way

See equations A28 and A29 from the dynesty paper. It's just the arithmetic (rather than geometric) mean.

(presumably you mean A25/A26) I guess it's not clear to me why for the batch run (on its own; before merging) the volume needs to be that. I thought that when you started the batch run, it's basically just a standard nested run.

I think the 'hacking of the sampler' for add_batch needs to be changed

This would be my dream -- hacking the sampler was never a great approach but it worked well enough at the time. I don't disagree that it leads to a lot of internal issues where a bunch of intermediate results are all more or less garbage and only make sense once things get "re-merged" to make a new run.

Yes, let's not do that now, but maybe after 1.2, together with the new results infrastructure.

joshspeagle commented 3 years ago

I guess it's not clear to me why for the batch run (on its own; before merging) the volume needs to be that. I thought that when you started the batch run, it's basically just a standard nested run.

I actually do that for the standard ones too: https://github.com/joshspeagle/dynesty/blob/c43ca8b19933557111bee9b8b338e1416605d246/py/dynesty/sampler.py#L155

I believe that, with the exception of the variance approximation, which uses the geometric mean, everywhere else in the code I try to use the arithmetic mean. The differences are mostly negligible except when the number of live points is small or changing.
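For concreteness, a quick standalone comparison of the two conventions (illustrative only, not dynesty code): with N live points the per-iteration shrinkage t = X_{i+1}/X_i is Beta(N, 1) distributed, so the arithmetic-mean decrement in logX is -log E[t] = log((N+1)/N), while the geometric-mean one is -E[log t] = 1/N:

```python
import numpy as np

# Per-iteration log-volume decrement with N live points.
for N in (10, 50, 250, 1000):
    arithmetic = np.log((N + 1) / N)  # -log E[t], arithmetic mean of t
    geometric = 1.0 / N               # -E[log t], geometric mean of t
    print(N, arithmetic, geometric, arithmetic - geometric)
# The two differ only at O(1/(2 N^2)), so the choice matters mainly for
# very small N or when N changes between iterations.
```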

segasai commented 3 years ago

I guess it's not clear to me why for the batch run (on its own; before merging) the volume needs to be that. I thought that when you started the batch run, it's basically just a standard nested run.

I actually do that for the standard ones too:

https://github.com/joshspeagle/dynesty/blob/c43ca8b19933557111bee9b8b338e1416605d246/py/dynesty/sampler.py#L155

I believe that, with the exception of the variance approximation, which uses the geometric mean, everywhere else in the code I try to use the arithmetic mean. The differences are mostly negligible except when the number of live points is small or changing.

Ouch. I agree the difference is small, but I don't think it's correct to use the 'arithmetic mean'. Because log(vol) = \sum_j log(X_{j+1}/X_j), we have E[log vol] = \sum_j E[log(X_{j+1}/X_j)] = -\sum_j 1/N_j. I.e. those expectations are exactly what adds up to the log-volume.
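A quick Monte Carlo check of that expectation (purely illustrative): the shrinkage t = X_{j+1}/X_j is the maximum of N uniforms, for which E[log t] = -1/N while log E[t] = -log((N+1)/N):

```python
import numpy as np

rng = np.random.default_rng(0)
N, nsamp = 100, 200_000

# Shrinkage per iteration: maximum of N uniform draws (Beta(N, 1)).
t = rng.uniform(size=(nsamp, N)).max(axis=1)

print(np.mean(np.log(t)), -1.0 / N)              # E[log t]  -> -1/N
print(np.log(np.mean(t)), -np.log((N + 1) / N))  # log E[t]  -> -log((N+1)/N)
```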

joshspeagle commented 3 years ago

I believe it just depends on whether you're using estimators for E[X] or E[logX] when computing various quantities. Most of the results are stored as logX for numerical stability, but in calculations such as https://github.com/joshspeagle/dynesty/blob/c43ca8b19933557111bee9b8b338e1416605d246/py/dynesty/utils.py#L653 or https://github.com/joshspeagle/dynesty/blob/c43ca8b19933557111bee9b8b338e1416605d246/py/dynesty/utils.py#L589 we are internally converting back to X.
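To illustrate the kind of conversion being described (a rough sketch of the standard trapezoid quadrature, not the actual utils.py code), the stored logvol values are exponentiated back to volumes X and the evidence weights are built in X space:

```python
import numpy as np

def importance_logwt(logl, logvol):
    """Sketch: per-sample log-weights from logl and logX via a trapezoid rule in X."""
    # Convert back to volumes, padding with the full prior (X=1) and X=0.
    x = np.concatenate([[1.0], np.exp(logvol), [0.0]])
    # Volume element assigned to each dead point: (X_{i-1} - X_{i+1}) / 2.
    dvol = 0.5 * (x[:-2] - x[2:])
    return logl + np.log(dvol)
```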

segasai commented 3 years ago

I see now. I guess because the integration is done over X rather than over logX, I agree that what's done is correct. I guess dlv is a bit of a misnomer; it should be ldv ))