joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

Jitter speed/refactor #277

Closed segasai closed 3 years ago

segasai commented 3 years ago

This is an in-development branch dealing with the jitter_run() code. It's not yet finished (it's fully working and could be committed, but I plan to work on it further). It increases code coverage and adds comments.

And one thing I was trying to understand is the origin of this:

```python
lzterm = (math.exp(loglstar - logz_new) * loglstar +
          math.exp(loglstar_new - logz_new) * loglstar_new)
h_new = (math.exp(cur_logdvol) * lzterm + math.exp(logz - logz_new) *
         (h + logz) - logz_new)
```

from here https://github.com/segasai/dynesty/blob/e6b2c397859ff0a8aa44ca0586a0c7c4e0ad9f8d/py/dynesty/utils.py#L525

I couldn't find this in your paper, and I was wondering if you can unpack/reason about this. It seems that lzterm is essentially L_i^2/Z_{i+1} + L_{i+1}^2/Z_{i+1}, and I'm not sure I follow where this comes from.

Also, I am not sure I fully understand why the volumes are halved here: https://github.com/segasai/dynesty/blob/e6b2c397859ff0a8aa44ca0586a0c7c4e0ad9f8d/py/dynesty/utils.py#L509 Is it purely because of the 1/2 in Eqn. 16 in your paper?

coveralls commented 3 years ago

Pull Request Test Coverage Report for Build 904122045


| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| py/dynesty/sampler.py | 1 | 82.99% |

Totals Coverage Status
- Change from base Build 893674412: 1.6%
- Covered Lines: 3420
- Relevant Lines: 4662

💛 - Coveralls
joshspeagle commented 3 years ago

I couldn't find this in your paper and I was wondering if you can unpack/reason about this.

Good catch -- I had not included a mention of that in the appendix. These lines are trying to update the estimate of this integral following the strategy outlined in the Skilling (2006) paper.

[image: screenshot of the corresponding equation from the paper]

I am not sure I fully understand why the volumes are halved here

Yes, this is just absorbing the 1/2 factor.
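
To spell out how I read those two lines (my transcription; the indexing convention here is a sketch, with ΔX_{i+1} the shell volume whose log, including the 1/2, is cur_logdvol): the exp(loglstar - logz_new) * loglstar terms are (L/Z) ln L rather than L^2/Z, so the update is the trapezoid-rule increment of the integral of L ln L folded into the running information:

```latex
\mathrm{lzterm} = \frac{L_i \ln L_i + L_{i+1} \ln L_{i+1}}{Z_{i+1}},
\qquad
H_{i+1} = \frac{\Delta X_{i+1}}{2}\,\mathrm{lzterm}
        + \frac{Z_i}{Z_{i+1}}\bigl(H_i + \ln Z_i\bigr) - \ln Z_{i+1} .
```

Unrolling the recursion gives the running estimate H_i = (1/Z_i) * integral(L ln L dX) - ln Z_i, with the integral accumulated by the trapezoid rule.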

segasai commented 3 years ago

Thanks for the pointers. I've fully vectorized the jitter_run calculations and added explanations. In my test with test_dyn.py, jitter_run is ~70 times faster, and test_dyn.py itself now runs in 20 sec as opposed to 1 min 20 sec. This leads to a much faster test suite as well (17 min vs. 31 min before).

I think this is now ready to be merged after review.

One other thing is that the same types of calculations are done not only in jitter_run but also in resample_run and reweight_run. Because of that, I think this code needs to be separated into a function and reused.
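
To illustrate the kind of vectorization (and the kind of shared helper) I have in mind, here is a rough sketch; this is only illustrative, the names and details are made up and it is not the code in the PR:

```python
import numpy as np

def running_logz_h(logl, logvol):
    """Illustrative sketch: running evidence ln Z_i and information H_i
    computed with cumulative numpy operations instead of a Python loop.
    Not the dynesty implementation."""
    logl0 = np.concatenate([[-1e300], logl])     # prepend L_0 = 0
    logvol0 = np.concatenate([[0.0], logvol])    # prepend X_0 = 1

    # log of the trapezoid shell volumes dX_i / 2 (the halving discussed above)
    logdvol = (logvol0[:-1]
               + np.log1p(-np.exp(logvol0[1:] - logvol0[:-1]))
               - np.log(2.0))

    # weights w_i = (L_{i-1} + L_i) * dX_i / 2 and running evidence Z_i
    logwt = np.logaddexp(logl0[:-1], logl0[1:]) + logdvol
    logz = np.logaddexp.accumulate(logwt)

    # running information via the unrolled recursion
    #   H_i = (1/Z_i) * sum_{j<=i} (dX_j/2) * (L_{j-1} ln L_{j-1} + L_j ln L_j) - ln Z_i
    # (terms are scaled by 1/Z_final for numerical stability)
    lterm = (np.exp(logl0[:-1] - logz[-1]) * logl0[:-1] +
             np.exp(logl0[1:] - logz[-1]) * logl0[1:])
    h = np.cumsum(np.exp(logdvol) * lterm) * np.exp(logz[-1] - logz) - logz
    return logz, h
```

The point is simply that the per-iteration updates collapse to np.logaddexp.accumulate / np.cumsum calls, which is where the speed-up comes from.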

segasai commented 3 years ago

While trying to understand the overflow/underflow messages introduced in this patch, I looked closely at the paper and Eqn. A47. The code in utils.py seems to have been written (and saved_h is defined) under the assumption that

H_i = (1/Z_i) * integral(L ln L, X = 0..X_i) - ln(Z_i),

while, if I am not mistaken, the correct interpretation of A47 is that

H_i = (1/Z) * integral(L ln L, X = 0..X_i) - ln(Z) * Z_i/Z,

i.e. what you get if you actually take the integrand from A42 and integrate it over the part of the volume where X < X_i. I am under the impression that the existing code there (and the saved_h) was incorrect.

But at the same time, I don't fully follow the justification for A45/A46/A47 either.
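
For concreteness, the difference between the two readings (my algebra, writing I_i for the partial integral of L ln L):

```latex
\left(\frac{I_i}{Z_i} - \ln Z_i\right) - \left(\frac{I_i}{Z} - \frac{Z_i}{Z}\ln Z\right)
  = I_i\left(\frac{1}{Z_i} - \frac{1}{Z}\right) - \ln Z_i + \frac{Z_i}{Z}\ln Z ,
```

which vanishes once Z_i reaches Z at the end of the run. So, if I read it right, the final H comes out the same under either convention; it is only the intermediate saved_h values that differ.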

joshspeagle commented 3 years ago

Looking back, I'm not entirely happy with how I outlined those Appendix sections since I agree that they're not as clear as I'd like and have a few typos (in intermediate expressions; not in final results).

So the argument for A45 (the static nested sampling case) follows from the original (brief) argument Skilling makes in his 2006 paper. There's a small typo (ln N --> N), but assuming the argument from A44 holds (mean(N) ~ HK), then you get that var(lnZ) ~ H/K. (You then get an additional factor of 2 from arguments in the other sections.)

A46 tries to break this estimate down into the sum of a bunch of pieces var(\sum_i \Delta lnZ_i), which is purely a definitional thing. It then applies the assumption that the distribution of each of the pieces is approximately iid, so you can break up the variance of the sum into the sum of the variances.

A47 also has a typo (there shouldn't be a second \Delta). That's just saying that var(ln Z_i) = var(ln Z_{i-1}) [the total sum of all the previous terms] + var(\Delta ln Z_i) [the most recent contribution]. The assumption is that the number of live points is changing slowly enough that \Delta ln X_i ~ \Delta ln X_{i-1}. This is almost always true: we typically only decrease by 1 at a time, and if we increase by >> 1 (from a batch) then this is just an overestimate (which is fine).
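
So, written out the way I read them (with the typos removed), A46 and A47 amount to:

```latex
\mathrm{Var}[\ln Z] = \mathrm{Var}\Bigl[\sum_i \Delta \ln Z_i\Bigr]
                    \approx \sum_i \mathrm{Var}[\Delta \ln Z_i],
\qquad
\mathrm{Var}[\ln Z_i] \approx \mathrm{Var}[\ln Z_{i-1}] + \mathrm{Var}[\Delta \ln Z_i].
```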

This then gives the modified estimate of A48.

segasai commented 3 years ago

Thanks. I'm starting to unwind all this, step by step. I see that for a Gaussian in n-dimensional space, with a prior that is a sphere of radius R_max, the expected number of steps is nK (ln R_max - ln r_min), where R_max is the edge of the prior sphere and r_min is the smallest radius, corresponding to L_max. Presumably n ln(R_max) is somehow H (and r_min can be ignored). I need to do some algebra...
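
Spelling out the geometry for myself (just a sketch): for a uniform prior over the ball of radius R_max, the prior volume enclosed by the likelihood contour at radius r is

```latex
X(r) = \left(\frac{r}{R_{\max}}\right)^{n}
\quad\Longrightarrow\quad
\ln X(r) = n\,\bigl(\ln r - \ln R_{\max}\bigr),
```

and since each iteration shrinks ln X by ~ 1/K on average, reaching radius r takes roughly nK (ln R_max - ln r) iterations, which is where the nK (ln R_max - ln r_min) count comes from.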

But going a bit further, I understand that there is some variability in the number of steps. I don't immediately see how the variability of N translates into variability of logZ. (Skilling doesn't really explain it.)

And going still further, I completely don't understand what you mean when you talk about Var[Z_i], because any new realization of the run will have a different number of iterations, and the i-th point will correspond to different X values. So I just don't understand this from a statistical point of view.

joshspeagle commented 3 years ago

I don't immediately see how the variability of N translates into variability of logZ. (Skilling doesn't really explain it)

Let me see if I can outline a slightly more intuitive argument. At every iteration, you're getting some uncertainty because you have some probabilistic estimate of the prior volume, not the true prior volume. This translates to some uncertainty on the weights that accrues over time (since Z is just the sum of the weights). So in theory you can just add all these up to get an error estimate.

However, for a given stopping criterion you'll have some variation in how many samples actually are proposed, i.e. in N. So the expected variation in N is an independent way of assessing how the probabilistic prior volume estimates lead to uncertainties in the final logZ estimate (on the order of ~ sqrt(H/K)). Does that make more or less sense?
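
Slightly more concretely (this is only my paraphrase of Skilling's argument, not a rigorous derivation): each compression factor t satisfies -ln t ~ Exponential(K), so after N steps

```latex
\mathrm{Var}[\ln X_N] = \frac{N}{K^2}
\;\approx\; \frac{HK}{K^2} \;=\; \frac{H}{K},
```

using N ~ HK from A44. Since the bulk of the posterior weight sits near ln X ≈ -H, that jitter in where the bulk lands in ln X carries over to ln Z, giving var(lnZ) ~ H/K.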

And still going further I completely don't understand when you are talking about Var[Z_i] because any new realization of the run will have a different number of iterations and i-th point will correspond to different X values.

The true value of the Z_i's will be different, but their expectation (and by extension their variance and the variance of associated quantities such as the X's) can still be studied in a statistical manner.

segasai commented 3 years ago

Okay, I still don't understand the mathematical/statistical reasoning. I understand the hand-wavy argument, but I'm not quite satisfied with that. Anyway, the patch still stands and is ready for review. And I think the way the H_i are now calculated makes more sense, as they truly are increments of the H integral.

joshspeagle commented 3 years ago

Sounds good. I'll try and review this and get it merged in over the next few days then.

joshspeagle commented 3 years ago

Okay, I think I managed to look everything over and didn't see any obvious errors, but there were a lot of changes. I think the tests are robust enough now to catch any obvious errors, so I'll plan to merge this in. Thanks again for all your work on this.

segasai commented 3 years ago

Thanks for the review/merge. One thing that comes out of this commit is that, since the regression suite is now quicker, one can think about adding more to it, like checking the posterior, checking the calculation of H, or some more challenging problems. Also, I think the updated calculations in resample_run() need to be propagated through other parts of utils.py and potentially moved into a separate function.