joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License
347 stars 76 forks source link

New results object #330

Closed segasai closed 3 years ago

segasai commented 3 years ago

Here's a working preliminary implementation of the Results() that is more than just a dictionary. It still accepts the list of key values or a dictionary, but prevents manual updates, and adds some additional methods

coveralls commented 3 years ago

Pull Request Test Coverage Report for Build 1237073539


Changes Missing Coverage Covered Lines Changed/Added Lines %
py/dynesty/results.py 45 52 86.54%
py/dynesty/utils.py 21 28 75.0%
<!-- Total: 66 80 82.5% -->
Totals Coverage Status
Change from base Build 1232247582: 0.3%
Covered Lines: 3425
Relevant Lines: 4173

💛 - Coveralls
joshspeagle commented 3 years ago

I see there are some unresolved comments related to the results structure that is being saved. I've added some additional info below.

These are primarily useful for "winding back" sampling if we want to see what the live points are doing at some intermediate time during the run, such as in boundplot.

This is mainly useful for dynamic nested sampling.

These are just summary aliases, IIRC.

These are useful when using plotting functions like cornerbound, since they allow us to track particular bounds. I also left them in in case I wanted to tackle adding something like Importance Nested Sampling into the code, which would require knowing which sample came from which bound.

These are mostly additional quantities I added to make sure I could associate points with batches and track that the batches were being added correctly.

Let me know if I missed anything.

segasai commented 3 years ago

I've fixed some XXs, but some questions still remain remain regarding batch quantities and lengths (i.e. whether something has a lenght niter or not). It'd be good for you to edit this directly. Also I don't quite understand what's the difference between batch_nlive vs samples_n and samples_bound vs batch_bound. It'd be good to clarify that. Maybe even rename some of them for consistency.

Also _merge_two() function in the utils.py mentions prop_iter, samples_prop and prop properties, but those do not seem to be used anywhere. I assume those are dead ?

The reason why I think this needs cleaning up, is because resutl is an externally returned object, as opposed to RunRecord, so it would be good to ensure, it is fully understood/described what each of those are. Plus one can establish some kind of tests that verify that arrays have correct lengths etc. ATM I'm not sure of that. Also I was potentially thinking of storing the information about proposed point for rwalk/rslice, so one could compute the degree of correlatedness between samples, but I wouldn't want to do that before the results object is fully described.

joshspeagle commented 3 years ago

I've fixed some XXs, but some questions still remain remain regarding batch quantities and lengths (i.e. whether something has a lenght niter or not). It'd be good for you to edit this directly.

That's easy enough for me to do post-merge. No problem.

Also I don't quite understand what's the difference between batch_nlive vs samples_n and samples_bound vs batch_bound. It'd be good to clarify that. Maybe even rename some of them for consistency.

samples_n tracks the number of effective live points at every iteration, e.g., [180, 179, 178, 177, 190, 190, 190, 195, ...]. batch_nlive just checks how many live points we added in from a particular batch, e.g., 250. samples_bound is the associated bounding distribution, whereas batch_bound is the logl_bound tuple. I agree that renaming both quantities (esp. the second) would be helpful, especially since I don't think they are currently used elsewhere in the code.

Also _merge_two() function in the utils.py mentions prop_iter, samples_prop and prop properties, but those do not seem to be used anywhere. I assume those are dead ?

These were renamed to bound. I can try to update this as well when I go through the internal docstrings.

Also I was potentially thinking of storing the information about proposed point for rwalk/rslice, so one could compute the degree of correlatedness between samples, but I wouldn't want to do that before the results object is fully described.

This would be great and get to addressing (in part) #163, since then we would have the possible option of using that info for tuning the number of walks/slices on the fly.

joshspeagle commented 3 years ago

There's a small conflict with the main branch I need to double-check on, but I will try to merge this in tonight and then get around to cleaning everything up in preparation for the v1.2 release.

segasai commented 3 years ago

Thanks. The conflict can be resolved completely in favor of the results_branch.

segasai commented 3 years ago

I think this probably closes the #175 and #186 too

joshspeagle commented 3 years ago

Agreed. I'll tentatively say this: