New results object - Githubissues

segasai commented 3 years ago

Here's a working preliminary implementation of the Results() that is more than just a dictionary. It still accepts the list of key values or a dictionary, but prevents manual updates, and adds some additional methods

coveralls commented 3 years ago

Pull Request Test Coverage Report for Build 1237073539

66 of 80 (82.5%) changed or added relevant lines in 2 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.3%) to 82.075%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
py/dynesty/results.py	45	52	86.54%
py/dynesty/utils.py	21	28	75.0%
<!--	Total:	66	80	82.5%	-->

Totals
Change from base Build 1232247582:	0.3%
Covered Lines:	3425
Relevant Lines:	4173

💛 - Coveralls

joshspeagle commented 3 years ago

I see there are some unresolved comments related to the results structure that is being saved. I've added some additional info below.

samples_it: the sampling iteration when the sample was proposed (e.g., iteration 570)
samples_id: the unique ID of the sample (e.g., ID=2)

These are primarily useful for "winding back" sampling if we want to see what the live points are doing at some intermediate time during the run, such as in boundplot.

samples_n: the number of live points at the point when the sample was proposed (e.g. K=1957)

This is mainly useful for dynamic nested sampling.

samples: the location (in original coordinates). Should be identical to samples_v I think.
eff: the overall sampling efficiency (niter/ncall, more or less)

These are just summary aliases, IIRC.

bound: the set of bounding objects used to condition proposals
bound_iter: the iteration when the corresponding bound was created to propose new live points (e.g., iteration 520)
samples_bound: the index of the bound that the corresponding sample was drawn from (e.g., bound 1)

These are useful when using plotting functions like cornerbound, since they allow us to track particular bounds. I also left them in in case I wanted to tackle adding something like Importance Nested Sampling into the code, which would require knowing which sample came from which bound.

samples_batch: tracks the batch during which the samples were proposed
batch_bounds: the log-likelihood bounds used to sample points in a given batch
batch_nlive: the number of live points added in a given batch

These are mostly additional quantities I added to make sure I could associate points with batches and track that the batches were being added correctly.

Let me know if I missed anything.

segasai commented 3 years ago

I've fixed some XXs, but some questions still remain remain regarding batch quantities and lengths (i.e. whether something has a lenght niter or not). It'd be good for you to edit this directly. Also I don't quite understand what's the difference between batch_nlive vs samples_n and samples_bound vs batch_bound. It'd be good to clarify that. Maybe even rename some of them for consistency.

Also _merge_two() function in the utils.py mentions prop_iter, samples_prop and prop properties, but those do not seem to be used anywhere. I assume those are dead ?

The reason why I think this needs cleaning up, is because resutl is an externally returned object, as opposed to RunRecord, so it would be good to ensure, it is fully understood/described what each of those are. Plus one can establish some kind of tests that verify that arrays have correct lengths etc. ATM I'm not sure of that. Also I was potentially thinking of storing the information about proposed point for rwalk/rslice, so one could compute the degree of correlatedness between samples, but I wouldn't want to do that before the results object is fully described.

joshspeagle commented 3 years ago

I've fixed some XXs, but some questions still remain remain regarding batch quantities and lengths (i.e. whether something has a lenght niter or not). It'd be good for you to edit this directly.

That's easy enough for me to do post-merge. No problem.

Also I don't quite understand what's the difference between batch_nlive vs samples_n and samples_bound vs batch_bound. It'd be good to clarify that. Maybe even rename some of them for consistency.

samples_n tracks the number of effective live points at every iteration, e.g., [180, 179, 178, 177, 190, 190, 190, 195, ...]. batch_nlive just checks how many live points we added in from a particular batch, e.g., 250. samples_bound is the associated bounding distribution, whereas batch_bound is the logl_bound tuple. I agree that renaming both quantities (esp. the second) would be helpful, especially since I don't think they are currently used elsewhere in the code.

Also _merge_two() function in the utils.py mentions prop_iter, samples_prop and prop properties, but those do not seem to be used anywhere. I assume those are dead ?

These were renamed to bound. I can try to update this as well when I go through the internal docstrings.

Also I was potentially thinking of storing the information about proposed point for rwalk/rslice, so one could compute the degree of correlatedness between samples, but I wouldn't want to do that before the results object is fully described.

This would be great and get to addressing (in part) #163, since then we would have the possible option of using that info for tuning the number of walks/slices on the fly.

joshspeagle commented 3 years ago

There's a small conflict with the main branch I need to double-check on, but I will try to merge this in tonight and then get around to cleaning everything up in preparation for the v1.2 release.

segasai commented 3 years ago

Thanks. The conflict can be resolved completely in favor of the results_branch.

segasai commented 3 years ago

I think this probably closes the #175 and #186 too

joshspeagle commented 3 years ago

Agreed. I'll tentatively say this:

Resolves #175.
Resolves #184.
Resolves #186.

joshspeagle / dynesty

New results object #330

Pull Request Test Coverage Report for Build 1237073539

💛 - Coveralls