joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

Discarding unwanted samples during a fit #190

Closed Jammy2211 closed 4 years ago

Jammy2211 commented 4 years ago

For our model-fit problem, we often want to discard a sample if it does not meet some criteria. With other samplers, I would perform this discard by returning a likelihood of -np.inf, or explicitly drawing a new point.

For Dynesty, these solutions didn't work. The workaround I'm using (which seems to work OK) is to return a very negative likelihood (e.g. -1e99) and then gradually increase this number for every subsequent discarded point (e.g. -1e99 + 100, -1e99 + 200, etc.). This probably isn't the best thing for Dynesty to be exposed to, but the algorithm eventually finds a set of samples within the constraints. However, it does mean the Dynesty sampler holds a lot of essentially useless samples, which I would guess slow down the analysis and make the sampler large when output as a pickle.
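A minimal sketch of the workaround described above, for illustration only. The names make_penalised_loglike, loglike and is_valid are hypothetical and do not come from Dynesty. One assumption differs from the numbers quoted above: in double precision, -1e99 + 100 evaluates to exactly -1e99, so the sketch uses a much larger step (1e85) to make successive penalties actually distinct.

```python
def make_penalised_loglike(loglike, is_valid, base=-1e99, step=1e85):
    """Wrap a log-likelihood so rejected points get an increasing penalty.

    loglike and is_valid are user-supplied callables (hypothetical names).
    step is an assumption: it must exceed the float64 spacing near `base`,
    otherwise base + step rounds back to base.
    """
    n_discarded = 0  # how many rejected points have been seen so far

    def wrapped(theta):
        nonlocal n_discarded
        if is_valid(theta):
            return loglike(theta)
        # Each rejected point gets a slightly less negative value than the last.
        n_discarded += 1
        return base + step * n_discarded

    return wrapped
```

The wrapped function can then be passed wherever the original log-likelihood would have been used. The rejected points are still seen and stored by the sampler, which is the drawback raised above.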

Is there a 'hack' or 'quick fix' I could use to more elegantly resample these unwanted points, so that they don't end up as samples seen by Dynesty? Or is there a way I can give Dynesty the initial set of live points before it begins sampling, so that I can generate them following my rejection criteria beforehand (I have an algorithm to do this already)?

joshspeagle commented 4 years ago

Or is there a way I can give Dynesty the initial set of live points before it begins sampling, so that I can generate them following my rejection criteria beforehand (I have an algorithm to do this already)?

There actually is a solution for the second point! There is a live_points argument to which you can pass a set of points you've already generated to start off the run. The docstring for this is copied below.

live_points: list of 3 ndarray each with shape (nlive, ndim)
    A set of live points used to initialize the nested sampling run. Contains live_u, the coordinates on the unit cube, live_v, the transformed variables, and live_logl, the associated loglikelihoods. By default, if these are not provided the initial set of live points will be drawn uniformly from the unit npdim-cube. WARNING: It is crucial that the initial set of live points have been sampled from the prior. Failure to provide a set of valid live points will result in incorrect results.

This can also be done for the DynamicNestedSampler, although there it is initialized (as with other options) at runtime. See the run_nested function under the DynamicSampler class for details.
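As a rough sketch of how the live_points argument might be used with the static NestedSampler: the toy prior, likelihood and rejection criterion below (prior_transform, loglike, is_valid) are hypothetical placeholders, and draw_live_points is an illustrative helper, not part of Dynesty. Points are drawn uniformly from the unit cube and kept only if they pass the criterion, which covers the initialization step only.

```python
import numpy as np


def prior_transform(u):
    """Hypothetical prior: uniform on [-10, 10] in each dimension."""
    return 20.0 * u - 10.0


def loglike(v):
    """Hypothetical toy Gaussian log-likelihood."""
    return -0.5 * np.sum(v ** 2)


def is_valid(v):
    """Hypothetical rejection criterion: keep points with v[0] > 0."""
    return v[0] > 0.0


def draw_live_points(nlive, ndim, rng):
    """Rejection-sample nlive points from the prior that satisfy is_valid."""
    live_u, live_v, live_logl = [], [], []
    while len(live_u) < nlive:
        u = rng.uniform(size=ndim)   # draw on the unit cube
        v = prior_transform(u)       # map to the model parameters
        if not is_valid(v):
            continue                 # discard and draw again
        live_u.append(u)
        live_v.append(v)
        live_logl.append(loglike(v))
    # Shapes: (nlive, ndim), (nlive, ndim) and (nlive,)
    return [np.array(live_u), np.array(live_v), np.array(live_logl)]


def run_with_live_points(nlive=100, ndim=2, seed=0):
    """Start a static nested sampling run from pre-generated live points."""
    import dynesty  # imported here so the helpers above work without it

    rng = np.random.default_rng(seed)
    live_points = draw_live_points(nlive, ndim, rng)
    sampler = dynesty.NestedSampler(loglike, prior_transform, ndim,
                                    nlive=nlive, live_points=live_points)
    sampler.run_nested()
    return sampler.results
```

Calling run_with_live_points() requires Dynesty to be installed; draw_live_points can be checked on its own with only NumPy.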

Does this help?

Jammy2211 commented 4 years ago

Yes, thank you! :)

Jammy2211 commented 4 years ago

(I actually found this docstring and implemented the solution today).

I'll quickly point out that this part of the docstring:

live_points: list of 3 ndarray each with shape (nlive, ndim)

is incorrect, as the third array (the log-likelihoods) has shape (nlive,), not (nlive, ndim).

joshspeagle commented 4 years ago

Oh whoops. Good catch on the shape -- I'll try to fix the typo if I remember, but it's a small enough error that I don't think it's too big a deal.