joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

Bounding issues/diagnostics #237

Open segasai opened 3 years ago

segasai commented 3 years ago

Hi,

I am looking at the results of sampling my complicated problem (with results different from what MultiNest gives me), and while diagnosing what's going on I ran a few tests that surprised me (below). First, my problem: dimensionality 11, 2 periodic parameters, 2500 live points; I run with sampler 'unif' and dlogz_init=0.01.

My thoughts on all this:

joshspeagle commented 3 years ago

These are an instructive set of plots and a useful set of tests. Quick comments (in order of points):

segasai commented 3 years ago
joshspeagle commented 3 years ago

I'd say we need a function belonging to Results or to Sampler. Something like validate() with a few options.

Would you envision something like that being run at various checkpoints during the sampling or just at the end? I'm trying to think of exactly how the results could be communicated to the user in an informative way, especially since the feedback I've gotten is the current set of outputs for the progressbar (and the save quantities in the output results) is already pretty large. If the results could inform the bounding distributions too, that would be a dream end goal.

On a sidenote: there is a bootstrap procedure enabled to try and enlarge the ellipsoids, but it's disabled by default because I found it to be unstable in many cases. I could easily see validate() being used to estimate enlargement factors in a very similar way throughout the course of the run, as an example of how that could interface with improvements to the construction of bounding distributions.
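The bootstrap idea mentioned here can be sketched in plain NumPy. This is an illustrative leave-out bootstrap for estimating an ellipsoid enlargement factor, not dynesty's actual implementation; the function name and details are hypothetical:

```python
import numpy as np

def bootstrap_expand(points, nboot=10, seed=0):
    """Leave-out bootstrap estimate of an ellipsoid expansion factor:
    fit a covariance ellipsoid to each bootstrap resample and check how
    far outside it the points left out of that resample fall."""
    rng = np.random.default_rng(seed)
    npts, ndim = points.shape
    expand = 1.0
    for _ in range(nboot):
        idx = rng.integers(npts, size=npts)            # resample with replacement
        left_out = np.setdiff1d(np.arange(npts), idx)  # points not in the resample
        if left_out.size == 0:
            continue
        sub = points[idx]
        cent = sub.mean(axis=0)
        icov = np.linalg.inv(np.cov(sub.T))
        # squared Mahalanobis distances; the fitted ellipsoid's surface is
        # taken to pass through the farthest in-sample point
        d2_in = np.einsum('ij,jk,ik->i', sub - cent, icov, sub - cent)
        d2_out = np.einsum('ij,jk,ik->i', points[left_out] - cent, icov,
                           points[left_out] - cent)
        expand = max(expand, d2_out.max() / d2_in.max())
    return expand  # factor by which to inflate the squared semi-axes

pts = np.random.default_rng(42).normal(size=(100, 3))
factor = bootstrap_expand(pts)
```

Running `validate()` on checkpoints could use the same machinery: re-estimate the factor from the current live points and feed it back into the bound construction.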

dynamic nested sampling's advantage is presented as the ability to improve the posterior by adding more points to it, but it only really works if the posterior is essentially already well traced

Yes, exactly. That's at least how most standard implementations work (including the one here), anyways. I'm happy to add larger disclaimers throughout the docs to try and make this point more explicit. Would that help?

I am actually more concerned by the large number of ellipsoids... it's when the number of ellipsoids is large that the high-logl points are missed

Ah yes, gotcha. Maybe this just means the defaults should be even more conservative than what they already are to try and further mitigate this problem.

segasai commented 3 years ago

Would you envision something like that being run at various checkpoints during the sampling or just at the end? I'm trying to think of exactly how the results could be communicated to the user in an informative way, especially since the feedback I've gotten is the current set of outputs for the progressbar (and the save quantities in the output results) is already pretty large. If the results could inform the bounding distributions too, that would be a dream end goal.

I think I would do it at the end. Sure, it's nice to report things during sampling, but I don't think it's necessary. (TBH I don't think I know what every number in dynesty's status bar is; the only numbers I check there are time, dlogz, and logl.) IMO the goal is just to warn if something looks clearly bad.

On a sidenote: there is a bootstrap procedure enabled to try and enlarge the ellipsoids, but it's disabled by default because I found it to be unstable in many cases. I could easily see validate() being used to estimate enlargement factors in a very similar way throughout the course of the run, as an example of how that could interface with improvements to the construction of bounding distributions.

Yes, I saw those parameters, but never experimented with them. TBH I think the right way to deal with that is to come up with a few dedicated tests. AFAIU the ellipsoid decomposition is completely decoupled from the rest of the code, so I think testing it on 1) a ball in n-D, 2) a highly elongated 'sausage', and 3) a curved manifold (i.e. an n-D sphere or a curved line) would be a good start.
From what I understand, the key factors in bounding are what fraction of the actual distribution is covered (ideally all of it), what fraction of the volume is empty (efficiency), and how those numbers behave as a function of the number of points/dimensionality. I think it's not too hard to write those tests, and at least then there is a set of plots to see if things are better or worse.

segasai commented 3 years ago

Okay -- here's the test.

I've sampled points from 4 different shapes in 3-D/10-D: a ball, a pin (long cylinder with a 1/0.01 axis ratio), a shell with 1% thickness, and a 1-D torus (S^1) with 1% thickness, and checked the fraction of the volume missed by MultiEllipsoid or Ellipsoid as a function of nlive. I.e. I simulate points from the distribution, use nlive of them for MultiEllipsoid/Ellipsoid, and see what fraction of the volume is left out. If everything is correct, it should be zero, or at least go to zero as nlive -> inf. The reality is pretty grim; see below.

(plots: multi_10d, multi_3d, ell_10d, ell_3d)

I include the code as well

import dynesty.bounding as db
import numpy as np
import scipy.special
import matplotlib.pyplot as plt

def genball(npt, ndim):
    # use Barthe2005
    x = np.random.normal(size=(npt, ndim))
    y = np.random.exponential(0.5, size=npt)
    x1 = x / np.sqrt((y + (x**2).sum(axis=1)))[:, None]
    return x1

def genshell(r1, r2, npt, ndim):
    x = np.random.normal(size=(npt, ndim))
    xnorm = x / ((x**2).sum(axis=1)**.5)[:, None]
    # normed vector
    # radii are distributed like R^(ndim-1)
    # cumul (R^ndim-r1^ndim)/(r2^ndim-r1^ndim)=y
    rs = ((r2**ndim - r1**ndim) * np.random.uniform(size=npt) +
          r1**ndim)**(1. / ndim)
    return rs[:, None] * xnorm

def gen_data(npt, typ, ndim):
    mid = .5  # i'm placing in unit cube
    if typ == 'ball':
        r0 = 0.5
        pts = genball(npt, ndim) * r0 + mid
        volume = (np.pi**(ndim / 2) / scipy.special.gamma(ndim / 2 + 1) *
                  r0**ndim)
    elif typ == 'pin':
        w = 0.01
        a = 1
        pts = np.zeros((npt, ndim))
        pts[:, 1:] = genball(npt, ndim - 1) * w + mid
        pts[:, 0] = (np.random.uniform(size=npt) - .5) * a + mid
        volume = (np.pi**((ndim - 1) / 2) /
                  scipy.special.gamma((ndim - 1) / 2 + 1) * w**(ndim - 1) * a)
    elif typ == 'torus':
        w = 0.01
        r0 = 0.45
        pts = np.zeros((npt, ndim))
        pts[:, :2] = genshell(r0 - w / 2, r0 + w / 2, npt, 2) + mid
        pts[:, 2:] = (np.random.uniform(size=(npt, ndim - 2)) * 2 -
                      1) * w / 2 + mid
        volume = w**(ndim - 2) * np.pi * ((r0 + w / 2)**2 - (r0 - w / 2)**2)
    elif typ == 'shell':
        r1 = 0.45
        r2 = 0.46
        pts = genshell(r1, r2, npt, ndim) + mid
        volume = (np.pi**(ndim / 2) / scipy.special.gamma(ndim / 2 + 1) *
                  (r2**ndim - r1**ndim))
    else:
        raise RuntimeError('unknown', typ)
    return pts, volume

def plotter(ndim, bound):
    ids = (10**np.linspace(np.log10(2 * ndim + 1), 3, 40)).astype(int)
    plt.clf()
    objs = ['ball', 'pin', 'shell', 'torus']
    for i in range(4):
        plt.subplot(2, 2, 1 + i)
        curo = objs[i]
        fracs = np.array([1 - doit(_, curo, ndim, bound)[1] for _ in ids])
        plt.semilogx(ids, fracs)
        plt.xlabel('Nlive')
        if i == 0:
            postf = '%d-D bound %s' % (ndim, bound)
        else:
            postf = ''
        plt.title(curo + ' ' + postf)
        plt.ylabel('missing fraction')
        plt.ylim(0, 1)
        plt.xlim(10, 1000)
    plt.tight_layout()

def doit(nlive, typ, ndim, bound='ell'):
    # return volume fraction, point fraction
    # for nlive points with topology typ in ndim dimension
    #
    totpt = 10 * nlive  # simulate more points
    pts, volume = gen_data(totpt, typ, ndim)
    assert ((pts.min() > 0) and (pts.max() < 1))  # inside cube
    fitpts = pts[:nlive]
    testpts = pts[nlive:]
    logvol_ell, fracin = computer(fitpts, testpts, bound=bound)
    return np.exp(logvol_ell) / volume, fracin

def computer(fitpts, testpts, bound='multi'):
    """ Compute logvolume and fraction of points covered
    given actual live points (fitpts) and test points (testpts)"""
    if bound not in ['ell', 'multi']:
        raise RuntimeError('unknown bound', bound)
    cent = fitpts.mean(axis=0)
    cov = np.cov(fitpts.T)

    curb = db.Ellipsoid(cent, cov)
    curb.update(fitpts, mc_integrate=True)
    if bound == 'multi':
        curb = db.MultiEllipsoid([curb])
        curb.update(fitpts, mc_integrate=True)
    # fraction of test points inside the boundary
    frac = np.array([curb.contains(_) for _ in testpts]).mean()
    return curb.logvol, frac

My understanding (unless I have a bug in my code) is that there are massive issues with the bounds. Specifically, the bound will miss a large fraction of the volume in low dimensions (because less of the volume is next to the edges), and the torus case is extremely problematic (it mimics a curved 1-D degeneracy). I think that means the volumes need to be inflated.

segasai commented 3 years ago

And after enabling bootstrap=10 option

(plots: multi_10d_boot, multi_3d_boot)

The problems go away. My takeaway is that the default value of bootstrap must not be set to zero.

segasai commented 3 years ago

Trying bootstrapping with all the changes from my tynifix branch, I see the following problems:
1) With bootstrap factors ranging from 1.3 to 1.8 in 11 dimensions, the main problem becomes the propose_point function, because after the expansion many ellipsoid volumes start to have log(V) = 3 or even 5, which means that a substantial part of the volume is beyond the unit cube. So a significant amount of time (on a single thread) is spent repeatedly resampling and finding that the point is outside the cube.
2) Another factor which I think can lead to over-shredding of ellipsoids is that multiple ellipsoids have a larger bootstrap factor than a single one; that's why I think ellipsoidal splitting is less of a good idea compared to a naive calculation.
3) Despite the slowdowns with bootstrap > 0, it's clear to me that running bootstrap = 0 for anything other than a Gaussian posterior will produce incorrect results, as the volumes shrink way too fast.
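The wasted-proposal problem from point 1 is easy to reproduce standalone. In this sketch an axis-aligned 11-D ball with log(V) ≈ 5 (matching the enlarged volumes quoted above) stands in for one of the expanded ellipsoids; this is not dynesty's propose_point, just an illustration of the rejection rate:

```python
import numpy as np

rng = np.random.default_rng(42)
ndim = 11
cent = np.full(ndim, 0.5)
axes = np.full(ndim, 1.5)  # semi-axes; this ball has log(V) ~ 5 in 11-D

def sample_ellipsoid(n):
    # uniform draw inside the ellipsoid: uniform direction, radius ~ U^(1/ndim)
    x = rng.normal(size=(n, ndim))
    x /= np.linalg.norm(x, axis=1)[:, None]
    r = rng.uniform(size=n) ** (1.0 / ndim)
    return cent + x * r[:, None] * axes

draws = sample_ellipsoid(100_000)
inside = np.all((draws > 0) & (draws < 1), axis=1)
acceptance = inside.mean()  # fraction of proposals landing inside the unit cube
```

Most draws fall outside [0, 1]^11 and have to be rejected, which is exactly the single-thread slowdown described above.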

joshspeagle commented 3 years ago

The results from 1 and 2 here are why these ended up being disabled by default, since they require a huge number of live points to avoid becoming overly enlarged in higher dimensions. As for 3, that's true in the sense that you're guaranteed to miss parameter space, but not necessarily true in terms of parameter/evidence estimation. As I outline in the Appendix of the dynesty paper, it turns out that "prior bounds must encompass the entire distribution" doesn't quite need to hold for the evidence (or even the posterior marginals) to be ~correctly estimated; the "actual" condition is that the local sampling is able to follow the correct likelihood shrinkage pattern of the global distribution. This turns out to be easier to satisfy than you'd think in many applications, and is why correlated proposals/shredding can often give reasonable results even though they might violate this condition pretty severely at times. Of course, this only holds true until it doesn't, which always comes back to bite you at the worst possible time...
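The shrinkage pattern referred to here can be illustrated with a quick standalone simulation (a sketch of the standard nested-sampling result, not code from the paper): each iteration with nlive live points multiplies the prior volume by t ~ Beta(nlive, 1), so E[ln X_k] = -k/nlive, and any proposal scheme is fine as long as it reproduces this statistic.

```python
import numpy as np

rng = np.random.default_rng(1)
nlive, niter, nruns = 500, 2000, 200

# per-iteration shrinkage factors t ~ Beta(nlive, 1)
t = rng.beta(nlive, 1, size=(nruns, niter))
log_x = np.log(t).cumsum(axis=1)    # log prior volume after each iteration
mean_final = log_x[:, -1].mean()    # should be close to -niter/nlive = -4
```

A bound that is slightly too small biases t low and breaks this pattern, which is how missed parameter space turns into a biased evidence.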

Anyways, what I ultimately mean to say is I think these tests are fantastic and really highlight some of the problems with this particular proposal strategy (with uniform sampling), but that it's not guaranteed to be a total loss. It's also one of the main struggles I had when setting the defaults for most users. Do you have additional recommendations for changes outside of the new set of tests you've submitted and possible utility functions/plots to add in?

segasai commented 3 years ago

On the first point, I don't think I quite agree. I agree that the shrinkage will be correct if the shape of your logl > X region is constant as a function of logl, i.e. a Gaussian or an eggshell. But if the topology/shape changes as logl changes, that's where I suspect the problem will arise (I don't have an easy example to show, but it is clearly true for the science problem I was solving).

IMO I'd rather have defaults that are protective, i.e. ones that are maybe overkill for simple problems but are the right thing for difficult ones. Especially since, as opposed to, say, standard MCMC approaches, there is no set of standard convergence tests for nested sampling (a la the Gelman-Rubin/Geweke etc. tests) that can give you some confidence.

IMO I think the bootstrap/sampling can be improved. For example, currently the scaling of the region is done equally in all directions, but maybe we can make the scaling different along different directions to avoid scaling beyond the cube. I also noticed that the sampling within the ellipsoid can probably be parallelized as well, which would stop it being a bottleneck in the case of parallel running. Alternatively, I was thinking that if the ellipsoid is that much larger than the unit cube, maybe we can just sample in the part of the ellipsoid's bounding box that fits inside the unit cube? I haven't tested whether that would be beneficial. Anyway, I think having a test problem where bootstrap gets somewhat stuck is good, IMO, as something to try to improve on.
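The clipped-bounding-box idea can be sketched as follows. This is purely hypothetical (not implemented in dynesty), and it uses an axis-aligned ellipsoid for simplicity; for a rotated ellipsoid the same idea applies with per-axis extents derived from the covariance:

```python
import numpy as np

rng = np.random.default_rng(2)
ndim = 11
cent = np.full(ndim, 0.5)
axes = np.full(ndim, 1.5)  # semi-axes; the ellipsoid spills well outside the cube

# clip the ellipsoid's axis-aligned bounding box to the unit cube...
lo = np.clip(cent - axes, 0.0, 1.0)
hi = np.clip(cent + axes, 0.0, 1.0)

# ...then draw uniformly from the clipped box and reject only the points
# outside the ellipsoid; accepted points are inside the cube by construction
n = 100_000
box = rng.uniform(lo, hi, size=(n, ndim))
in_ell = (((box - cent) / axes) ** 2).sum(axis=1) <= 1.0
acceptance = in_ell.mean()
```

For an ellipsoid this much larger than the cube, nearly every clipped-box draw already lies inside the ellipsoid, so the rejection rate collapses compared to sampling from the full ellipsoid.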

In terms of immediate recommendations: for sure, validating function evaluations/boundaries after the run and warning about big discrepancies is important. In my opinion, having the bounding algorithm return the correct boundary with the default settings is important as well (i.e. the tests that I give above). I'd put those in.

ghost commented 3 years ago

Hi @segasai, would it be possible for you to include the code you used to generate all these diagnostic plots? It might be useful for myself and other users with similar issues in the meantime. Thanks!

segasai commented 3 years ago

Hi @a-lhn, the volume check code is given above. If you want to reproduce the plots shown above, you can use the results of the sampler:

sampler.run_nested(dlogz=0.01)
mask=np.array([[bb.contains(_) for _ in sampler.saved_run.D['u']] for bb in sampler.bound])

That will give you a boolean mask for all boundaries and all the points in the run.

joshspeagle commented 3 years ago

Just a note to myself that it might be cool to try and add in some of these utilities as part of the code for users to check as part of the new release (see #254).

segasai commented 3 years ago

I think a problem that I currently see is that the bound set is not properly tracked, i.e. we don't track which logl level each bound corresponds to: the code just does self.bound.append(bound). But ideally one would want some self-contained object with (logl, bound) pairs. That would somewhat help #232 as well.

But otherwise, we can already reproduce the plot from the top with this:

res = sampler.results
plt.clf()
plt.imshow(np.array([[_.contains(__) for __ in res['samples_u']]
                     for _ in res['bound']]), aspect='auto')

Part of the reason why I'm interested in quantifying boundaries is that I'm interested in approaches where a first run can dictate some kind of transform of the data that makes the sampling much easier, i.e. use the first dynesty run as exploration, and then sample with much higher sampling efficiency. (I know some developers of zeus-mcmc were thinking about/working on such approaches using normalizing flows.)

segasai commented 3 years ago

I think explicitly tracking the (logl -> bounding) object (separate from the actual samples) would also potentially be useful if one wants to sample the modes separately, or when deciding whether independent runs can be merged: if their Boundary(logl) regions don't overlap enough, the runs can't be merged.
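Since the Boundary(logl) objects aren't tracked yet, here is a standalone Monte Carlo sketch of such an overlap check, with plain axis-aligned ellipsoids standing in for the bounds (the names and the merge threshold are hypothetical): sample from one bound and measure what fraction of the draws the other bound contains.

```python
import numpy as np

rng = np.random.default_rng(3)
ndim = 3

def sample_ellipsoid(cent, axes, n):
    # uniform draw inside an axis-aligned ellipsoid
    x = rng.normal(size=(n, ndim))
    x /= np.linalg.norm(x, axis=1)[:, None]
    r = rng.uniform(size=n) ** (1.0 / ndim)
    return cent + x * r[:, None] * axes

def contains(pts, cent, axes):
    return (((pts - cent) / axes) ** 2).sum(axis=1) <= 1.0

# two unit balls whose centres are ~0.87 apart: substantial overlap
cent_a, cent_b = np.zeros(ndim), np.full(ndim, 0.5)
axes_one = np.ones(ndim)

draws = sample_ellipsoid(cent_a, axes_one, 50_000)
overlap = contains(draws, cent_b, axes_one).mean()  # fraction of A inside B
mergeable = overlap > 0.1                           # illustrative threshold
```

With dynesty bounds, the per-sample loop would use the bound objects' own contains() instead of the helper here.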

segasai commented 2 years ago

I will link issue #232 here and close it to avoid having two similar issues.