joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

Further slice sampling issues #266

Closed · segasai closed 3 years ago

segasai commented 3 years ago

While continuing to narrow down the rslice sampling issues that lead to these kinds of errors:

  File "<string>", line 1, in <module>
  File "/home/skoposov/curwork/dynesty/tests/test_pathology.py", line 52, in test_pathology_dynamic
    sampler.run_nested(dlogz_init=1, print_progress=printing)
  File "/home/skoposov/curwork/dynesty/py/dynesty/dynamicsampler.py", line 1599, in run_nested
    passback = self.add_batch(nlive=nlive_batch,
  File "/home/skoposov/curwork/dynesty/py/dynesty/dynamicsampler.py", line 1719, in add_batch
    for results in self.sample_batch(nlive_new=nlive,
  File "/home/skoposov/curwork/dynesty/py/dynesty/dynamicsampler.py", line 1161, in sample_batch
    for it, results in enumerate(
  File "/home/skoposov/curwork/dynesty/py/dynesty/sampler.py", line 798, in sample
    u, v, logl, nc = self._new_point(loglstar_new, logvol)
  File "/home/skoposov/curwork/dynesty/py/dynesty/sampler.py", line 410, in _new_point
    u, v, logl, nc, blob = self._get_point_value(loglstar)
  File "/home/skoposov/curwork/dynesty/py/dynesty/sampler.py", line 394, in _get_point_value
    self._fill_queue(loglstar)
  File "/home/skoposov/curwork/dynesty/py/dynesty/sampler.py", line 383, in _fill_queue
    self.queue = list(self.M(evolve_point, args))
  File "/home/skoposov/curwork/dynesty/py/dynesty/sampling.py", line 759, in sample_rslice
    raise RuntimeError("Slice sampling appears to be "
RuntimeError: Slice sampling appears to be stuck! Some useful output quantities:
u: [0.5       0.4862176]
u_left: [0.49999999 0.48072734]
u_right: [0.50000001 0.48904739]
u_hat: [1.66894077e-08 8.32005324e-03]
loglstar: 17.956478101034662
axes: [[ 1.12058641e-03  0.00000000e+00]
 [-8.31235509e-02  3.54360519e+02]]
axlen: 938.2393864917427.

I have a few comments (no patch yet). I think this line is counter-productive: https://github.com/joshspeagle/dynesty/blob/a55832be419e4e019de7142f9c9651d0acc31326/py/dynesty/sampling.py#L752 Slice sampling is guaranteed to succeed if the starting point satisfies the likelihood criterion (see Neal 2003), and that now seems to be the case, so I think we should get rid of that exception. If anything, we should issue a warning or an exception if there are too many interval expansions.
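To illustrate the suggestion, here is a minimal sketch (not dynesty's actual code) of the Neal (2003) "stepping out" expansion that warns after too many expansions instead of raising; the names logl, u, u_hat and loglstar mirror sampling.py, while max_expansions is a hypothetical parameter:

import warnings
import numpy as np

def step_out(logl, u, u_hat, loglstar, max_expansions=1000):
    # Expand the interval [u_left, u_right] along the direction u_hat
    # until both endpoints fall below the likelihood threshold.
    u_left, u_right = u - u_hat, u + u_hat
    nexpand = 0
    while logl(u_left) > loglstar:
        u_left = u_left - u_hat
        nexpand += 1
    while logl(u_right) > loglstar:
        u_right = u_right + u_hat
        nexpand += 1
    if nexpand > max_expansions:
        # Warn instead of raising: the sampler can still proceed, but
        # the initial step size is probably much smaller than the slice.
        warnings.warn("slice interval required %d expansions; "
                      "the initial step size may be badly scaled" % nexpand)
    return u_left, u_right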

Another point is the axlen value, i.e. the step size for slice sampling.
First, I think it's a question whether axlen > 1, or even > sqrt(ndim)/2, makes sense. Possibly not, and we may need to truncate it (we only need to think about how that affects the scaling of slice sampling).
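A possible cap, as a purely illustrative sketch (the helper name is hypothetical):

import numpy as np

def cap_axlen(axlen, ndim):
    # Cap the initial slice step at sqrt(ndim)/2, half the diagonal of
    # the unit ndim-cube; no chord of the cube exceeds the full
    # diagonal sqrt(ndim), so larger starting intervals buy nothing.
    return min(axlen, np.sqrt(ndim) / 2.0)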

Now the question is why axlen is so large in the first place, i.e. why an axis of the ellipsoid can be that long. The tests using the function from test_pathology.py show what can happen for some distributions of points, like the one shown in the attached image.

The ellipsoid parameters chosen from the covariance matrix can be such that the fmax value https://github.com/joshspeagle/dynesty/blob/a55832be419e4e019de7142f9c9651d0acc31326/py/dynesty/bounding.py#L1440 is > 50, which means the ellipsoid is expanded by that much. That in turn (I think) can lead to a large axlen of ~700, which can easily trigger the "sampler stuck" message.
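For intuition, a rough sketch of the kind of expansion being described (loosely following the logic above, not a verbatim copy of bounding.py):

import numpy as np

def expansion_factor(points):
    # Fit an ellipsoid from the sample covariance, then compute fmax,
    # the largest squared Mahalanobis distance of any point from the
    # centroid; scaling the covariance by fmax makes the ellipsoid
    # enclose every point, at the cost of inflating it by that factor.
    ctr = points.mean(axis=0)
    am = np.linalg.inv(np.cov(points, rowvar=False))  # precision matrix
    delta = points - ctr
    fmax = np.max(np.einsum('ij,jk,ik->i', delta, am, delta))
    return fmax

A single outlier far from an otherwise tight cloud is enough to make fmax, and hence the expanded axis lengths, very large.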

Possible solutions:

  • Limit the step size of the slice sampler to be < sqrt(Ndim)/2
  • Changing the code bounding_ellipsoids. The current code is not giving the smallest volume ellipsoid.

And finally, while writing this I continued the investigation. The root cause of the problem is the volume scaling in bounding_ellipsoid(): basically, we have an extremely narrow ellipsoid, and at a certain point bounding_ellipsoid() is called with a volume of 0.001996 (I don't know where this comes from), which blows the ellipsoid up to ~700 times the cube size.
It's unclear what the best fix for that is...
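A toy illustration of this blow-up (names and numbers are mine, not dynesty internals): rescaling a degenerate 2-d ellipsoid to a fixed target volume stretches every axis by (target_vol / current_vol)^(1/ndim), which can be enormous.

import numpy as np

def rescale_to_volume(axlens, current_vol, target_vol):
    # Stretch all semi-axes uniformly so the ellipsoid reaches the
    # requested volume; for a nearly degenerate ellipsoid the common
    # stretch factor can be huge.
    ndim = len(axlens)
    return axlens * (target_vol / current_vol) ** (1.0 / ndim)

axlens = np.array([1e-9, 1.0])       # an extremely narrow 2-d ellipse
vol = np.pi * np.prod(axlens)        # its area, ~3e-9
print(rescale_to_volume(axlens, vol, 0.001996))
# -> roughly [8e-7, 800]: the long axis now dwarfs the unit cube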

The problem is demonstrated with the following code:

from __future__ import (print_function, division)
import numpy as np
import dynesty

nlive = 1000
printing = True
alpha = 1e-8

def loglike(x):
    # this is a 1/|x| distribution along the x axis
    # it stops rising near zero at alpha
    # the second dimension is flat
    logl = -np.log(np.maximum(np.abs(x[0]), alpha))

    noplateau = -1e-10 * (x**2).sum()
    # this is to avoid complete plateau

    return logl + noplateau

def prior_transform(x):
    return x * 2 - 1

def test_pathology_dynamic():
    ndim = 2
    for sampler in ['unif', 'rslice']:
        sampler = dynesty.DynamicNestedSampler(loglike,
                                               prior_transform,
                                               ndim,
                                               nlive=nlive,
                                               bound='multi',
                                               sample=sampler)
        sampler.run_nested(dlogz_init=1, print_progress=printing)
        logz_truth = np.log(1 - np.log(alpha))
        # this is the analytic value of the integral (derivation below)
        logz, logzerr = sampler.results.logz[-1], sampler.results.logzerr[-1]
        thresh = 4
        assert (np.abs(logz - logz_truth) < thresh * logzerr)
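
For reference, the analytic value asserted above: the prior transform maps the unit square to [-1, 1]^2 (prior density 1/2 per dimension), the second dimension is flat, and the noplateau term is negligible, so

Z = (1/2) * ∫_{-1}^{1} dx / max(|x|, alpha)
  = ∫_0^alpha dx / alpha + ∫_alpha^1 dx / x
  = 1 - ln(alpha),

hence logz_truth = np.log(1 - np.log(alpha)).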
joshspeagle commented 3 years ago

This is a very detailed set of results -- thanks for running these tests and doing such a careful job tracking things down.

I think this line is counter-productive...So I think we should get rid of that exception. If anything, we should put a warning or exception if there are too many expansions.

Agreed. This has now been resolved by using the loglstar threshold, which guarantees success. Including a warning to flag possible issues is a good addition.

Possible solutions:

  • Limit the step size of the slice sampler to be < sqrt(Ndim)/2

Yes, setting a sensible upper limit makes a lot of sense. I support the larger value of sqrt(Ndim)/2, based on the diagonal of the Ndim hypercube: the diagonal has length sqrt(Ndim), so half of it is sqrt(Ndim)/2.

  • Changing the code bounding_ellipsoids. The current code is not giving the smallest volume ellipsoid.

Yes, that seems reasonable. A slower but more robust method seems like a reasonable thing to do, especially as a fallback option if fmax is large.

the root cause of the problem is the volume scaling in bounding_ellipsoid() ... this basically blows the ellipsoid to be 700 times the cube size.

The best fix might be to constrain the maximum size of a given axlen to be on the order of sqrt(Ndim), thereby limiting the extent to which an ellipsoid can blow up this way.

joshspeagle commented 3 years ago

Should be resolved as of #269.

segasai commented 3 years ago

I don't think this is fully closed; some of the ideas mentioned here still need to be implemented.

joshspeagle commented 3 years ago

Ah, no, you're right. I was overeager. The most recent PR did not include an upper bound on the slice proposals or the ellipsoid volumes. Reopening this.

joshspeagle commented 3 years ago

PR #271 now includes additional checks on the slice proposals. It doesn't quite deal with the ellipsoid volume issue AFAIK, but I think it's enough that I can consider this closed. (The bounding problem can probably be opened as its own separate issue.)