Agreed.
I'll submit the PR. I'm just now running a test with the 200-d Gaussian, checking that it actually gives a correct answer...
Continuing the discussion of #284
This is the same plot as before, but now with a single bound instead of multi.
It's clear the issue disappears, so after the new change the splitting is likely a bit too aggressive (it's not 100% certain, but very likely).
Below is the code used for the test
```python
import numpy as np
from scipy import linalg
import scipy.stats
import dynesty  # noqa
from dynesty import plotting as dyplot  # noqa
from dynesty import utils as dyfunc  # noqa
"""
Run a series of basic tests to check whether anything huge is broken.
"""

nlive = 5000
printing = False


# GAUSSIAN TEST
def get_covar(rstate, ndim):
    # covariance with eigenvalues spanning three orders of magnitude,
    # rotated by a random orthogonal matrix
    eigval = 10**np.linspace(-3, 0, ndim)
    M = scipy.stats.ortho_group.rvs(dim=ndim, random_state=rstate)
    ret = M @ np.diag(eigval**2) @ M.T
    return ret


class Config:
    def __init__(self, rstate, ndim_gau):
        self.ndim_gau = ndim_gau
        self.mean_gau = np.linspace(-1, 1, ndim_gau)
        cov_gau = get_covar(rstate, ndim_gau)
        self.cov_gau = cov_gau
        self.cov_inv_gau = linalg.pinvh(cov_gau)  # precision matrix
        logdet = np.linalg.slogdet(cov_gau)[1]
        self.lnorm_gau = -0.5 * (np.log(2 * np.pi) * ndim_gau + logdet)
        self.prior_win = 1000  # +/- 1000 on both sides
        self.logz_truth_gau = ndim_gau * (-np.log(2 * self.prior_win))
        assert np.isfinite(self.lnorm_gau)
        assert np.isfinite(self.logz_truth_gau)


class si:
    config = None


# correlated multivariate normal log-likelihood
def loglikelihood_gau(x):
    """Multivariate normal log-likelihood."""
    co = si.config
    x1 = x - co.mean_gau
    return -0.5 * x1 @ co.cov_inv_gau @ x1 + co.lnorm_gau


# prior transform
def prior_transform_gau(u):
    """Flat prior between -prior_win and +prior_win in each dimension."""
    return si.config.prior_win * (2. * u - 1.)


def do_gaussian(co, sample=None, bound=None, rstate=None):
    si.config = co
    sampler = dynesty.DynamicNestedSampler(loglikelihood_gau,
                                           prior_transform_gau,
                                           co.ndim_gau,
                                           nlive=nlive,
                                           bound=bound,
                                           sample=sample,
                                           rstate=rstate)
    sampler.run_nested(print_progress=printing)
    res = sampler.results
    return res.logz[-1], res.logzerr[-1]


def do_gaussians(sample='rslice', bound='single'):
    for ndim in range(2, 33):
        rstate = np.random.RandomState(seed=ndim)
        co = Config(rstate, ndim)
        res = do_gaussian(co, sample=sample, bound=bound, rstate=rstate)
        print('RESULTS', ndim, res, co.logz_truth_gau)
```
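For completeness, the sweep is kicked off with the defaults shown; it prints one line of (logz, logzerr) alongside the analytic truth per dimensionality:

```python
# run the 2..32-dimensional sweep with rslice sampling and a single bound
do_gaussians(sample='rslice', bound='single')
```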
Oh wow -- that's pretty definitive. The remaining trend is likely due to either error underestimation, sampling bias, or both, but the scale of the original trend looks pretty stark. Seems like above ~15 dimensions is where things start to diverge.
Copying the original image from @segasai below for reference:
I was thinking of potentially changing the vol_dec criterion for hierarchical ellipsoidal splitting: if we split hierarchically, we should require a bigger volume decrement at each level. That is, if we just split into two, requiring vol_dec is fine, but if the first split didn't work and we are still going deeper, maybe we need to require more of a volume change.
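A toy sketch of that escalation idea (purely illustrative; `vol_dec` is dynesty's existing volume-decrement parameter, but the depth scaling here is just the proposal, not actual dynesty code):

```python
def required_vol_dec(vol_dec, depth):
    """Hypothetical sketch: demand a stronger volume decrement the deeper
    the hierarchical split goes (depth=1 is the first 2-way split)."""
    # e.g. vol_dec at the first split, vol_dec**2 at the second, ...
    return vol_dec ** depth
```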
Here is the combined plot (old/new refer to before #284 and after #284):
Yea, @JohannesBuchner had an implementation that moves completely away from the iterative k-means approach towards full agglomerative clustering. It's implemented as part of RadFriends and SupFriends here. I've always wanted to extend it to other parts of the code but never found the time to completely redo things. I've never run tests of its robustness in higher dimensions, but it might also be a solution here, in addition to modifying the vol_dec behaviour.
Yes, I agree the k-means stuff is not ideal (but rewriting it needs to be done based on actual data). I think the tests I presented in https://github.com/joshspeagle/dynesty/issues/237#issuecomment-817017241 are probably the proper way to validate that, as things need to be tested on different topologies. ATM I kinda don't like that #284 technically made things worse (maybe not in the regime where most people care), but I'd like to put in a fix that makes things better again.
Yea, I agree on all counts. I would prefer not to cut a new release and push to pip until we can at least get back some of the behaviour from before #284. I do think cutting out the pointvol dependencies has been useful overall, however, as relying on that has been somewhat of a (dangerous?) crutch.
Here's #286 plus bootstrap=5 as well (green curve).
It certainly helps, but is maybe a bit too slow in practice.
It's good to see that this mostly solves the problem (as it was designed to!), even if it is quite slow.
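For reference, a run like the green curve presumably just adds `bootstrap=5` to the sampler construction from the test script above (a sketch, not the exact command used; `bound='multi'` assumed):

```python
# Sketch: same test harness as above, but with multi-ellipsoid bounds
# validated via 5 bootstrap realizations (bootstrap is a real dynesty option).
sampler = dynesty.DynamicNestedSampler(loglikelihood_gau,
                                       prior_transform_gau,
                                       co.ndim_gau,
                                       nlive=nlive,
                                       bound='multi',
                                       sample='rslice',
                                       bootstrap=5,
                                       rstate=rstate)
```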
I have an alternative idea, which I'll try to implement tomorrow if I have time: replace the splitting criterion with an AIC-inspired calculation. I.e. if you have N points in M-dimensional space surrounded by an ellipsoid with volume V, and you compare it with a decomposition into k ellipsoids with volumes V_i, each covering n_i points, then you should only choose the split if (\sum_i n_i ln V_i) + k * M(M+3)/2 is smaller than N ln V + M(M+3)/2, where those are essentially log-likelihoods penalized by the number of parameters (M(M+3)/2 per ellipsoid). The nice thing is that this will depend on the number of dimensions and the number of live points.
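A minimal sketch of that comparison, assuming the log-volumes and point counts are already known (names are illustrative, not dynesty internals):

```python
import numpy as np

def prefer_split(logvol0, n_total, logvols, counts, ndim):
    """Return True if the k-ellipsoid decomposition beats a single
    ellipsoid under the penalized log-volume criterion above."""
    npar = ndim * (ndim + 3) / 2.0   # free parameters per ellipsoid
    k = len(logvols)
    score_single = n_total * logvol0 + npar
    score_split = np.dot(counts, logvols) + k * npar
    return score_split < score_single
```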
Isn't 'rslice' actually Hit-and-Run Monte Carlo (HARM)? It may be useful to mention this in the docs.
I am advocating for HARM in https://arxiv.org/abs/2101.09675 and currently running some diffusion simulations to understand its qualities vs slice sampling and referring to literature there.
Regarding AIC: IIRC the Feroz & Hobson (2008) MultiNest paper used BIC, and it discusses a bunch of options they tried.
You can speed up the bootstrapping with Cython code; dynesty & nestle already use Cython via scipy's k-means algorithm.
Thanks for pointing at BIC instead of AIC. I've checked Pelleg+2000 (cited in Feroz+2008) and then derived a splitting criterion based on BIC (not exactly what's in Pelleg, since we have uniform distributions inside ellipsoids). That basically gives the condition Vnew/V0 < exp(-(k-1) * X * ln(N)/N), where V0 is the old volume, Vnew the new volume, X = Ndim*(Ndim+3)/2 is the number of parameters of one ellipsoid, k the new number of ellipsoids, and N the number of points. For a 2-vs-1 split with 500 live points in 10 dimensions this gives a volume decrement of 0.45, similar to what was coded there before.
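A quick numerical check of that threshold, just evaluating the formula above:

```python
import numpy as np

def split_threshold(ndim, npoints, k=2):
    """Maximum allowed Vnew/V0 for accepting a split into k ellipsoids."""
    X = ndim * (ndim + 3) / 2.0  # parameters of one ellipsoid
    return np.exp(-(k - 1) * X * np.log(npoints) / npoints)

print(split_threshold(10, 500))  # ~0.446, matching the 0.45 quoted above
```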
With this change the plot returns to the baseline behaviour
Regarding HARM/rslice: yes, it seems they are indeed the same (though it's still obviously a Neal (2003) slice sampler). It's probably good to add more refs on that.
The BIC performance looks great! I'll merge this in now. Thanks so much to both of you for your assistance with improving this :)
Currently the default sampler in high dimensions is slice, which for a simple correlated 200-d Gaussian gives me ~0.06% efficiency vs 3.5% for rslice. I think rslice should be used instead.
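For illustration, explicitly requesting 'rslice' rather than relying on the dimension-based default (a minimal sketch with a stand-in likelihood, not the 200-d test above):

```python
import numpy as np
import dynesty

ndim = 20  # illustrative; the efficiency numbers above were for 200-d

def loglike(x):
    # simple uncorrelated Gaussian stand-in
    return -0.5 * np.sum(x**2)

def ptform(u):
    # flat prior on [-10, 10] in each dimension
    return 20.0 * u - 10.0

sampler = dynesty.NestedSampler(loglike, ptform, ndim, sample='rslice')
sampler.run_nested()
```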