Agreed.
I'll submit the PR. I'm just now running a test with the 200-d Gaussian, checking that it actually gives a correct answer...
Continuing the discussion of #284
This is the same plot as before, but now with a single bound instead of multi.
It's clear the issue disappears, so after the new change the splitting is likely a bit too aggressive (it's not 100% certain, but very likely).
Below is the code used for the test
```python
import numpy as np
from scipy import linalg
import scipy.stats
import dynesty  # noqa
from dynesty import plotting as dyplot  # noqa
from dynesty import utils as dyfunc  # noqa
"""
Run a series of basic tests to check whether anything huge is broken.
"""

nlive = 5000
printing = False


# GAUSSIAN TEST
def get_covar(rstate, ndim):
    # covariance with eigenvalues spanning three orders of magnitude,
    # rotated by a random orthogonal matrix
    eigval = 10**np.linspace(-3, 0, ndim)
    M = scipy.stats.ortho_group.rvs(dim=ndim, random_state=rstate)
    ret = M @ np.diag(eigval**2) @ M.T
    return ret


class Config:
    def __init__(self, rstate, ndim_gau):
        self.ndim_gau = ndim_gau
        self.mean_gau = np.linspace(-1, 1, ndim_gau)
        cov_gau = get_covar(rstate, ndim_gau)
        self.cov_gau = cov_gau
        self.cov_inv_gau = linalg.pinvh(cov_gau)  # precision matrix
        logdet = np.linalg.slogdet(cov_gau)[1]
        self.lnorm_gau = -0.5 * (np.log(2 * np.pi) * ndim_gau + logdet)
        self.prior_win = 1000  # +/- 1000 on both sides
        self.logz_truth_gau = ndim_gau * (-np.log(2 * self.prior_win))
        assert np.isfinite(self.lnorm_gau)
        assert np.isfinite(self.logz_truth_gau)


class si:
    config = None


# correlated multivariate normal log-likelihood
def loglikelihood_gau(x):
    """Multivariate normal log-likelihood."""
    co = si.config
    x1 = x - co.mean_gau
    return -0.5 * x1 @ co.cov_inv_gau @ x1 + co.lnorm_gau


# prior transform
def prior_transform_gau(u):
    """Flat prior between -prior_win and +prior_win in each dimension."""
    return si.config.prior_win * (2. * u - 1.)


def do_gaussian(co, sample=None, bound=None, rstate=None):
    si.config = co
    sampler = dynesty.DynamicNestedSampler(loglikelihood_gau,
                                           prior_transform_gau,
                                           co.ndim_gau,
                                           nlive=nlive,
                                           bound=bound,
                                           sample=sample,
                                           rstate=rstate)
    sampler.run_nested(print_progress=printing)
    res = sampler.results
    return res.logz[-1], res.logzerr[-1]


def do_gaussians(sample='rslice', bound='single'):
    for ndim in range(2, 33):
        rstate = np.random.RandomState(seed=ndim)
        co = Config(rstate, ndim)
        res = do_gaussian(co, sample=sample, bound=bound, rstate=rstate)
        print('RESULTS', ndim, res, co.logz_truth_gau)
```
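For completeness, the sweep is kicked off with the defaults shown; it prints one line of (logz, logzerr) alongside the analytic truth per dimensionality:

```python
# run the 2..32-dimensional sweep with rslice sampling and a single bound
do_gaussians(sample='rslice', bound='single')
```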
Oh wow -- that's pretty definitive. The remaining trend is likely due to either error underestimation, sampling bias, or both, but the scale of the original trend looks pretty stark. Seems like above ~15 dimensions is where things start to diverge.
Copying the original image from @segasai below for reference:
I was thinking of potentially changing the vol_dec criterion for hierarchical ellipsoidal splitting: if we split hierarchically, we should require a bigger volume decrement at each level. That is, if we just split into two, requiring vol_dec is fine, but if the first split didn't work and we are still going deeper, maybe we need to require more of a volume change.
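A toy sketch of that escalation idea (purely illustrative; `vol_dec` is dynesty's existing volume-decrement parameter, but the depth scaling here is just the proposal, not actual dynesty code):

```python
def required_vol_dec(vol_dec, depth):
    """Hypothetical sketch: demand a stronger volume decrement the deeper
    the hierarchical split goes (depth=1 is the first 2-way split)."""
    # e.g. vol_dec at the first split, vol_dec**2 at the second, ...
    return vol_dec ** depth
```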
Here is the combined plot (old/new refer to before #284 and after #284):
Yea, @JohannesBuchner had an implementation that moves completely away from the iterative k-means approach towards full agglomerative clustering. It's implemented as part of RadFriends and SupFriends here. I've always wanted to extend it to other parts of the code but never found the time to completely redo things. I've never run tests of its robustness in higher dimensions, but it might also be a solution here, in addition to modifying the vol_dec behaviour.
Yes, I agree the k-means stuff is not ideal (but rewriting it needs to be done based on actual data). I think the tests I presented in https://github.com/joshspeagle/dynesty/issues/237#issuecomment-817017241 are probably the proper way to validate that, as things need to be tested on different topologies. ATM I kinda don't like that #284 technically made things worse (maybe not in the regime where most people care), but I'd like to put in a fix that makes things better again.
Yea, I agree on all counts. I would prefer not to cut a new release and push to pip until we can at least get back some of the behaviour from before #284. I do think cutting out the pointvol dependencies has been useful overall, however, as relying on that has been somewhat of a (dangerous?) crutch.
Here's #286 plus bootstrap=5 as well (green curve).
It certainly helps, but is maybe a bit too slow in practice.
It's good to see that this mostly solves the problem (as it was designed to!), even if it is quite slow.
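For reference, a run like the green curve presumably just adds `bootstrap=5` to the sampler construction from the test script above (a sketch, not the exact command used; `bound='multi'` assumed):

```python
# Sketch: same test harness as above, but with multi-ellipsoid bounds
# validated via 5 bootstrap realizations (bootstrap is a real dynesty option).
sampler = dynesty.DynamicNestedSampler(loglikelihood_gau,
                                       prior_transform_gau,
                                       co.ndim_gau,
                                       nlive=nlive,
                                       bound='multi',
                                       sample='rslice',
                                       bootstrap=5,
                                       rstate=rstate)
```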
I have an alternative idea, which I'll try to implement tomorrow if I have time: replace the splitting criterion with an AIC-inspired calculation. I.e. if you have N points in M-dimensional space surrounded by an ellipsoid with volume V, and you compare it with a decomposition into k ellipsoids with volumes V_i, each covering n_i points, then you should only choose the split if (\sum_i n_i ln V_i) + k * M(M+3)/2 is smaller than N ln V + M(M+3)/2, where those are essentially log-likelihoods penalized by the number of parameters (M(M+3)/2 per ellipsoid). The nice thing is that this will depend on the number of dimensions and the number of live points.
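A minimal sketch of that comparison, assuming the log-volumes and point counts are already known (names are illustrative, not dynesty internals):

```python
import numpy as np

def prefer_split(logvol0, n_total, logvols, counts, ndim):
    """Return True if the k-ellipsoid decomposition beats a single
    ellipsoid under the penalized log-volume criterion above."""
    npar = ndim * (ndim + 3) / 2.0   # free parameters per ellipsoid
    k = len(logvols)
    score_single = n_total * logvol0 + npar
    score_split = np.dot(counts, logvols) + k * npar
    return score_split < score_single
```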
Isn't 'rslice' actually Hit-and-Run Monte Carlo (HARM)? It may be useful to mention this in the docs.
I am advocating for HARM in https://arxiv.org/abs/2101.09675 and currently running some diffusion simulations to understand its qualities vs slice sampling and referring to literature there.
Regarding AIC: IIRC the Feroz & Hobson (2008) MultiNest paper used BIC, and it discusses a bunch of options they tried.
You can speed up the bootstrapping with Cython code; dynesty & nestle already use Cython via scipy's k-means algorithm.
Thanks for pointing at BIC instead of AIC. I've checked Pelleg+2000 (cited in Feroz+2008) and then derived a splitting criterion based on BIC (not exactly what's in Pelleg, since we have uniform distributions inside ellipsoids). That basically gives the condition Vnew/V0 < exp(-(k-1) * X * ln(N)/N), where V0 is the old volume, Vnew the new volume, X = Ndim*(Ndim+3)/2 is the number of parameters of one ellipsoid, k the new number of ellipsoids, and N the number of points. For a 2-vs-1 split with 500 live points in 10 dimensions this gives a volume decrement of 0.45, similar to what was coded there before.
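A quick numerical check of that threshold, just evaluating the formula above:

```python
import numpy as np

def split_threshold(ndim, npoints, k=2):
    """Maximum allowed Vnew/V0 for accepting a split into k ellipsoids."""
    X = ndim * (ndim + 3) / 2.0  # parameters of one ellipsoid
    return np.exp(-(k - 1) * X * np.log(npoints) / npoints)

print(split_threshold(10, 500))  # ~0.446, matching the 0.45 quoted above
```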
With this change the plot returns to the baseline behaviour
Regarding HARM/rslice: yes, it seems they are indeed the same (though it's still obviously a Neal (2003) slice sampler). It's probably good to add more refs on that.
The BIC performance looks great! I'll merge this in now. Thanks so much to both of you for your assistance with improving this :)
Currently the default sampler in high dimensions is slice, which for a simple correlated 200-d Gaussian gives me ~0.06% efficiency vs 3.5% for rslice. I think rslice should be used instead.
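For illustration, explicitly requesting 'rslice' rather than relying on the dimension-based default (a minimal sketch with a stand-in likelihood, not the 200-d test above):

```python
import numpy as np
import dynesty

ndim = 20  # illustrative; the efficiency numbers above were for 200-d

def loglike(x):
    # simple uncorrelated Gaussian stand-in
    return -0.5 * np.sum(x**2)

def ptform(u):
    # flat prior on [-10, 10] in each dimension
    return 20.0 * u - 10.0

sampler = dynesty.NestedSampler(loglike, ptform, ndim, sample='rslice')
sampler.run_nested()
```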