Closed segasai closed 1 year ago
Here's an analysis of the performance (the number of clusters recovered) for the old vs. the new version of the code, for different numbers of clusters N = 1, 2, 5, 10, 100, 1000.
Here's the code: I generate points around cluster centres placed on a grid, randomly scale each dimension, and randomly rotate the whole point set. I'd say it looks like an improvement.
import itertools
import math
import multiprocessing as mp

import numpy as np
import scipy.linalg

import dynesty.bounding as db


def random_mat(rstate, ndim):
    # random orthogonal (rotation) matrix from the SVD of a random covariance
    x = rstate.normal(size=(ndim, ndim))
    cov = np.cov(x)
    mat = scipy.linalg.svd(cov)[0]
    return mat


def test_number_clusters(ndim, ncens, nper_dim=10, sig=0.01, seed=1):
    # check that we recover close to the correct (and large) number of clusters
    npt = ncens * nper_dim * ndim
    rstate = np.random.default_rng(seed)
    # place the cluster centres on a grid
    nxcens = int(math.ceil(ncens**(1. / ndim)))
    cens = np.zeros((ncens, ndim))
    curiter = itertools.product(*[range(nxcens)] * ndim)
    for i in range(ncens):
        cens[i] = next(curiter)
    # scatter points around the centres
    xs = sig * rstate.normal(size=(npt, ndim)) + cens[np.arange(npt) %
                                                      len(cens), :]
    # randomly scale each dimension
    min_scale = 1e-3
    scales = 10**rstate.uniform(np.log10(min_scale), 0, size=ndim)
    xs = xs * scales[None, :]
    # apply a random rotation
    mat = random_mat(rstate, ndim)
    xs = xs @ mat
    E = db.bounding_ellipsoids(xs)
    return ncens, len(E.ells)


def doall(ntest=10000, nthreads=36):
    rstate = np.random.default_rng(2)
    ndims = rstate.integers(2, 20 + 1, size=ntest)
    ncens = np.array([1, 2, 5, 10, 100, 1000])
    xcens = rstate.integers(0, len(ncens), size=ntest)
    ncens = ncens[xcens]
    nper_dims = rstate.integers(2, 10 + 1, size=ntest)
    res = []
    with mp.Pool(nthreads) as poo:
        for i in range(ntest):
            res.append(
                poo.apply_async(test_number_clusters,
                                (ndims[i], ncens[i], nper_dims[i], 0.01, i)))
        for i in range(ntest):
            if i % 1000 == 0:
                print(i)
            res[i] = res[i].get()
    return res
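For reference, here is a minimal sketch of how one might aggregate the output of doall() into recovered-vs-true counts per N (the grouping and the 10% criterion below are my own choices for illustration, not part of the test above):

import collections

import numpy as np

# each entry of res is a (true_ncens, recovered_ncens) pair returned by doall()
res = doall(ntest=1000, nthreads=4)

by_n = collections.defaultdict(list)
for ntrue, nfound in res:
    by_n[ntrue].append(nfound)

for ntrue in sorted(by_n):
    found = np.array(by_n[ntrue])
    # median recovered count and fraction of runs within 10% of the truth
    frac_ok = np.mean(np.abs(found - ntrue) <= 0.1 * ntrue)
    print(ntrue, np.median(found), frac_ok)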
Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
---|---|---|---
py/dynesty/bounding.py | 17 | 21 | 80.95%
Total: | 21 | 25 | 84.0%

Totals |
---|---
Change from base Build 5143604727: | -0.006%
Covered Lines: | 4169
Relevant Lines: | 4535
After some more thinking/testing, I have …
@joshspeagle, comments?
Merged this, as it has been sitting for too long.
This is a tentative fix to the BIC criterion for ellipsoid splitting introduced in #286. There, a factor of two was skipped in the calculation, making the splitting of ellipsoids harder. Apparently that leads to the significant slowness observed in #440.
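For context, here's my own schematic of why the factor of two matters (a paraphrase, not the exact expression in bounding.py). The standard BIC is k * ln(n) - 2 * ln(L_hat); if the factor of two on the log-likelihood term is skipped, the likelihood gain from a proposed split counts only half as much against the complexity penalty, so splits are accepted less often and the decomposition stays coarser:

import numpy as np

def bic(log_like, n_params, n_points):
    # standard Bayesian information criterion: k * ln(n) - 2 * ln(L_hat)
    return n_params * np.log(n_points) - 2.0 * log_like

def bic_missing_factor(log_like, n_params, n_points):
    # variant with the factor of two on the likelihood term skipped
    return n_params * np.log(n_points) - log_like

# a split is accepted when the split model's criterion is lower than the
# single-ellipsoid one; with the missing factor the same likelihood
# improvement offsets less of the extra-parameter penalty, so fewer
# splits pass, which matches the behaviour described above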
I'm a bit torn on this issue and whether it's the right change. One wouldn't want the results to be that sensitive to this factor of two, and I'm concerned that it may break some other cases...
One thing that this change trips up is the test_ellipsoids test case https://github.com/joshspeagle/dynesty/blob/d9164354421f4a010ee2bc232cb4eda2f4f90a88/tests/test_ellipsoid.py#L265 where I check whether the number of clusters recovered from a 6 x 6 x 6 x 6 grid in 4d is within 10% of 6^4. I don't know if that's a serious concern or not...
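To make the tolerance concrete, the check has roughly this shape (a paraphrase reusing the test_number_clusters helper from the code earlier in this thread, not the verbatim test; with 6^4 = 1296 true clusters, a 10% tolerance allows the recovered count to be off by about 130):

ntrue, nfound = test_number_clusters(ndim=4, ncens=6**4, nper_dim=5, sig=0.01, seed=0)
assert abs(nfound - ntrue) < 0.1 * ntrue  # within 10% of 6**4 = 1296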
Thoughts?
(Sorry for creating a second PR on this; the first one targeted an incorrect branch by mistake.)