joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License
346 stars 76 forks source link

Boundary updates #443

Closed segasai closed 1 year ago

segasai commented 1 year ago

This is a tentative fix to the BIC criterion for ellipsoid splitting introduced in #286 There a factor of two was skipped in the calculation making the splitting of ellipsoids harder. Apparently that leads to significant slowness observed in #440

I'm a bit torn by this issue and whether it's right change. One wouldn't want the results to be that sensitive to that factor of two here and I'm concerned if this will break some other cases...

One thing that this change trips is the test_ellipsoids test case https://github.com/joshspeagle/dynesty/blob/d9164354421f4a010ee2bc232cb4eda2f4f90a88/tests/test_ellipsoid.py#L265 where I check if the number of clusters in 6 x 6 x 6 x 6 grid in 4d is within 10% of 6^4. I don't know if it's a serious concern or not...

Thoughts ?

(sorry for creating a 2nd PR on this, the first one had an incorrect branch by mistake)

segasai commented 1 year ago

Here's analysis of the performance number of clusters recovered

image

for the old version vs new version of the code for different number of clusters N=1,2,5,10,100,1000

Here's the code. I generate points on a grid, randomly scale it and randomly rotate it. I'd say it looks like an improvement.

import itertools
import numpy as np
import dynesty.bounding as db
import math
import scipy.linalg
import multiprocessing as mp

def random_mat(rstate, ndim):
    # random rotation matrix
    x = np.random.normal(size=(ndim, ndim))
    cov = np.cov(x)
    mat = scipy.linalg.svd(cov)[0]
    return mat

def test_number_clusters(ndim, ncens, nper_dim=10, sig=0.01, seed=1):
    # check we can recover close to a correct and large number of clusters

    npt = ncens * nper_dim * ndim
    rstate = np.random.default_rng(seed)
    # create stuff on the grid
    nxcens = int(math.ceil(ncens**(1. / ndim)))
    cens = np.zeros((ncens, ndim))
    curiter = itertools.product(*[range(nxcens)] * ndim)
    for i in range(ncens):
        cens[i] = next(curiter)
    xs = sig * rstate.normal(size=(npt, ndim)) + cens[np.arange(npt) %
                                                      len(cens), :]
    # scale dimensions
    min_scale = 1e-3
    scales = 10**rstate.uniform(np.log10(min_scale), 0, size=ndim)
    xs = xs * scales[None, :]
    # random rotation
    mat = random_mat(rstate, ndim)
    xs = xs @ mat
    E = db.bounding_ellipsoids(xs)

    return ncens, len(E.ells)

def doall(ntest=10000, nthreads=36):
    rstate = np.random.default_rng(2)
    ndims = rstate.integers(2, 20 + 1, size=ntest)
    ncens = np.array([1, 2, 5, 10, 100, 1000])
    xcens = rstate.integers(0, len(ncens), size=ntest)
    ncens = ncens[xcens]
    nper_dims = rstate.integers(2, 10 + 1, size=ntest)
    res = []
    with mp.Pool(nthreads) as poo:
        for i in range(ntest):
            res.append(
                poo.apply_async(test_number_clusters,
                                (ndims[i], ncens[i], nper_dims[i], 0.01, i)))
        for i in range(ntest):
            if i % 1000 == 0:
                print(i)
            res[i] = res[i].get()
    return res
coveralls commented 1 year ago

Pull Request Test Coverage Report for Build 5192856803


Changes Missing Coverage Covered Lines Changed/Added Lines %
py/dynesty/bounding.py 17 21 80.95%
<!-- Total: 21 25 84.0% -->
Totals Coverage Status
Change from base Build 5143604727: -0.006%
Covered Lines: 4169
Relevant Lines: 4535

💛 - Coveralls
segasai commented 1 year ago

After some more thinking/testing. I have

@joshspeagle comments ?

segasai commented 1 year ago

Merged this as this have been have been sitting for too long