splitting ellipsoids sometimes stops prematurely

Nautilus relies on multi-ellipsoidal decomposition to propose new points. It first puts a single ellipsoid around the live points and then splits ellipsoids further as long as the volume of the ellipsoid union is too large. To perform a split, nautilus takes the largest ellipsoid, assigns its members to two groups using Gaussian mixture modeling (GMM) via sklearn.mixture.GaussianMixture, puts ellipsoids around the two groups, and removes the original ellipsoid.

Currently, the splitting of ellipsoids may stop prematurely because nautilus rejects a split if one of the groups has fewer than n_points_min members. In rare instances, e.g., the scenario described in #26, this may result in no ellipsoid divisions at all, and one ends up with a single ellipsoid whose volume is many orders of magnitude larger than that of the live set.
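To make the failure mode concrete, here is a minimal sketch of the split-and-reject step described above. It is not the nautilus source: the function name try_split, the seed argument, and the toy data are assumptions for illustration. Only sklearn.mixture.GaussianMixture and the n_points_min threshold come from the description.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def try_split(points, n_points_min, seed=0):
    """Hypothetical helper: split points into two GMM groups.

    Returns the two groups, or None if either group has fewer than
    n_points_min members (the rejection rule described in this issue).
    """
    gmm = GaussianMixture(n_components=2, random_state=seed).fit(points)
    labels = gmm.predict(points)  # hard assignment to the likelier Gaussian
    counts = np.bincount(labels, minlength=2)
    if counts.min() < n_points_min:
        return None  # split rejected, the large ellipsoid is kept
    return points[labels == 0], points[labels == 1]


# Toy live set: one large cluster plus a small, distant one.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 2)),
    rng.normal(50.0, 1.0, size=(5, 2)),
])
result = try_split(points, n_points_min=10)
```

If the GMM isolates the five distant points as their own group, the split is rejected and a single ellipsoid would have to cover both clusters along with the empty space between them.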

To prevent this, nautilus should amend the GMM assignment by moving points into the smaller group, chosen according to the per-Gaussian membership probabilities returned by the GMM. Something similar was previously done when nautilus relied on K-Means instead of GMM.
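One possible form of this amendment is sketched below. It is an assumption about how the fix could look, not the actual nautilus change: the function name rebalance_labels is hypothetical, and resp stands for the (n_points, 2) array of membership probabilities, such as the output of GaussianMixture.predict_proba. The sketch tops up the smaller group with the points that have the highest probability of belonging to it until it reaches n_points_min members.

```python
import numpy as np


def rebalance_labels(resp, n_points_min):
    """Hypothetical helper: enforce a minimum size for both groups.

    resp is an (n_points, 2) array of per-Gaussian membership
    probabilities. Returns an integer label array in which each group
    has at least n_points_min members.
    """
    resp = np.asarray(resp, dtype=float)
    if len(resp) < 2 * n_points_min:
        raise ValueError("too few points for two groups of n_points_min")
    labels = resp.argmax(axis=1)
    counts = np.bincount(labels, minlength=2)
    small = int(counts.argmin())
    deficit = n_points_min - counts[small]
    if deficit > 0:
        # Points currently in the larger group, ranked by how likely
        # they are to belong to the smaller Gaussian.
        candidates = np.flatnonzero(labels != small)
        order = np.argsort(-resp[candidates, small])
        labels[candidates[order[:deficit]]] = small
    return labels


resp = np.array([
    [0.90, 0.10],
    [0.80, 0.20],
    [0.70, 0.30],
    [0.60, 0.40],
    [0.55, 0.45],
    [0.10, 0.90],
])
labels = rebalance_labels(resp, n_points_min=2)  # moves the 0.45 point to group 1
```

Because the total number of points is required to be at least 2 * n_points_min, moving points this way never pushes the larger group below the threshold.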
