ellipsoidal decomposition inefficient if one cluster has few members

As currently implemented, the multi-ellipsoidal decomposition can produce boundaries that are unreasonably large compared to the target volume. Consider a two-dimensional example with a blue cluster whose volume is much larger than that of an orange cluster. Because of this volume difference, the orange cluster contains only a few points, here two. A non-degenerate minimum-volume enclosing ellipsoid needs at least three points in two dimensions (in general, d + 1 points in d dimensions). Therefore, any multi-ellipsoidal decomposition that encompasses all points must contain at least one ellipsoid with points from both clusters. Now suppose we move the clusters further and further apart. The ellipsoid containing points from both clusters then grows in volume without bound. Eventually, the volume of the ellipsoid union becomes much larger than the combined volume of the two clusters, which does not depend on how far apart they are.
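A quick numerical check of this argument is sketched below. It is illustrative only and not taken from nautilus: it fits a single minimum-volume enclosing ellipse with Khachiyan's algorithm (an assumed, standard choice; nautilus's own fitting routine may differ) to a hypothetical blue cluster of 100 points in the unit disk plus two orange points at distance `separation`. The helper names `mvee` and `ellipse_area`, the cluster sizes, the seed, and the tolerance are all assumptions.

```python
import numpy as np


def mvee(points, tol=1e-3, max_iter=50_000):
    """Approximate minimum-volume enclosing ellipsoid (Khachiyan's algorithm).

    Returns (A, c) describing the ellipsoid {x : (x - c)^T A (x - c) <= 1}.
    """
    n, d = points.shape
    # Lift the points to homogeneous coordinates, shape (d + 1, n).
    q = np.vstack([points.T, np.ones(n)])
    u = np.full(n, 1.0 / n)  # one weight per point
    for _ in range(max_iter):
        x = (q * u) @ q.T
        m = np.einsum("ij,ji->i", q.T @ np.linalg.inv(x), q)
        j = np.argmax(m)
        step = (m[j] - d - 1.0) / ((d + 1.0) * (m[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        converged = np.linalg.norm(new_u - u) < tol
        u = new_u
        if converged:
            break
    c = points.T @ u
    a = np.linalg.inv((points.T * u) @ points - np.outer(c, c)) / d
    # The result is approximate; rescale so that every point is enclosed.
    r2 = np.einsum("ij,jk,ik->i", points - c, a, points - c).max()
    return a / max(r2, 1.0), c


def ellipse_area(a):
    # Area of {x : x^T A x <= 1} in two dimensions is pi / sqrt(det A).
    return np.pi / np.sqrt(np.linalg.det(a))


rng = np.random.default_rng(0)
# Hypothetical blue cluster: 100 points spread uniformly over the unit disk.
radius = np.sqrt(rng.uniform(size=100))
angle = rng.uniform(0.0, 2.0 * np.pi, size=100)
blue = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])

for separation in (2.0, 10.0, 50.0):
    # Hypothetical orange cluster: only two points, too few for its own ellipse.
    orange = np.array([[separation, -0.05], [separation, 0.05]])
    a, _ = mvee(np.vstack([blue, orange]))
    print(f"separation {separation:5.1f}: ellipse area {ellipse_area(a):8.2f}")
```

The printed area keeps growing with `separation`, while the area actually occupied by the two clusters stays at roughly π.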

The example above illustrates a problem that likely also occurs in higher dimensions, even when the clusters do not appear far apart. It is enough to have two separate clusters, one of which has fewer than n_points_min points. In this case, the volume of the multi-ellipsoidal outer bound may be much larger than the target volume, i.e., the volume inside the iso-likelihood surface. As a result, even if the neural networks estimate that surface well, they accept only a negligible fraction of the points proposed from the multi-ellipsoid union. This low acceptance rate causes the sampler to freeze when constructing a new bound. Such a scenario caused the freeze described in #26.
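To see why a bloated bound starves the sampler, note that if the networks classified points perfectly, the fraction of points drawn uniformly from the bound that get accepted would equal the ratio of target volume to bound volume. The sketch below is a simplified, hypothetical model rather than nautilus code: the target is two unit balls whose centers are `separation` apart, and the bound is a single enclosing ball of radius `separation / 2 + 1` (deliberately cruder than a minimum-volume ellipsoid). The function names are illustrative.

```python
import numpy as np


def sample_ball(rng, n, n_dim, radius):
    """Draw n points uniformly from an n_dim-dimensional ball centered at 0."""
    x = rng.normal(size=(n, n_dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # uniform directions
    return x * radius * rng.uniform(size=(n, 1)) ** (1.0 / n_dim)


def acceptance(rng, n_dim, separation, n_samples=200_000):
    """Monte Carlo fraction of bound samples that land in either target ball."""
    bound_radius = separation / 2.0 + 1.0  # ball enclosing both unit balls
    x = sample_ball(rng, n_samples, n_dim, bound_radius)
    center = np.zeros(n_dim)
    center[0] = separation / 2.0
    inside = (np.linalg.norm(x - center, axis=1) <= 1.0) | (
        np.linalg.norm(x + center, axis=1) <= 1.0
    )
    return inside.mean()


def expected_acceptance(n_dim, separation):
    # Target volume / bound volume = 2 * 1**n_dim / (separation / 2 + 1)**n_dim.
    return 2.0 / (separation / 2.0 + 1.0) ** n_dim


rng = np.random.default_rng(1)
print("2-d Monte Carlo estimate:", acceptance(rng, 2, 4.0))
for n_dim in (2, 5, 10):
    print(n_dim, expected_acceptance(n_dim, 4.0))
```

Under these assumptions the ratio is 2 / (separation/2 + 1)^n_dim, which falls exponentially with dimension: for centers only 4 units apart it is 2/9 in two dimensions but roughly 3e-5 in ten.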

A solution will likely involve removing points within ellipsoids whose point density (number of points divided by ellipsoid volume) is unreasonably low compared to that of the other ellipsoids. In the example above, the multi-ellipsoid would then no longer encompass the orange cluster. This should not strongly affect the sampling, since the orange cluster has a negligible volume to begin with. It also should not cause a systematic bias in the results, since nautilus uses importance nested sampling rather than nested sampling.
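A minimal sketch of the proposed density criterion follows. It is not part of nautilus: the function `keep_dense_points`, its threshold `min_relative_density`, and the choice to compare each ellipsoid against the densest one are hypothetical. It assumes each point has already been assigned to exactly one ellipsoid and that the ellipsoid volumes are known.

```python
import numpy as np


def keep_dense_points(labels, volumes, min_relative_density=1e-3):
    """Return a boolean mask of points to keep after density-based pruning.

    labels[i] is the index of the ellipsoid that point i was assigned to, and
    volumes[k] is the volume of ellipsoid k. Ellipsoids whose point density
    (number of points / volume) is below min_relative_density times the
    highest density among all ellipsoids are dropped with their points.
    """
    labels = np.asarray(labels)
    volumes = np.asarray(volumes, dtype=float)
    counts = np.bincount(labels, minlength=len(volumes))
    density = counts / volumes
    keep_ellipsoid = density >= min_relative_density * density.max()
    return keep_ellipsoid[labels]


# Hypothetical numbers: 98 points in a compact ellipsoid, 4 in a huge one.
labels = np.array([0] * 98 + [1] * 4)
mask = keep_dense_points(labels, volumes=[3.0, 400.0])
print(mask.sum())  # 98
```

In this made-up example, the sparse second ellipsoid, which would hold the orange points, is dropped. Comparing against the densest ellipsoid keeps the criterion independent of the overall scale of the parameter space; the threshold itself would need tuning.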

johannesulf / nautilus, issue #29