Test clustering algorithm with toy clusters in real data

Introducing artificial clusters into existing data makes it possible to test a clustering algorithm's performance and to judge which region of the hyperparameter space is relevant. If a clustering algorithm cannot distinguish an obvious cluster at all, or only with a certain set of hyperparameters, this tells us whether the algorithm is useful at all and, if so, with which hyperparameter configuration.

These toy clusters can be introduced...

in a few or in all dimensions,
and can be far away or closer to the main cluster of data,
and can be of different sizes and shapes: larger or smaller, Gaussian or uniform, etc.
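
A minimal sketch of how such a toy cluster could be injected, assuming the data is a NumPy array of shape (n_samples, n_features). The function name `inject_toy_cluster` and all of its parameters are hypothetical, chosen here for illustration only. In the dimensions that are not selected, the toy points copy the values of randomly drawn real points, so the cluster is only separated where requested.

```python
import numpy as np


def inject_toy_cluster(data, n_points, dims, offset, scale=0.1, shape="gaussian", rng=None):
    """Append a toy cluster to `data`; return (augmented data, boolean toy mask).

    In the dimensions listed in `dims`, the toy points are centred `offset`
    standard deviations away from the data mean, with a width of `scale`
    standard deviations. In all other dimensions they copy randomly drawn
    real points, so the toy cluster is not separated there.
    """
    rng = np.random.default_rng(rng)
    data = np.asarray(data, dtype=float)
    dims = np.asarray(dims)
    mean, std = data.mean(axis=0), data.std(axis=0)

    # Start from randomly drawn real points: no separation in any dimension.
    toy = data[rng.integers(0, len(data), size=n_points)].copy()

    # Overwrite the selected dimensions with a compact, displaced blob.
    center = mean[dims] + offset * std[dims]
    width = scale * std[dims]
    size = (n_points, len(dims))
    if shape == "gaussian":
        toy[:, dims] = rng.normal(center, width, size=size)
    elif shape == "uniform":
        toy[:, dims] = rng.uniform(center - width, center + width, size=size)
    else:
        raise ValueError(f"unknown shape: {shape!r}")

    is_toy = np.concatenate([np.zeros(len(data), dtype=bool), np.ones(n_points, dtype=bool)])
    return np.vstack([data, toy]), is_toy


# Usage with random numbers as a stand-in for the real data set.
real = np.random.default_rng(0).normal(size=(1000, 5))
augmented, is_toy = inject_toy_cluster(real, n_points=50, dims=[0, 1], offset=6.0, rng=1)
```

The returned mask `is_toy` keeps track of which rows are artificial, so that the clustering result can later be compared against it.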

Systematic testing can be done by making it successively harder to distinguish the toy cluster:

Reducing the number of dimensions in which the toy cluster is separated, for example by randomizing or smearing out its position in successively more dimensions.
Bringing the toy cluster closer to the main cluster.
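
The two knobs above could be scanned on a grid, roughly as sketched below. This is only an illustration under several assumptions: `scan_difficulty` and `recovery_score` are hypothetical names, the clustering algorithm is passed in as any callable that maps the data to one integer label per point (negative labels are treated as noise), and recovery is measured as the best Jaccard overlap between a found cluster and the toy cluster. Other scores would work equally well.

```python
import numpy as np


def recovery_score(labels, is_toy):
    """Best Jaccard overlap (0 to 1) between any found cluster and the toy cluster."""
    best = 0.0
    for label in np.unique(labels):
        if label < 0:  # assumed convention: negative labels mark noise
            continue
        found = labels == label
        best = max(best, (found & is_toy).sum() / (found | is_toy).sum())
    return best


def scan_difficulty(data, cluster_fn, offsets, n_dims_list, n_points=50, scale=0.1, seed=0):
    """Return scores[i, j] for offsets[i] and n_dims_list[j] separated dimensions."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    mean, std = data.mean(axis=0), data.std(axis=0)
    is_toy = np.arange(len(data) + n_points) >= len(data)
    scores = np.zeros((len(offsets), len(n_dims_list)))
    for i, offset in enumerate(offsets):
        for j, n_dims in enumerate(n_dims_list):
            # Toy points copy real points (randomized position), then get
            # displaced by `offset` standard deviations in the first n_dims dimensions.
            toy = data[rng.integers(0, len(data), size=n_points)].copy()
            sep = slice(0, n_dims)
            toy[:, sep] = rng.normal(mean[sep] + offset * std[sep], scale * std[sep],
                                     size=(n_points, n_dims))
            labels = np.asarray(cluster_fn(np.vstack([data, toy])))
            scores[i, j] = recovery_score(labels, is_toy)
    return scores


def placeholder_clustering(X):
    """Stand-in for a real algorithm: splits the points at x0 = 4."""
    return (X[:, 0] > 4).astype(int)


real = np.random.default_rng(0).normal(size=(1000, 3))  # stand-in for the real data
scores = scan_difficulty(real, placeholder_clustering, offsets=[1.0, 3.0, 6.0], n_dims_list=[1, 3])
```

In practice `placeholder_clustering` would be replaced by the algorithm under test with a fixed hyperparameter configuration, for example the `fit_predict` method of a scikit-learn estimator. Repeating the scan for several configurations then shows for which of them, and down to which offset and number of separated dimensions, the toy cluster is still recovered.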

sebastian-schindler / PhD
