scikit-learn-contrib / scikit-learn-extra

scikit-learn contrib estimators
https://scikit-learn-extra.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add Evidence Accumulation Clustering #134

Open thomasjpfan opened 2 years ago

thomasjpfan commented 2 years ago

Issue to keep track of https://github.com/scikit-learn/scikit-learn/pull/1830:

Evidence accumulation clustering (EAC) is an ensemble-based clustering framework: Fred, Ana L. N., and Anil K. Jain. "Data clustering using evidence accumulation." Proceedings of the 16th International Conference on Pattern Recognition, Vol. 4. IEEE, 2002.

Basic overview of algorithm:

  1. Cluster the data many times using a clustering algorithm with randomly (within reason) selected parameters.
  2. Create a co-association matrix, which records the number of times each pair of instances was clustered together.
  3. Cluster this matrix.
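
For illustration, steps 1 and 2 could be sketched roughly as follows. This is only a minimal sketch, not the code from the linked PR: the function name `co_association_matrix` and the parameters `n_runs`, `k_range`, and `random_state` are hypothetical, and the counts are divided by the number of runs so the entries lie in [0, 1].

```python
import numpy as np
from sklearn.cluster import KMeans


def co_association_matrix(X, n_runs=20, k_range=(10, 30), random_state=0):
    """Fraction of k-means runs in which each pair of samples shares a cluster.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    rng = np.random.RandomState(random_state)
    n_samples = X.shape[0]
    counts = np.zeros((n_samples, n_samples))
    for _ in range(n_runs):
        # Step 1: run k-means with k drawn uniformly from k_range (inclusive).
        k = rng.randint(k_range[0], k_range[1] + 1)
        seed = rng.randint(2**31 - 1)
        labels = KMeans(n_clusters=k, n_init=1, random_state=seed).fit_predict(X)
        # Step 2: add one vote for every pair that received the same label.
        counts += labels[:, None] == labels[None, :]
    # Normalise the vote counts to co-association frequencies.
    return counts / n_runs
```

The dense `n_samples x n_samples` matrix keeps the sketch short; it would need a sparse or chunked representation for large datasets.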

This seems to work really well: like a kernel method, it makes the clustering "easier" than it was for the original dataset.

The defaults of the algorithm are set up to follow those used by Fred and Jain (2002), whereby the clustering in step 1 is k-means with k selected randomly between 10 and 30. The clustering in step 3 is the minimum spanning tree (MST) algorithm, which I have yet to implement (will do in this PR).
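
One possible way to do the MST step, sketched with SciPy under stated assumptions: build a minimum spanning tree over the co-association matrix, cut the edges whose co-association falls below a threshold, and take the remaining connected components as clusters. The function name `mst_cut_labels` and the `threshold=0.5` default are assumptions for this sketch, not something taken from the PR.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree


def mst_cut_labels(C, threshold=0.5):
    """Cluster a co-association matrix C (entries in [0, 1]) by cutting MST edges.

    Hypothetical helper for illustration; the threshold default is an assumption.
    """
    # Turn similarities into edge weights in [1, 2]. The constant shift does not
    # change the MST, and it avoids zero weights, which SciPy reads as "no edge".
    W = 2.0 - np.asarray(C, dtype=float)
    np.fill_diagonal(W, 0.0)  # no self-loops
    mst = minimum_spanning_tree(W).tocoo()
    # Keep only the tree edges whose co-association is at least `threshold`.
    keep = mst.data <= 2.0 - threshold
    pruned = coo_matrix(
        (mst.data[keep], (mst.row[keep], mst.col[keep])), shape=mst.shape
    )
    # Each connected component of the pruned tree is one cluster.
    _, labels = connected_components(pruned, directed=False)
    return labels
```

Cutting MST edges at a fixed threshold gives the same partition as single-linkage clustering cut at that level, so the threshold acts as the main tuning knob here.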