scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License

Adding implementation of Hartigan's K-Means #16123

Open assaftibm opened 4 years ago

assaftibm commented 4 years ago

Hi,

I'm implementing Hartigan's K-Means in C++ with a Cython wrapper, and when it's done I'd be glad to contribute it to scikit-learn. The implementation follows the pseudocode described in the IJCAI '13 paper by Slonim, Aharoni and Crammer (https://dl.acm.org/doi/10.5555/2540128.2540369), plus some optimizations of my own that make the run time comparable to Lloyd's K-Means.
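For readers who haven't seen the method, here is a minimal NumPy sketch of the basic idea as it is commonly described: visit points one at a time and move a point to another cluster whenever that strictly lowers the within-cluster sum of squares, updating the two affected centroids immediately. This is not the C++ implementation mentioned above and includes none of its optimizations; the function name `hartigan_kmeans`, its parameters, and the random balanced initialization are assumptions made for illustration only.

```python
import numpy as np


def hartigan_kmeans(X, k, max_iter=100, seed=0):
    """Illustrative (hypothetical) Hartigan-style k-means; assumes len(X) >= k."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)

    # Assumption: start from a random, balanced partition so no cluster is empty.
    labels = rng.permutation(n) % k
    counts = np.bincount(labels, minlength=k).astype(float)
    centers = np.stack([X[labels == j].mean(axis=0) for j in range(k)])

    for _ in range(max_iter):
        moved = False
        for i in range(n):
            a = labels[i]
            if counts[a] <= 1:
                continue  # never empty a cluster

            d2 = ((centers - X[i]) ** 2).sum(axis=1)
            # Cost increase if X[i] joins cluster b: n_b / (n_b + 1) * ||x - c_b||^2
            cost = counts / (counts + 1.0) * d2
            # Cost saved by removing X[i] from its own cluster a:
            # n_a / (n_a - 1) * ||x - c_a||^2
            cost[a] = counts[a] / (counts[a] - 1.0) * d2[a]

            b = int(np.argmin(cost))
            if b != a and cost[b] < cost[a]:
                # Incremental centroid updates for the two clusters involved.
                centers[a] = (counts[a] * centers[a] - X[i]) / (counts[a] - 1.0)
                centers[b] = (counts[b] * centers[b] + X[i]) / (counts[b] + 1.0)
                counts[a] -= 1.0
                counts[b] += 1.0
                labels[i] = b
                moved = True
        if not moved:
            break  # a full pass with no move: local optimum reached

    return labels, centers
```

Unlike Lloyd's algorithm, which reassigns every point to its nearest centroid and only then recomputes the centroids, this loop accounts for how the centroids shift when a single point moves, which is where the `n / (n + 1)` and `n / (n - 1)` factors come from.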

I'd like to know if the community welcomes this addition.

Thank you.

ogrisel commented 4 years ago

I was not familiar with Hartigan's K-Means but it looks interesting.

However, we would rather not add any more C++ to the scikit-learn codebase and would prefer to focus on Cython.

But before considering an implementation of Hartigan's K-Means in Cython, let's focus on finishing the new implementation of Lloyd's in #11950, which is significantly more memory-efficient and scales better on machines with many CPU cores.