rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[FEA] Support for mini-batch KMeans #5079

Open ZiyueXu77 opened 1 year ago

ZiyueXu77 commented 1 year ago

Is your feature request related to a problem? Please describe. I would like cuML to support mini-batch KMeans in addition to the existing KMeans estimator. This would be useful in applications that rely on iterative optimization.

Describe the solution you'd like: An API similar to the one scikit-learn provides (MiniBatchKMeans).
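For reference, the scikit-learn API mentioned here can be used roughly as follows. This is a minimal sketch on synthetic data (the two blobs, the batch size of 128, and the other parameter values are illustrative assumptions); it shows the scikit-learn estimator only and says nothing about how a cuML equivalent would look.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs (illustrative data only), shuffled so
# that every sequential slice contains points from both blobs.
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(500, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(500, 2)),
]).astype(np.float32)
X = X[rng.permutation(len(X))]

# One-shot fit: the estimator draws mini-batches internally.
mbk = MiniBatchKMeans(n_clusters=2, batch_size=128, n_init=3, random_state=0)
mbk.fit(X)

# Incremental fit: the caller feeds one batch at a time via partial_fit.
inc = MiniBatchKMeans(n_clusters=2, n_init=3, random_state=0)
for start in range(0, len(X), 128):
    inc.partial_fit(X[start:start + 128])

labels = inc.predict(X)
```

The `partial_fit` form is probably the one that matters most for iterative or streaming workflows, since the caller controls which batch is seen at each step.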

Describe alternatives you've considered: Currently this can be done by manually creating random mini-batches with the standard API, but that could hurt efficiency.
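To make the manual alternative concrete, here is a rough NumPy-only sketch of a mini-batch update loop: sample a random batch, assign each point to its nearest center, then move each center toward the batch mean with a step size that shrinks as the center accumulates points. The function name `minibatch_kmeans`, the farthest-first initialisation, and all default values are assumptions made for illustration; this does not call cuML and is not how cuML or scikit-learn implement it.

```python
import numpy as np

def minibatch_kmeans(X, n_clusters, batch_size=128, n_iters=200, seed=0):
    """Hypothetical helper: plain NumPy mini-batch k-means."""
    rng = np.random.default_rng(seed)
    # Farthest-first initialisation (a simplification chosen for this sketch):
    # start from a random point, then repeatedly add the point that is
    # farthest from all centers chosen so far.
    chosen = [X[rng.integers(len(X))]]
    for _ in range(1, n_clusters):
        d2 = ((X[:, None, :] - np.array(chosen)[None, :, :]) ** 2).sum(axis=2)
        chosen.append(X[d2.min(axis=1).argmax()])
    centers = np.array(chosen, dtype=np.float64)
    counts = np.zeros(n_clusters)  # points absorbed by each center so far

    for _ in range(n_iters):
        batch = X[rng.choice(len(X), size=batch_size, replace=False)]
        # Squared distance from every batch point to every center.
        d2 = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        for k in np.unique(nearest):
            members = batch[nearest == k]
            counts[k] += len(members)
            # Running-mean update: the step size decays as counts[k] grows.
            lr = len(members) / counts[k]
            centers[k] += lr * (members.mean(axis=0) - centers[k])
    return centers

# Illustrative data: two synthetic blobs around (0, 0) and (5, 5).
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(500, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(500, 2)),
])
centers = minibatch_kmeans(X, n_clusters=2)
```

A loop like this runs one small step per batch from Python, which is likely where the efficiency concern comes from: on a GPU each step would be a tiny kernel launch plus host-side bookkeeping, rather than one fused native routine.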

Additional context: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.MiniBatchKMeans.html

ZiyueXu77 commented 8 months ago

I implemented a federated version of mini-batch KMeans with a global sync scheme. Details are here: https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/sklearn-kmeans. The global update code is here: https://github.com/NVIDIA/NVFlare/blob/main/examples/advanced/sklearn-kmeans/jobs/sklearn_kmeans_base/app/custom/kmeans_assembler.py#L60-L71
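As a rough illustration of what a global sync step could look like, the sketch below combines per-client centers using a count-weighted average. It is not a transcription of the linked `kmeans_assembler.py`; the function name `aggregate_centers`, the argument layout, and the assumption that center indices are aligned across clients (for example, because every client starts the round from the same global centers) are all hypothetical.

```python
import numpy as np

def aggregate_centers(client_centers, client_counts, global_centers):
    """Hypothetical server-side step: count-weighted average of client centers.

    client_centers: shape (n_clients, k, d), local centers after one round
    client_counts:  shape (n_clients, k), points each client assigned per center
    global_centers: shape (k, d), kept for any center no client updated
    """
    centers = np.asarray(client_centers, dtype=np.float64)
    counts = np.asarray(client_counts, dtype=np.float64)
    total = counts.sum(axis=0)                             # (k,)
    weighted = (centers * counts[:, :, None]).sum(axis=0)  # (k, d)
    new_centers = np.array(global_centers, dtype=np.float64)
    seen = total > 0  # avoid dividing by zero for centers with no points
    new_centers[seen] = weighted[seen] / total[seen, None]
    return new_centers, total

# Two clients, k = 2 clusters, d = 1 feature (illustrative numbers).
client_centers = [[[0.0], [10.0]],
                  [[2.0], [6.0]]]
client_counts = [[3, 1],
                 [1, 3]]
new_centers, total = aggregate_centers(
    client_centers, client_counts, global_centers=[[1.0], [8.0]]
)
# new_centers -> [[0.5], [7.0]], total -> [4., 4.]
```

Weighting by counts means a client that saw more points for a given cluster pulls the global center harder, which mirrors the running-mean update used in the single-machine mini-batch algorithm.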