h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

K-Means initialization slower than expected #13280

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

I have a few questions:
a) Are there any plans to implement mini-batch k-means or spectral clustering?
b) The PlusPlus (k-means++) and Furthest initialization modes for K-Means don't appear to parallelize very efficiently. For large datasets and large k, initialization can take considerably more time than the main iteration loop. Is there any work in progress to speed this up?
c) Are there any plans to release a Python API for Sparkling Water?

Thanks for making a great product!
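
For context on question (b), here is a minimal sketch of textbook k-means++ seeding in plain Python. It is not H2O's implementation; the function names and the pass counter are made up for illustration. It shows why seeding tends to scale poorly with k: each new center requires a full pass over the data, and that pass depends on the center picked in the previous round, so the k rounds run one after another even if the work inside a single pass is distributed.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Textbook k-means++ seeding (illustrative only, not H2O's code).

    Returns (centers, passes), where passes counts full scans of the data.
    """
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]
    # Squared distance from each point to its nearest center chosen so far.
    nearest = [sq_dist(p, centers[0]) for p in points]
    passes = 1
    while len(centers) < k:
        total = sum(nearest)
        if total == 0:
            break  # every point coincides with an existing center
        # Sample the next center with probability proportional to nearest[i].
        r = rng.random() * total
        acc = 0.0
        # Fallback for floating-point round-off: the farthest point.
        idx = max(range(len(points)), key=nearest.__getitem__)
        for i, d in enumerate(nearest):
            acc += d
            if acc > r:
                idx = i
                break
        centers.append(points[idx])
        # This scan needs the center just chosen, so rounds cannot overlap.
        nearest = [min(d, sq_dist(p, centers[-1])) for p, d in zip(points, nearest)]
        passes += 1
    return centers, passes
```

With n rows and d columns, this does k full scans costing O(n·d) each, so seeding is O(k·n·d) with k sequential synchronization points. That is roughly the cost of one Lloyd iteration, but without the ability to process all k centers within a single pass.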

hasithjp commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-265
Assignee: Former user
Reporter: Tom Kraljevic
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A