h2oai / sparkling-water

Sparkling Water provides H2O functionality inside a Spark cluster
https://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/index.html
Apache License 2.0

How to get kmeans centroids from H2OKMeansMOJOModel #2666

Closed: dzlab closed this issue 3 years ago

dzlab commented 3 years ago

I'm training a KMeans model in SW along the lines of https://docs.h2o.ai/sparkling-water/3.0/latest-stable/doc/ml/sw_kmeans.html. After training I get back an H2OKMeansMOJOModel, but I cannot find a way to get the centroids from it. H2O itself seems to provide an API called centers() that returns them; see https://github.com/h2oai/h2o-3/blob/master/h2o-py/demos/constrained_kmeans_demo_cluto.ipynb

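import pandas as pd
from h2o.estimators import H2OKMeansEstimator

# user_points and data_h2o_cluto are assumed to be defined earlier, as in the linked notebook
h2o_km_co_cluto = H2OKMeansEstimator(k=10, user_points=user_points, cluster_size_constraints=[100, 200, 100, 200, 100, 100, 100, 100, 100, 100], standardize=True)

h2o_km_co_cluto.train(x=["x", "y"], training_frame=data_h2o_cluto)
...
# centers() returns the centroid coordinates; wrap them in a DataFrame for inspection
centers_km_co_cluto = pd.DataFrame(h2o_km_co_cluto.centers())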

@mn-mikke How can I get the centroids in SW, similar to what H2O provides?

mn-mikke commented 3 years ago

Hi @dzlab, if you need the actual centroid coordinates, we will expose them in SW-2639. If you need them to calculate certain clustering metrics, you may be able to use the metrics that SW has provided since version 3.34.0.1-1.

import ai.h2o.sparkling.ml.metrics.H2OClusteringMetrics
val clusteringMetrics = model.getTrainingMetricsObject().asInstanceOf[H2OClusteringMetrics]
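
For context on what "clustering metrics" computed from centroids usually means, here is a minimal pure-Python sketch of the three sums of squares commonly reported for k-means: total within-cluster, between-cluster, and total. It is only an illustration and not Sparkling Water or H2O code: the function names `sq_dist`, `mean_point`, and `clustering_sums_of_squares` are hypothetical, and the sketch assumes a hard assignment of unstandardized points, so its values need not match what a model trained with standardize=True reports.

```python
from math import fsum


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return fsum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [fsum(column) / n for column in zip(*points)]


def clustering_sums_of_squares(points, labels):
    """Return (tot_withinss, betweenss, totss) for a hard clustering.

    Hypothetical helper for illustration; centroids are taken to be the
    mean of the points assigned to each label.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    grand_mean = mean_point(points)
    centroids = {label: mean_point(members) for label, members in clusters.items()}

    # Spread of every point around its own cluster centroid.
    tot_withinss = fsum(
        sq_dist(point, centroids[label]) for point, label in zip(points, labels)
    )
    # Spread of the centroids around the grand mean, weighted by cluster size.
    betweenss = fsum(
        len(members) * sq_dist(centroids[label], grand_mean)
        for label, members in clusters.items()
    )
    # Spread of every point around the grand mean.
    totss = fsum(sq_dist(point, grand_mean) for point in points)
    return tot_withinss, betweenss, totss


# Toy data: two well-separated clusters of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
labels = [0, 0, 1, 1]
tot_withinss, betweenss, totss = clustering_sums_of_squares(points, labels)
```

When the centroids are the cluster means, as assumed here, the within-cluster and between-cluster sums add up to the total, which makes a handy sanity check on any values read from a metrics object.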

dzlab commented 3 years ago

@mn-mikke awesome thanks a lot, I will wait for SW-2639.