peterwittek / somoclu

Massively parallel self-organizing maps: accelerate training on multicore CPUs, GPUs, and clusters
https://peterwittek.github.io/somoclu/
MIT License

clusterID of the original samples #137

Open isaac-you opened 6 years ago

isaac-you commented 6 years ago

SOM clustering is a good method for customer segmentation, and somoclu makes it powerful enough to handle big data. Thank you so much. However, after training I can only find a cluster ID for the nodes (neurons); there is no cluster ID for the original samples. Also, the default number of clusters for k-means is 8, so how can I set a different number? Thank you so much for your help.

isaac-you commented 6 years ago

The best matching units array does not give me the cluster ID directly. When I run the example from https://somoclu.readthedocs.io/en/stable/example.html on the 150 random samples, the best matching units array is just a matrix of shape (150, 2) with no cluster ID; it looks more like coordinates for the 150 samples in 2-D space. So how can I find the cluster ID for the original 150 samples? Thank you.

deepwindlee commented 5 years ago

How can I find out which specific cluster each sample belongs to after clustering?

Sitin commented 3 years ago

Hi @isaac-you, you can use the best matching units (BMUs), as suggested in the documentation: each sample takes the cluster of its BMU.

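# Call som.cluster() first so that som.clusters is populated.
# Each BMU is a (column, row) pair, while som.clusters is indexed [row, column].
bmus = som.get_bmus(som.get_surface_state(X))
cluster_labels = [som.clusters[bmu[1], bmu[0]] for bmu in bmus]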

However, I am still wondering why there is no such method in the library itself, given that it already has clustering support.
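
For anyone landing here later, the lookup can be sketched as a small helper. This is only an illustration under assumptions: `labels_from_bmus` is a hypothetical name, not part of somoclu, and it assumes BMUs are stored as (column, row) pairs while the cluster grid is laid out as rows by columns, which is my reading of the somoclu documentation. The toy values stand in for `som.bmus` and `som.clusters`, and the grid is deliberately non-square so that a swapped index would be noticed.

```python
def labels_from_bmus(bmus, clusters):
    """Return the cluster label of each sample's best matching unit.

    bmus:     sequence of (column, row) pairs, one per sample
              (assumed to match the layout of som.bmus)
    clusters: 2-D grid of node labels indexed as clusters[row][column]
              (assumed to match the layout of som.clusters)
    """
    return [int(clusters[row][col]) for col, row in bmus]


# Toy stand-ins (made-up values) for a map with 2 rows and 3 columns.
toy_clusters = [[0, 0, 1],
                [2, 2, 1]]
toy_bmus = [(0, 0), (2, 0), (1, 1), (2, 1)]

print(labels_from_bmus(toy_bmus, toy_clusters))  # [0, 1, 2, 1]

# With somoclu and scikit-learn installed, the real inputs would come from
# something like the following (not executed here):
#   som = somoclu.Somoclu(n_columns, n_rows)
#   som.train(data)
#   som.cluster(algorithm=KMeans(n_clusters=3))  # instead of the default 8
#   labels = labels_from_bmus(som.bmus, som.clusters)
```

Passing a scikit-learn clustering object with the desired `n_clusters` to `som.cluster()` is also how the number of clusters can be changed from the k-means default of 8 asked about above.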