Clustering with the CNTK?

microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit

https://docs.microsoft.com/cognitive-toolkit/

Clustering with the CNTK? #2743

Closed: PascalS86 closed this issue 6 years ago.

PascalS86 commented 6 years ago

Hi everybody, are there plans for a clustering algorithm in CNTK? An implementation of, e.g., k-means like the one in the Accord.NET Framework or the Azure methods in Cognitive Services would be awesome.

rhy-ama commented 6 years ago

You can approximate it by learning features and then fuzzy-binning the output. Otherwise, it would probably require implementing additional nodes in the compute graph.
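One possible reading of "fuzzy-bin the output", sketched under loose assumptions: the learned features and the bin centers are already given (both hypothetical here), and each feature vector gets a soft membership in every bin via a softmax over negative squared distances. Fuzzy c-means defines memberships differently, so treat this as one option rather than the method the comment intends.

```python
import math

def fuzzy_bin(features, centers, temperature=1.0):
    """Soft-assign each feature vector to every bin center (hypothetical helper).

    Membership of point x in bin k is a softmax over negative squared
    distances, so closer centers get larger weights and each row sums to 1.
    """
    memberships = []
    for x in features:
        # Negative squared Euclidean distance to each center, scaled by temperature.
        scores = [-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / temperature
                  for c in centers]
        m = max(scores)  # subtract the max before exp() for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        memberships.append([e / total for e in exps])
    return memberships

# Hypothetical learned features (e.g. outputs of a trained network) and bin centers.
features = [[0.1, 0.0], [0.9, 1.1], [0.5, 0.5]]
centers = [[0.0, 0.0], [1.0, 1.0]]
for row in fuzzy_bin(features, centers):
    print([round(v, 3) for v in row])
```

Lowering `temperature` makes the memberships sharper; in the limit they become hard nearest-center assignments.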

A better direction may be to focus on SOMs (Self-Organising Maps; see the Wikipedia article), which generalise the family of clustering algorithms.
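To make the SOM suggestion concrete, here is a minimal sketch of a classic 1-D self-organising map in plain Python (not CNTK; `train_som` and all its defaults are illustrative assumptions). For each sample, the best-matching unit (BMU) is the closest unit, and every unit moves toward the sample with a strength that decays with its grid distance from the BMU. In the limit of a zero-width neighbourhood only the BMU moves and the update becomes online k-means, which is one sense in which SOMs generalise it.

```python
import math
import random

def train_som(data, num_units, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self-organising map and return one weight vector per unit."""
    rng = random.Random(seed)
    # Initialise each unit's weights from a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in range(num_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood width
            # Best-matching unit: the unit whose weights are closest to x.
            bmu = min(range(num_units),
                      key=lambda k: sum((w - xi) ** 2 for w, xi in zip(weights[k], x)))
            for k in range(num_units):
                # Gaussian neighbourhood over distance on the 1-D grid of unit indices.
                h = math.exp(-((k - bmu) ** 2) / (2.0 * sigma ** 2))
                weights[k] = [w + lr * h * (xi - w) for w, xi in zip(weights[k], x)]
            step += 1
    return weights

# Two well-separated toy clusters (made-up data for illustration).
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
for unit in train_som(data, num_units=4):
    print([round(v, 2) for v in unit])
```

The neighbourhood term is what distinguishes this from k-means: adjacent units on the grid end up with similar weights, so the map also preserves topology.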

A couple of links:

TensorFlow implementation
One interesting paper: Principal temporal extensions of SOM

ArchiDevil commented 6 years ago

You can train something like an autoencoder and apply any clustering algorithm from scikit-learn to its learned representation. scikit-learn is already on your machine if you installed Anaconda.

PascalS86 commented 6 years ago

Hey, thanks for your response. It's not about orchestrating different frameworks. I know that in Python I can use scikit-learn or TensorFlow to achieve this, and in C# I can use Accord.NET. I just think this is missing in CNTK. So are there no plans to add this feature to CNTK?

n17s commented 6 years ago

The k-means objective can be written as a network, and then you can train it with your favorite learner:

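import cntk as C
dimension, num_clusters = 2, 3  # example sizes; set these for your data
x = C.input_variable((1, dimension))  # one point as a 1 x dimension row
c = C.parameter((dimension, num_clusters), init=C.glorot_uniform())  # one centroid per column, random init so they start distinct
loss = C.reduce_min(C.reduce_sum(c*c, axis=0) - 2 * C.times(x, c))  # min over clusters of ||c_k||^2 - 2*dot(x, c_k)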

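This says that the loss for a point, given a set of centroids, is (up to a constant) the minimum of its squared distances to all centroids. For the distance I used ||x - c||^2 = ||c||^2 - 2*dot(x, c) + ||x||^2 and dropped ||x||^2: it does not depend on the centroids, so it shifts the loss but changes neither which centroid is nearest nor the gradients.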
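A quick numeric check of that identity, in plain Python rather than CNTK (the helper names and the toy point and centroids are made up for illustration): adding back the dropped ||x||^2 to the per-point loss recovers the true minimum squared distance.

```python
def sq_dist(x, c):
    """Squared Euclidean distance ||x - c||^2."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def kmeans_point_loss(x, centroids):
    """Per-point loss as in the comment: min over k of ||c_k||^2 - 2*dot(x, c_k)."""
    return min(dot(c, c) - 2 * dot(x, c) for c in centroids)

x = [1.0, 2.0]
centroids = [[0.0, 0.0], [1.0, 1.0], [3.0, 2.0]]
# Adding back the dropped constant ||x||^2 gives the true minimum squared distance.
print(kmeans_point_loss(x, centroids) + dot(x, x))  # 1.0
print(min(sq_dist(x, c) for c in centroids))        # 1.0
```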

For inference you can use argmin instead of reduce_min to get the index of the nearest centroid, i.e. the cluster the point is assigned to.
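To show what a learner does with this objective, here is a framework-free sketch: plain SGD written by hand in Python instead of a CNTK learner (`train_kmeans_sgd`, `assign`, the learning rate and the toy data are all assumptions for illustration). Because the loss takes a minimum over clusters, the gradient only reaches the centroid that attains it, d/dc_k (||c_k||^2 - 2*dot(x, c_k)) = 2*(c_k - x), which turns training into an online k-means update. Inference is the argmin over the same scores.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def assign(x, centroids):
    """Inference: argmin over k of ||c_k||^2 - 2*dot(x, c_k), i.e. the nearest centroid."""
    scores = [dot(c, c) - 2 * dot(x, c) for c in centroids]
    return min(range(len(scores)), key=scores.__getitem__)

def train_kmeans_sgd(data, num_clusters, epochs=20, lr=0.05, seed=0):
    """Plain SGD on the per-point loss min_k(||c_k||^2 - 2*dot(x, c_k))."""
    rng = random.Random(seed)
    # Initialise centroids from distinct random data points so they start out different.
    centroids = [list(p) for p in rng.sample(data, num_clusters)]
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            k = assign(x, centroids)  # only the minimising centroid receives a gradient
            # Gradient step with d/dc_k = 2*(c_k - x); lr < 0.5 keeps it from overshooting x.
            centroids[k] = [ck - lr * 2.0 * (ck - xi) for ck, xi in zip(centroids[k], x)]
    return centroids

# Two well-separated toy clusters (made-up data for illustration).
data = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [4.0, 4.0], [4.2, 3.9], [3.9, 4.1]]
centroids = train_kmeans_sgd(data, num_clusters=2)
print([assign(x, centroids) for x in data])
```

With a real CNTK learner the gradient would come from automatic differentiation of the `loss` above rather than being written out by hand.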