Closed PFSWcas closed 7 years ago
@PFSWcas TensorFlow differs from CNTK in many aspects - CNTK is not (strictly speaking) a dataflow graph computation platform, but rather a distributed DNN/DL platform built on top of a dataflow graph computation engine. Both integrate with Python, where K-means is already available in standard libraries. Of course you could run K-means on Hadoop too; it all depends on what you want to do and on your constraints.
What is the dimensionality of your inputs? I am not sure whether CNTK has the node functions needed to do K-means efficiently, such as matrix inversion (#1682), trace(), diagonal() [cc @cha-zhang]. The objective function in matrix form is here -> K-means matrix form - stackexchange. In theory it could be done without those functions, but it looks involved (as in time-consuming). K-means is a subset of SOM (self-organizing maps / Kohonen networks) in 1D and N1 norm - there must be some examples from similar DNN toolkits that show the network architecture and objective function.
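To make it concrete why inversion, trace() and diagonal() come up, here is a rough NumPy sketch (not CNTK code). It assumes the usual matrix formulation, which may differ in notation from the linked answer: with a one-hot assignment matrix Z (n x k) and a centroid matrix M (k x d), the objective is trace((X - ZM)^T (X - ZM)) and the centroid update is M = (Z^T Z)^{-1} Z^T X. The function names and defaults are made up for illustration.

```python
import numpy as np

def one_hot_assignments(X, M):
    """Return labels and the one-hot assignment matrix Z (n x k) for centroids M."""
    # Squared Euclidean distance from every point to every centroid, shape (n, k).
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    Z = np.eye(M.shape[0])[labels]
    return labels, Z

def kmeans_matrix_form(X, k, n_iter=20, seed=0):
    """Lloyd iterations written with the matrix operations mentioned above."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initialise the centroids with k distinct data points.
    M = X[rng.choice(X.shape[0], size=k, replace=False)].copy()
    for _ in range(n_iter):
        _, Z = one_hot_assignments(X, M)
        # Z^T Z is diagonal (its entries are the cluster sizes), so
        # M = (Z^T Z)^{-1} Z^T X reduces to a row-wise division.
        counts = np.diag(Z.T @ Z)
        nonempty = counts > 0          # leave centroids of empty clusters unchanged
        M[nonempty] = (Z.T @ X)[nonempty] / counts[nonempty, None]
    labels, Z = one_hot_assignments(X, M)
    R = X - Z @ M                      # residual of every point to its centroid
    objective = np.trace(R.T @ R)      # sum of squared distances
    return M, labels, objective
```

Calling `kmeans_matrix_form(X, k=2)` on an (n, d) array returns the centroids, the per-point labels and the final objective value. Because Z^T Z is diagonal, no general matrix inverse is needed, which is one way around the missing inversion node.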
One way to use CNTK for (unsupervised) clustering problems would be to train an autoencoder (or a stacked autoencoder) to reduce the dimensionality of your inputs, then apply a similarity metric to the resulting vector (the output of the autoencoder), followed by your favorite clustering algorithm (outside of CNTK - for instance in Python - locally or with MPI, distributed with Hadoop (Flink), or via TensorFlow).
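A minimal sketch of that pipeline, with loud assumptions: it uses a plain linear autoencoder trained with NumPy gradient descent as a stand-in for a CNTK-trained (stacked) autoencoder, squared Euclidean distance as the similarity metric, and a basic Lloyd K-means as the "favorite clustering algorithm". All function names, the learning rate and the step count are illustrative and would need tuning for real data.

```python
import numpy as np

def train_linear_autoencoder(X, n_hidden, lr=0.01, n_steps=500, seed=0):
    """Fit X ~ (X @ W_enc) @ W_dec by gradient descent on the squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = 0.1 * rng.standard_normal((d, n_hidden))
    W_dec = 0.1 * rng.standard_normal((n_hidden, d))
    losses = []
    for _ in range(n_steps):
        H = X @ W_enc                     # codes, shape (n, n_hidden)
        R = H @ W_dec - X                 # reconstruction residual, shape (n, d)
        losses.append((R ** 2).sum() / n)
        grad_dec = (2.0 / n) * (H.T @ R)
        grad_enc = (2.0 / n) * (X.T @ (R @ W_dec.T))
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, losses

def lloyd_kmeans(P, k, n_iter=20, seed=0):
    """Basic K-means on the rows of P using squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    C = P[rng.choice(len(P), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):       # keep the old centroid if a cluster is empty
                C[j] = P[labels == j].mean(axis=0)
    return labels, C

def autoencode_then_cluster(X, n_hidden, k):
    """Reduce dimensionality with the autoencoder, then cluster the codes."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)               # centre the inputs before training
    W_enc, _, losses = train_linear_autoencoder(Xc, n_hidden)
    codes = Xc @ W_enc
    labels, _ = lloyd_kmeans(codes, k)
    return codes, labels, losses
```

In practice the encoder would be the trained CNTK network, and only the `codes` array would be handed to the clustering step outside of CNTK.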
Closing. @PFSWcas may re-open if there are further updates.
I saw a TensorFlow K-means clustering algorithm in https://esciencegroup.com/2016/01/05/an-encounter-with-googles-tensorflow/. The K-means process with TensorFlow is clear. However, when I try to write a K-means clustering algorithm in CNTK, I find it a little difficult. Has anyone written a CNTK K-means clustering algorithm, and could you give me some hints?