Closed by euhruska 6 years ago
The easiest way to speed up k-means is of course to increase the stride, i.e. to use less data for the clustering. Projecting the data onto fewer dimensions beforehand should help too, as should using fewer cluster centers.
Alternatively, you can try pyemma.coordinates.cluster_mini_batch_kmeans()
(http://www.emma-project.org/latest/api/generated/pyemma.coordinates.cluster_mini_batch_kmeans.html) as an approximation to k-means.
Keep in mind, though, that huge data sets simply take some time to process.
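A minimal sketch of the two suggestions above, assuming `data` is a list of per-trajectory feature arrays (or a PyEMMA reader). The helper names `frames_after_stride` and `cluster_subsampled` are made up for illustration, and the values of `k`, `stride` and `max_iter` are placeholders, not recommendations. PyEMMA is imported lazily, so the frame-count helper can be used on its own.

```python
def frames_after_stride(traj_lengths, stride):
    """Number of frames that remain when every trajectory is read with the given stride."""
    # Frames 0, stride, 2*stride, ... are kept, i.e. ceil(n / stride) per trajectory.
    return sum(-(-n // stride) for n in traj_lengths)


def cluster_subsampled(data, k=100, stride=10, use_mini_batch=False):
    """Hypothetical wrapper: k-means on strided data, or mini-batch k-means instead."""
    import pyemma.coordinates as coor  # lazy import, only needed when clustering

    if use_mini_batch:
        # Approximation to k-means that works on random batches of the data.
        return coor.cluster_mini_batch_kmeans(data, k=k, max_iter=10)
    # Plain k-means, but only every `stride`-th frame is used for the clustering.
    return coor.cluster_kmeans(data, k=k, stride=stride, max_iter=10)


# Example: two short trajectories of 10 and 11 frames read with stride 5.
print(frames_after_stride([10, 11], 5))
```

With a stride of 10 the clustering sees roughly a tenth of the frames, which is usually the quickest thing to try first.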
Is it possible to parallelize k-means? I couldn't get it to work.
Since you set OMP_NUM_THREADS and PYEMMA_NJOBS to 1, all parallelization is switched off. The initialization of the centers (probably the time-consuming part) cannot be fully parallelized, only to some extent. The actual iteration is parallelized.
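A minimal sketch of how the single-thread limits could be lifted from within the script itself. Using all cores reported by `os.cpu_count()` is an assumption; on a shared cluster node you would use the number of cores allocated to the job. Setting the variables before importing pyemma is the safe order.

```python
import os

# Assumption: use every core the OS reports; fall back to 1 if the count is unknown.
n_cores = os.cpu_count() or 1

# Override the earlier settings of 1 so that threading is enabled again.
os.environ["OMP_NUM_THREADS"] = str(n_cores)
os.environ["PYEMMA_NJOBS"] = str(n_cores)

# Import pyemma only after this point so that it picks up the new values.
print("threads requested:", os.environ["PYEMMA_NJOBS"])
```

The same can be done in the job script by exporting both variables in the shell before starting Python.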
works
Is there a way to speed up k-means? See: https://github.com/radical-collaboration/extasy-grlsd/issues/66
Using python 2.7.11, pyemma 2.5.4
with ('n atoms', 132) ('n frames total', 4800000) ('n trajs', 900) the kmeans step takes hours: