markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0

K-means mini batch clustering very slow #1370

Closed by and-tos 6 years ago

and-tos commented 6 years ago

Hi all, until now I was running PyEMMA on the front node of our group's cluster. Due to the high memory demand I moved it to a compute node. Now the k-means mini-batch clustering is taking forever.

A data set that took a few hours on the front node now takes an estimated 120 hours. Is there a way to identify the root cause of this?

Thanks a lot for your help!

marscher commented 6 years ago

Suspicion: your cluster uses a resource manager such as Slurm that allows shared allocation of nodes. If you do not specify the number of threads with the n_jobs parameter, it is determined automatically, which means one thread per core in the machine. If your job was allocated fewer cores than that, most of these threads will starve while waiting for CPU time.
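
A quick way to check this suspicion is to compare the number of cores in the machine with the number of cores the current process is actually allowed to run on. The sketch below is only an illustration: the helper name `core_counts` is hypothetical, `os.sched_getaffinity` exists only on Linux, and it assumes the resource manager enforces the allocation through CPU affinity, which depends on how the cluster is configured.

```python
import os


def core_counts():
    """Return (cores in the machine, cores this process may run on)."""
    machine = os.cpu_count() or 1
    if hasattr(os, "sched_getaffinity"):
        # Linux only: the set of CPUs the scheduler lets this process use.
        usable = len(os.sched_getaffinity(0))
    else:
        # No affinity information available; assume the whole machine.
        usable = machine
    return machine, usable


machine, usable = core_counts()
print(f"machine cores: {machine}, usable by this process: {usable}")
if usable < machine:
    print("an automatic thread count would oversubscribe the allocation")
```

If the two numbers differ when the script runs inside the job, an automatically chosen thread count is a likely cause of the slowdown.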

marscher commented 6 years ago

If this does not cure it, please set the log level to DEBUG in $HOME/.pyemma/logging.yml and attach the log file and the output of the script.

and-tos commented 6 years ago

Yes, setting n_jobs=1 did the trick! Thanks a lot!

marscher commented 6 years ago

To shorten the computation you should use multiple jobs, but fix n_jobs to the number of cores allocated by the resource manager.
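
One possible way to do this under Slurm is to read the allocation from the job's environment instead of hard-coding a number. This is a minimal sketch, not something taken from PyEMMA itself: the helper name `allocated_cores` is hypothetical, and it assumes the job exposes `SLURM_CPUS_PER_TASK` (set when `--cpus-per-task` is requested) or `SLURM_CPUS_ON_NODE`. Other resource managers use different variables.

```python
import os


def allocated_cores(environ=None):
    """Best-effort guess of the number of cores allocated to this job."""
    environ = os.environ if environ is None else environ
    # Slurm variables, checked from most to least specific.
    for var in ("SLURM_CPUS_PER_TASK", "SLURM_CPUS_ON_NODE"):
        value = environ.get(var, "")
        if value.isdigit() and int(value) > 0:
            return int(value)
    # Fallback outside Slurm: CPU affinity on Linux, else all cores.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


n_jobs = allocated_cores()
print(f"using n_jobs={n_jobs}")
```

The result could then be passed on, for example as `pyemma.coordinates.cluster_mini_batch_kmeans(data, n_jobs=n_jobs)`, where `data` stands for whatever input the script already uses.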