radical-collaboration / extasy-grlsd

Repository to hold the input data and scripts for the ExTASY gromacs-lsdmap work

kmeans takes too long #66

Open euhruska opened 6 years ago

euhruska commented 6 years ago

Timing data for the analyse unit (run-tica-msm.py):

('time tica finished', '1151.17602015')
('time kmeans finished', '8621.65739107')
('time msm finished', '9160.20796895')
('time frame selection finished', '9189.05303311')
('time writing new frames finished', '9208.67641306')
('time plotting finished', '9623.44697213')
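
The timestamps above increase monotonically, so they look like cumulative elapsed seconds rather than per-stage durations. Assuming that, and assuming the clock starts at 0 when the script starts (so the first stage also includes any data loading), a small sketch to get the time spent in each stage:

```python
# Timestamps copied from the log above; assumed to be cumulative seconds.
timestamps = [
    ("tica", 1151.17602015),
    ("kmeans", 8621.65739107),
    ("msm", 9160.20796895),
    ("frame selection", 9189.05303311),
    ("writing new frames", 9208.67641306),
    ("plotting", 9623.44697213),
]


def stage_durations(stamps):
    """Convert cumulative timestamps into per-stage durations."""
    durations = {}
    previous = 0.0  # assumption: the clock starts at script start
    for name, t in stamps:
        durations[name] = t - previous
        previous = t
    return durations


durations = stage_durations(timestamps)
total = timestamps[-1][1]
for name, seconds in durations.items():
    print(f"{name:20s} {seconds:9.1f} s  {100 * seconds / total:5.1f} %")
```

Under these assumptions kmeans accounts for roughly 7470 s, which is the largest share of the run.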

launched with

export PYEMMA_NJOBS=1
export OMP_NUM_THREADS=1 
/opt/xalt/0.7.6/sles11.3/bin/aprun -n 1 -N 1 -L 18544 -d 1 -cc 0 python "run-tica-msm.py" "--path" "/u/sciteam/hruska/scratch/extasy-tica" "--n_select" "100" "--cur_iter" "8" "--Kconfig" "settings_ala12_gpu_tica.wcfg" > "analyse.log"

The line cl = pyemma.coordinates.cluster_kmeans(data=y, k=msm_states, max_iter=10, stride=msm_stride) takes a very long time when there are multiple iterations to analyse.

('n atoms', 132)
('n frames total', 4800000)
('n trajs', 900)
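
For a rough sense of scale: plain Lloyd-style k-means computes one distance per (frame, centre) pair in every iteration, so the work grows with frames used × k × iterations. The sketch below only illustrates that arithmetic. The frame count is from the report and max_iter=10 from the call above; the values of k and stride are hypothetical, since msm_states and msm_stride are not given here, and this is not a model of what pyemma does internally.

```python
N_FRAMES_TOTAL = 4_800_000  # from the report above
MAX_ITER = 10               # from the cluster_kmeans call above


def kmeans_distance_evals(n_frames, k, max_iter, stride=1):
    """Upper bound on distance evaluations for naive Lloyd k-means.

    Approximates striding by keeping every `stride`-th frame of the
    whole data set; per-trajectory striding gives a slightly different count.
    """
    n_used = -(-n_frames // stride)  # ceiling division
    return n_used * k * max_iter


# k=200 and the stride values are hypothetical, for illustration only.
for stride in (1, 10, 100):
    evals = kmeans_distance_evals(N_FRAMES_TOTAL, k=200, max_iter=MAX_ITER, stride=stride)
    print(f"stride={stride:4d}: up to {evals:,} distance evaluations")
```

The count only covers fitting the centres; assigning every frame to its nearest centre afterwards is extra work.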

path /u/sciteam/hruska/scratch/radical.pilot.sandbox/rp.session.leonardo.rice.edu.eh22.017756.0001/

euhruska commented 6 years ago

Tried mini-batch k-means instead:

cl = pyemma.coordinates.cluster_mini_batch_kmeans(data=y, k=msm_states, max_iter=10, n_jobs=None)

This didn't help.

euhruska commented 6 years ago

Now trying scikit-learn's KMeans (http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html) or the MIDAS k-means implementation (https://github.com/radical-cybertools/midas/blob/master/k-means).
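
A minimal sketch of what the scikit-learn route could look like. It uses sklearn.cluster.MiniBatchKMeans rather than the KMeans class linked above, since it is the mini-batch variant in the same module. It assumes y is a list of per-trajectory NumPy arrays of shape (n_frames, n_dims); the helper name cluster_tica_output, the synthetic data and the parameter values are made up for illustration and are not taken from run-tica-msm.py. Whether this is actually faster for this data set is untested.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def cluster_tica_output(y, n_clusters, stride=1, max_iter=10, seed=0):
    """Fit centres on every `stride`-th frame, then assign all frames.

    y: list of arrays, one per trajectory, each of shape (n_frames, n_dims).
    Returns the fitted model and one discrete trajectory per input array.
    """
    # Subsample each trajectory before fitting to reduce the fitting cost.
    sample = np.concatenate([traj[::stride] for traj in y])
    km = MiniBatchKMeans(
        n_clusters=n_clusters,
        max_iter=max_iter,
        batch_size=1024,  # hypothetical value
        n_init=3,
        random_state=seed,
    )
    km.fit(sample)
    # Assign every frame (not only the subsample) to its nearest centre.
    dtrajs = [km.predict(traj) for traj in y]
    return km, dtrajs


# Synthetic stand-in for the TICA output: 4 trajectories, 500 frames, 3 dims.
rng = np.random.default_rng(0)
y = [rng.normal(size=(500, 3)) for _ in range(4)]
km, dtrajs = cluster_tica_output(y, n_clusters=5, stride=10)
print(km.cluster_centers_.shape, [d.shape for d in dtrajs])
```

The resulting dtrajs would then have to be handed to the MSM step in place of the pyemma clustering output.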