thuem / THUNDER

A particle-filter framework for robust cryoEM 3D reconstruction
GNU General Public License v2.0

THUNDER runs on cpu cluster #12

Open abilijang opened 5 years ago

abilijang commented 5 years ago

Hi, specialist. I was trying to run THUNDER on our CPU cluster, which has 15 nodes with 20 cores each. We use PBS as our job scheduling system. Jobs submitted this way work fine with other programs such as RELION, but THUNDER runs into problems. Below are our original RELION job script and our THUNDER job script.

relion:

#!/bin/bash

# Inherit all current environment variables
#PBS -V

# Job name
#PBS -N Class2D/run1

# Keep Output and Error
#PBS -k eo

# Queue name
#PBS -q quick

# Specify the number of nodes and thread (ppn) for your job.
#PBS -l nodes=15:ppn=20

#################################

# Switch to the working directory;
cd $PBS_O_WORKDIR

# Environment
source ~/.bashrc

NP=`wc -l < $PBS_NODEFILE`

# Run:
echo "starting RELION..."
mpirun --bynode -np 300 `which relion_refine_mpi` --o Class2D/job001/run --i particles.star --dont_combine_weights_via_disc --pool 7 --ctf --iter 30 --tau2_fudge 2 --particle_diameter 420 --K 150 --flatten_solvent --zero_mask --oversampling 1 --psi_step 12 --offset_range 15 --offset_step 2 --norm --scale --j 1
echo "done"

THUNDER

#!/bin/bash

# Inherit all current environment variables
#PBS -V

# Job name
#PBS -N Class2D/run1

# Keep Output and Error
#PBS -k eo

# Queue name
#PBS -q quick

# Specify the number of nodes and thread (ppn) for your job.
#PBS -l nodes=10:ppn=20

#################################

# Switch to the working directory;
cd $PBS_O_WORKDIR

# Environment
source ~/.bashrc

NP=`wc -l < $PBS_NODEFILE`

# Run:
echo "starting THUNDER"
mpirun --bynode -np 200 thunder_cpu demo.json
echo "done"

The THUNDER job runs only on the master node, and the error message says that the --bynode argument is not recognized. The RELION job, however, runs fine and does not produce this message. Does anyone have any ideas?

Thanks for your help, Shuangbo

Zarrathustra commented 5 years ago

On a cluster, THUNDER should be run with one process per node. The CPU cores within each node are then used through threading.

Suppose there are 20 CPU cores on each node. I believe the configuration should be

#PBS -l nodes=10:ppn=1

and

mpirun --bynode -np 10 thunder_cpu demo.json

Moreover, please make sure that the parameter "Number of Threads Per Process" is set to 20.
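The suggestion above could be wired into a job script roughly as follows. This is only a minimal sketch, assuming a Torque-style PBS in which `$PBS_NODEFILE` lists one line per allocated slot. `count_nodes` and `launch_line` are hypothetical helpers, not part of THUNDER or PBS; the queue name and `demo.json` are taken from the scripts earlier in this thread. The script only prints the launch line, so the process count can be checked before anything is actually run.

```shell
#!/bin/bash
#PBS -V
#PBS -q quick
# One slot per node, as suggested above; each THUNDER process then uses threads.
#PBS -l nodes=10:ppn=1

# Hypothetical helper: count the distinct hosts listed in a PBS node file.
# Counting unique hosts (not lines) keeps the result correct even if ppn > 1.
count_nodes() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Hypothetical helper: print the launch line with one MPI process per node.
# Arguments: <node file> <THUNDER JSON parameter file>
launch_line() {
    echo "mpirun --bynode -np $(count_nodes "$1") thunder_cpu $2"
}

# Only act when running inside a PBS job, where PBS sets these variables.
if [ -n "${PBS_NODEFILE:-}" ] && [ -f "${PBS_NODEFILE}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    launch_line "$PBS_NODEFILE" demo.json
fi
```

With the 10-node request above, the printed line should match the `mpirun --bynode -np 10 thunder_cpu demo.json` command suggested in the reply. Deriving the count from the node file, rather than hard-coding it, keeps `-np` in step with the `nodes=` request if that is changed later.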