Yizhaofeng closed 4 months ago
Sorry for the slow response!
Yep, your interpretation is correct! In general the parallelization is more efficient with threading (cpus-per-task) than MPI (ntasks), so having cpus-per-task equal to the total number of cores on the physical compute node is usually the most efficient usage.
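As a rough illustration of the advice above, the following shell sketch prints Slurm directives that place one MPI rank on each node and give all of that node's cores to threading. The helper name suggest_layout and the example values (64 cores per node, 1 node) are assumptions for illustration only. Using one rank per node is also just one way to apply the advice, so check your cluster's hardware first.

```shell
#!/bin/bash
# Print Slurm directives that favour threading (cpus-per-task) over MPI ranks (ntasks).
# suggest_layout is a hypothetical helper; its inputs are assumptions about the cluster.
suggest_layout() {
    local cores_per_node=$1   # physical cores on one compute node (assumed known)
    local nodes=$2            # number of nodes requested
    echo "#SBATCH --nodes=${nodes}"
    echo "#SBATCH --ntasks-per-node=1"
    echo "#SBATCH --cpus-per-task=${cores_per_node}"
}

# Example: a single node with 64 cores.
suggest_layout 64 1
```

With these example values the last printed line is `#SBATCH --cpus-per-task=64`, matching the per-task core count used in the script from the question.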
Hello, I am here with another question! For example, suppose I have Ntime = 30 (daily resolution) and Ndepth = 1, and my Slurm submit script is as follows:
#!/bin/bash
#SBATCH --output=sim-%j.out
#SBATCH --error=sim-%j.err
#SBATCH --ntasks=5
#SBATCH --cpus-per-task=64
mpirun -n ${SLURM_NTASKS} ./coarse_grain.x \
    --Nprocs_in_time "5" \
Does this mean that the job uses 5 MPI processes with 64 cores each, and that each process handles one day of computation at a time? In other words, the 5 processes start 5 days of computation simultaneously, so 30/5 = 6 rounds are needed to finish all 30 days, and each day's computation uses 64 cores?
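The arithmetic in this question can be sketched in shell as below. It assumes the Ntime time steps are shared evenly among the ranks given by --Nprocs_in_time. The variable names are hypothetical, and the sketch does not show how the program actually orders or assigns the days.

```shell
#!/bin/bash
# Values taken from the question; the variable names are hypothetical.
NTIME=30            # number of daily time steps (Ntime)
NPROCS_IN_TIME=5    # MPI ranks splitting the time dimension (--Nprocs_in_time)
CPUS_PER_TASK=64    # cores available to each rank (--cpus-per-task)

# Ceiling division: time steps per rank, assuming an even split across ranks.
STEPS_PER_RANK=$(( (NTIME + NPROCS_IN_TIME - 1) / NPROCS_IN_TIME ))

# Total cores the job requests across all ranks.
TOTAL_CORES=$(( NPROCS_IN_TIME * CPUS_PER_TASK ))

echo "time steps per rank: ${STEPS_PER_RANK}"
echo "total cores requested: ${TOTAL_CORES}"
```

For the values in the question this prints 6 time steps per rank and 320 cores in total.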