stackhpc / ansible-role-openhpc

Ansible role for OpenHPC
Apache License 2.0

Enable task affinity in slurm.conf #42

Open sjpb opened 4 years ago

sjpb commented 4 years ago

Currently there is no task launch plugin configured, which means srun's --cpu-bind option does not work.

See guidance under TaskPlugin on the slurm.conf manpage:

NOTE: It is recommended to stack task/affinity,task/cgroup together when configuring TaskPlugin, and setting TaskAffinity=no and ConstrainCores=yes in cgroup.conf. This setup uses the task/affinity plugin for setting the affinity of the tasks (which is better and different than task/cgroup) and uses the task/cgroup plugin to fence tasks into the specified resources, thus combining the best of both pieces.
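As a sketch of what the quoted recommendation looks like in the two files (the values are taken directly from the note above; everything else in each file is omitted, and supported options can differ between Slurm releases, so check the manpages for the installed version):

```
# slurm.conf (fragment): stack both task plugins
TaskPlugin=task/affinity,task/cgroup

# cgroup.conf (fragment): let task/affinity set affinity,
# and let task/cgroup fence tasks into their allocated cores
TaskAffinity=no
ConstrainCores=yes
```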

houyushan commented 2 years ago

The task launch plugin is configured (slurm.conf: "TaskPlugin=task/affinity,task/cgroup"; cgroup.conf: "TaskAffinity=no", "ConstrainCores=yes"), but srun's --cpu-bind option still does not work, and the job runs on only one CPU core.

info: srun --cpu-bind=socket mpiexec -n 6 -genv I_MPI_DEBUG=4 /home/bt-mz.C.x

[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       597305   c2         {0}
[0] MPI startup(): 1       597306   c2         {0}
[0] MPI startup(): 2       597307   c2         {0}
[0] MPI startup(): 3       597308   c2         {0}
[0] MPI startup(): 4       597309   c2         {0}
[0] MPI startup(): 5       597310   c2         {0}

Has anyone encountered a similar problem, or does anyone know what causes it?
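One way to narrow this down is to look at the CPU mask that a process launched by srun actually receives, independently of the MPI library's own pinning report. The following is a minimal sketch, not something from this thread: the script name `show_affinity.py` and the helper names are hypothetical, and it assumes Linux (for `os.sched_getaffinity`) and that srun sets `SLURM_PROCID` for each task. It only reports the mask; it does not diagnose the cause.

```python
import os
import socket


def format_cpu_list(cpus):
    """Collapse CPU ids into a compact range string, e.g. {0, 1, 2, 5} -> '0-2,5'."""
    ranges = []
    start = prev = None
    for cpu in sorted(cpus):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def describe_affinity():
    """Return one line describing the CPUs this process is allowed to run on."""
    cpus = os.sched_getaffinity(0)  # Linux-only: affinity mask of the current process
    task = os.environ.get("SLURM_PROCID", "?")  # assumed to be set by srun per task
    return (
        f"host={socket.gethostname()} task={task} "
        f"pid={os.getpid()} cpus={format_cpu_list(cpus)}"
    )


if __name__ == "__main__":
    print(describe_affinity())
```

Running it as, for example, `srun -n 6 --cpu-bind=socket python3 show_affinity.py` and again with `-n 1` should show whether the mask each task receives changes with the number of tasks srun itself launches. Adding `verbose` to the bind option (`--cpu-bind=verbose,socket`) additionally makes srun report the binding it applied.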