manopapad closed 10 months ago
Apparently this is expected, per the sbatch documentation (https://slurm.schedmd.com/sbatch.html#OPT_SLURM_TASKS_PER_NODE):
SLURM_TASKS_PER_NODE
Number of tasks to be initiated on each node. Values are comma separated and in the same order as SLURM_JOB_NODELIST. If two or more consecutive nodes are to have the same task count, that count is followed by "(x#)" where "#" is the repetition count. For example, "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the first three nodes will each execute two tasks and the fourth node will execute one task.
So in our case, where the launch is symmetric (all nodes execute the same number of ranks/tasks), we should expect it to have the value "1" if using one node or "1(xN)" if using N>1 nodes.
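For reference, here is a minimal sketch of how the documented format could be parsed. It is not what this PR does; the function names `expand_tasks_per_node` and `symmetric_tasks_per_node` are hypothetical, and the sketch assumes the value only ever uses the "count" and "count(x#)" forms described in the quoted documentation.

```python
import re
from typing import List

# Matches "2" or "2(x3)": a task count, optionally followed by a repetition count.
_ENTRY = re.compile(r"(\d+)(?:\(x(\d+)\))?")


def expand_tasks_per_node(value: str) -> List[int]:
    """Expand a SLURM_TASKS_PER_NODE string into one task count per node.

    Hypothetical helper: "2(x3),1" becomes [2, 2, 2, 1].
    """
    counts: List[int] = []
    for entry in value.split(","):
        match = _ENTRY.fullmatch(entry.strip())
        if match is None:
            raise ValueError(f"unrecognized SLURM_TASKS_PER_NODE entry: {entry!r}")
        tasks = int(match.group(1))
        # A missing "(x#)" suffix means the count applies to a single node.
        repeat = int(match.group(2)) if match.group(2) else 1
        counts.extend([tasks] * repeat)
    return counts


def symmetric_tasks_per_node(value: str) -> int:
    """Return the per-node task count, assuming a symmetric launch.

    Hypothetical helper: raises ValueError if nodes have differing counts.
    """
    counts = expand_tasks_per_node(value)
    if len(set(counts)) != 1:
        raise ValueError(f"asymmetric launch: {value!r}")
    return counts[0]
```

With something like this, both "1" and "1(x2)" would resolve to one task per node, e.g. `symmetric_tasks_per_node(os.environ["SLURM_TASKS_PER_NODE"])` inside a job (with `import os`).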
Merging for now; we can handle this properly in a follow-up PR.
We're seeing it reported e.g. as "1(x2)"