uabrc / devops-docs

https://docs.rc.uab.edu/devops-docs/
Apache License 2.0

Info on `sdiag` #17

Open wwarriner opened 1 year ago

wwarriner commented 1 year ago

The built-in Slurm command `sdiag` is useful for an overview of the Slurm control daemon (`slurmctld`). It reports cumulative statistics on the state of `slurmctld` since the counters were last reset. The counters are reset automatically at midnight UTC each day, and can be reset explicitly with `sdiag --reset`.

The command `sdiag | head -n 18` gives a summary of jobs on the cluster and in the queue.

The `Main schedule statistics` block shows info on scheduling cycles.

The `Backfilling stats` block may be more useful when we start using backfill queues.

The next blocks cover remote procedure calls (RPCs) and may be useful for understanding how Slurm is being used in general. They may also help in investigating inefficiencies caused by researchers unfamiliar with how Slurm works and its limitations. An example might be identifying researchers who make large numbers of RPCs by, e.g., running `squeue` in a loop.
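As a rough sketch of that last idea, the per-user RPC block could be ranked by call count. The function name `top_rpc_users` is hypothetical. The sketch assumes the block is headed `Remote Procedure Call statistics by user`, that each user line is indented and contains a `count:<n>` field, and that the block ends at the first blank or unindented line. That layout may differ between Slurm versions, so check it against real `sdiag` output before relying on it.

```shell
# Hypothetical helper: read sdiag output on stdin and print the N users
# (default 5) with the highest RPC counts as "<count> <user>".
# Assumes the per-user block layout described above.
top_rpc_users() {
    awk '
        # Start of the per-user RPC block.
        /^Remote Procedure Call statistics by user/ { in_block = 1; next }
        # A blank or unindented line ends the block.
        in_block && !/^[[:space:]]+[^[:space:]]/ { in_block = 0 }
        # Inside the block, find the "count:<n>" field on each user line.
        in_block {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^count:/) { print substr($i, 7), $1 }
        }
    ' | sort -k1,1nr | head -n "${1:-5}"
}
```

It would be used as `sdiag | top_rpc_users 10`. A user near the top with a count far above everyone else is a candidate for a closer look, though a high count alone does not show that the calls came from a polling loop.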