stackhpc / ansible-role-openhpc

Ansible role for OpenHPC
Apache License 2.0

Use slurmd's own detection for node definition #96

Status: Open. Opened by sjpb 3 years ago.

sjpb commented 3 years ago

Currently, the node definitions are constructed from Ansible facts. In at least some situations this does not appear to fully satisfy Slurm: e.g. slurmd -C reports ... Boards=1 ..., and nodes are being set DOWN.
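To make the comparison with Ansible facts concrete, here is a minimal Python sketch that parses the NodeName line printed by slurmd -C into key/value pairs. The function name parse_slurmd_c and the sample output (hostname and hardware values) are hypothetical, chosen only for illustration; real output depends on the host and Slurm version.

```python
from typing import Dict


def parse_slurmd_c(output: str) -> Dict[str, str]:
    """Parse the NodeName= line printed by `slurmd -C` into a dict."""
    for line in output.splitlines():
        if line.startswith("NodeName="):
            # The line is a whitespace-separated list of Key=Value tokens.
            return dict(token.split("=", 1) for token in line.split())
    raise ValueError("no NodeName= line found in slurmd -C output")


# Illustrative output only; the UpTime line is ignored by the parser.
SAMPLE = (
    "NodeName=compute-0 CPUs=4 Boards=1 SocketsPerBoard=1 "
    "CoresPerSocket=2 ThreadsPerCore=2 RealMemory=7821\n"
    "UpTime=0-01:23:45\n"
)

node = parse_slurmd_c(SAMPLE)
print(node["Boards"], node["CPUs"])
```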

This PR runs slurmd -C on all compute nodes, then, in accordance with the existing logic, uses the values from the first host in the play for each partition to build the node definitions.

This is effectively Trust On First Use: it trusts that the node configuration reported by that first host is in fact correct.
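The per-partition, first-in-play selection can be sketched in Python as follows. This is not the role's actual implementation (which works through Ansible); node_definitions, HW_KEYS and the inventory data are hypothetical, and the input is assumed to be slurmd -C output already parsed into dicts. It emits one slurm.conf NodeName= line per partition, using a comma-separated list of node names.

```python
from typing import Dict, List

# Hardware fields copied from `slurmd -C` into each definition (an assumed selection).
HW_KEYS = ["CPUs", "Boards", "SocketsPerBoard", "CoresPerSocket",
           "ThreadsPerCore", "RealMemory"]


def node_definitions(partitions: Dict[str, List[str]],
                     detected: Dict[str, Dict[str, str]]) -> List[str]:
    """Build one slurm.conf NodeName= line per partition.

    partitions: partition name -> hostnames, in play order.
    detected:   hostname -> parsed `slurmd -C` fields for that host.
    """
    lines = []
    for hosts in partitions.values():
        # Trust on first use: the first host's hardware is assumed for all hosts.
        first = detected[hosts[0]]
        params = " ".join(f"{k}={first[k]}" for k in HW_KEYS if k in first)
        lines.append(f"NodeName={','.join(hosts)} {params}")
    return lines


# Hypothetical inventory and detected values, for illustration only.
partitions = {"compute": ["compute-0", "compute-1"]}
detected = {
    "compute-0": {"NodeName": "compute-0", "CPUs": "4", "Boards": "1",
                  "SocketsPerBoard": "1", "CoresPerSocket": "2",
                  "ThreadsPerCore": "2", "RealMemory": "7821"},
    # compute-1's own values are never consulted.
    "compute-1": {"NodeName": "compute-1", "CPUs": "8"},
}
lines = node_definitions(partitions, detected)
for line in lines:
    print(line)
```

Note that compute-1's differing CPUs value is silently ignored: that is exactly the trade-off of trusting the first host in each partition.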

An alternative is to specify only NodeName and omit the expected CPU parameters entirely:

"Only the NodeName must be supplied in the configuration file. All other node configuration information is optional."

This would have two disadvantages:

Quotes from https://slurm.schedmd.com/slurm.conf.html.
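For comparison with the NodeName-only alternative, a hypothetical helper render_node_line can emit either form of slurm.conf node line: with no hardware fields it writes just NodeName, leaving every other node parameter out of the file; otherwise it appends the given Key=Value pairs. The helper name and the example values are illustrative assumptions, not part of the role.

```python
from typing import Dict, List, Optional


def render_node_line(hosts: List[str],
                     hw: Optional[Dict[str, str]] = None) -> str:
    """Render a slurm.conf NodeName= line.

    With hw=None only NodeName is written; otherwise the given hardware
    fields are appended in insertion order.
    """
    parts = [f"NodeName={','.join(hosts)}"]
    if hw:
        parts += [f"{k}={v}" for k, v in hw.items()]
    return " ".join(parts)


hosts = ["compute-0", "compute-1"]
print(render_node_line(hosts))
print(render_node_line(hosts, {"CPUs": "4", "RealMemory": "7821"}))
```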