stackhpc / ansible-role-openhpc

Ansible role for OpenHPC
Apache License 2.0

Allow multiple empty partitions #156

Closed sjpb closed 1 year ago

sjpb commented 1 year ago

Allows multiple partitions defined in openhpc_slurm_partitions to have no nodes in the appropriate group.

Currently for this case slurm.conf ends up with

NodeName=n/a

for each empty partition, which fails for multiple such partitions:

control slurmctld[102929]: fatal: Duplicated NodeHostName n/a in config file

The fix is to omit the NodeName line for each empty partition, and to generate a partition definition like:

PartitionName=whatever ... Nodes=""

Note this isn't quite as per the docs, but the docs plausibly contain a typo on this point; see the prior discussion.

Note that a NodeName line is still needed when ALL partitions are empty, a case which was in the molecule tests.
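
To illustrate the intended logic, here is a minimal Python sketch of how the relevant slurm.conf lines could be generated. It is not the role's actual template: the function name `render_slurm_lines`, the input shape (a dict of partition name to hostnames), and the reuse of `n/a` as the placeholder node name in the all-empty case are assumptions made for illustration only.

```python
def render_slurm_lines(partitions):
    """Return illustrative slurm.conf lines for the given partitions.

    `partitions` is a hypothetical mapping of partition name to a list of
    node hostnames; an empty list means the partition has no nodes.
    """
    lines = []
    for name, nodes in partitions.items():
        if nodes:
            node_list = ",".join(nodes)
            lines.append(f"NodeName={node_list}")
            lines.append(f"PartitionName={name} Nodes={node_list}")
        else:
            # Empty partition: no NodeName line, so nothing can be duplicated.
            lines.append(f'PartitionName={name} Nodes=""')

    if not any(partitions.values()):
        # Assumption: when ALL partitions are empty, emit one placeholder
        # node line (here "n/a", as in the old behaviour) exactly once.
        lines.insert(0, "NodeName=n/a")
    return lines


if __name__ == "__main__":
    for line in render_slurm_lines({"small": ["c0", "c1"], "big": [], "gpu": []}):
        print(line)
```

With two empty partitions alongside a populated one, the sketch emits no placeholder node line at all, which avoids the duplicated `n/a` entry; the placeholder only appears, once, when every partition is empty.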

sjpb commented 1 year ago

Hmm, doesn't work as-is for test6 which has NO compute nodes defined at all:

May 11 13:33:13 testohpc-login-0 slurmctld[15082]: error: read_slurm_conf: no nodes configured.
May 11 13:33:13 testohpc-login-0 slurmctld[15082]: fatal: read_slurm_conf reading /etc/slurm/slurm.conf: Invalid argument