Ensure that the slurmdbd service is accessible on its specified port after a restart before restarting any other services that
might depend on slurmdbd being accessible.
This fixes an issue where slurmctld is started before slurmdbd is listening on its port, causing systemctl restart slurmctld to fail with the following messages in syslog:
Mar 3 17:02:58 matt-slurm-control-0 slurmctld[63748]: accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port 6817 with slurmdbd
Mar 3 17:02:58 matt-slurm-control-0 slurmctld[63748]: error: Sending PersistInit msg: Connection refused
Mar 3 17:02:58 matt-slurm-control-0 slurmctld[63748]: fatal: You are running with a database but for some reason we have no TRES from it. This should only happen if the database is down and you don't have any state files.
Mar 3 17:02:58 matt-slurm-control-0 systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Mar 3 17:02:58 matt-slurm-control-0 systemd[1]: slurmctld.service: Failed with result 'exit-code'.
This wait_for approach is already taken when restarting the slurmctld daemon.
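
For illustration, the ordering described above could look roughly like the following Ansible tasks. This is a minimal sketch, not the exact change in this PR: the task names, the 60-second timeout, and the use of port 6819 (Slurm's default DbdPort) are assumptions, and a real role would likely take the port from its own configuration variable.

```yaml
# Hypothetical sketch: task names, port, and timeout are illustrative assumptions.
- name: Restart slurmdbd
  ansible.builtin.service:
    name: slurmdbd
    state: restarted

# Block until slurmdbd accepts TCP connections, so that dependent
# services are not restarted while the port is still closed.
- name: Wait for slurmdbd to listen on its port
  ansible.builtin.wait_for:
    port: 6819        # assumed: Slurm's default DbdPort
    state: started
    timeout: 60       # assumed value, in seconds

- name: Restart slurmctld
  ansible.builtin.service:
    name: slurmctld
    state: restarted
```

With the wait in place, slurmctld should only be restarted once slurmdbd is reachable, which avoids the "Connection refused" failure shown in the log above.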