ubccr / hpc-toolset-tutorial

Tutorial for installing Open XDMoD, OnDemand, & ColdFront
GNU General Public License v3.0

slurmctld never starts #150

Closed by nuf0xx 1 year ago

nuf0xx commented 1 year ago

After a fresh pull and ./hpcts start, slurmctld never starts.

...
frontend   | -- Waiting for slurmctld to become active ...
ondemand   | nc: connect to frontend (172.19.0.9) port 22 (tcp) failed: Connection refused
ondemand   | -- Waiting for frontend ssh to become active ...
cpn02      | -- slurmctld is not available.  Sleeping ...
cpn01      | -- slurmctld is not available.  Sleeping ...
frontend   | -- Waiting for slurmctld to become active ...
...

However, the slurmctld container does start:

hpc-toolset-tutorial (git)-[master] # docker logs slurmctld
---> Starting SSSD ...
---> Starting the MUNGE Authentication service (munged) ...
---> Starting sshd on the slurmctld...
---> Waiting for slurmdbd to become active before starting slurmctld ...
-- slurmdbd is not available.  Sleeping ...
(2023-03-01 20:07:42): [sssd] [server_setup] (0x1f7c0): Starting with debug level = 0x0070
(2023-03-01 20:07:42): [be[implicit_files]] [server_setup] (0x1f7c0): Starting with debug level = 0x0070
(2023-03-01 20:07:43): [be[default]] [server_setup] (0x1f7c0): Starting with debug level = 0x0070
(2023-03-01 20:07:43): [pam] [server_setup] (0x1f7c0): Starting with debug level = 0x0070
(2023-03-01 20:07:43): [nss] [server_setup] (0x1f7c0): Starting with debug level = 0x0070
-- slurmdbd is not available.  Sleeping ...
-- slurmdbd is not available.  Sleeping ...
-- slurmdbd is not available.  Sleeping ...
-- slurmdbd is not available.  Sleeping ...
-- slurmdbd is now active ...
---> Starting the Slurm Controller Daemon (slurmctld) ...

dsajdak commented 1 year ago

@nuf0xx Sorry for the delay in responding. We were working on updating the containers for our next presentation. The latest containers have been tested and published. I recommend you do the following:

./hpcts destroy
git pull
./hpcts start

If this doesn't work, you can try starting fresh with:

./hpcts cleanup
./hpcts start
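
If the problem persists after a rebuild, it can help to see how far the slurmctld entrypoint got before the other containers gave up waiting. The following is only a rough sketch, not part of the tutorial: the function name startup_stage and the stage labels are made up for illustration, and the matching relies solely on the three entrypoint messages quoted in the docker logs output earlier in this issue, which may be worded differently in other container versions.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with hpc-toolset-tutorial).
# Reads slurmctld container log text on stdin and prints one label
# describing the furthest startup step seen in that text.
# The matched strings are copied from the log excerpt in this issue.
startup_stage() {
    log=$(cat)
    case "$log" in
        *"Starting the Slurm Controller Daemon (slurmctld)"*)
            # The entrypoint reached the step that launches slurmctld.
            echo "slurmctld-launched" ;;
        *"slurmdbd is now active"*)
            # slurmdbd answered, but the launch message is missing.
            echo "slurmdbd-ready" ;;
        *"slurmdbd is not available"*)
            # Still looping while waiting for slurmdbd.
            echo "waiting-for-slurmdbd" ;;
        *)
            echo "unknown" ;;
    esac
}
```

With the function loaded in a shell, one possible use is docker logs slurmctld 2>&1 | startup_stage. For the log excerpt in the original report this prints slurmctld-launched, which suggests the slurmdbd wait completed and the failure happens at or after the daemon launch itself rather than in the entrypoint's wait loop.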