xtreme-d / docker-slurm-cluster

Simple Slurm cluster in docker.
MIT License

[SOLVED] Slurmctld slurmd and slurmdbd STATUS FATAL #11

Closed. m0rfeo closed this issue 2 years ago

m0rfeo commented 2 years ago

Hi, I tried the new version of the cluster and the status of the Slurm processes is FATAL on all nodes. I'm sure some parameters in the Slurm conf have changed since 20.02.7. Also, the init process runs indefinitely.

    ╭─m0rfeo@matrix ~/test/docker-slurm-cluster ‹master›
    ╰─➤ docker exec axc-headnode supervisorctl status
    munged                   RUNNING   pid 37, uptime 0:29:04
    slurmctld                FATAL     Exited too quickly (process log may have details)
    slurmd                   FATAL     Exited too quickly (process log may have details)
    slurmdbd                 FATAL     Exited too quickly (process log may have details)
    slurmdbd_init__oneshot   RUNNING   pid 893, uptime 0:00:05
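For anyone checking several nodes, the FATAL entries can be pulled out of that status output with a few lines of Python. This is only a rough sketch: it assumes the output is whitespace-separated columns (program name, state, details) as in the listing above, and `failed_programs` is a name made up for this example, not part of the repository.

```python
def failed_programs(status_output: str) -> list[str]:
    """Return the names of programs whose supervisor state is FATAL."""
    failed = []
    for line in status_output.splitlines():
        # Columns: program name, state, free-form details.
        fields = line.split(None, 2)
        if len(fields) >= 2 and fields[1] == "FATAL":
            failed.append(fields[0])
    return failed


# Sample text copied from the status output in this issue.
sample = """\
munged                   RUNNING   pid 37, uptime 0:29:04
slurmctld                FATAL     Exited too quickly (process log may have details)
slurmd                   FATAL     Exited too quickly (process log may have details)
slurmdbd                 FATAL     Exited too quickly (process log may have details)
slurmdbd_init__oneshot   RUNNING   pid 893, uptime 0:00:05
"""

print(failed_programs(sample))  # ['slurmctld', 'slurmd', 'slurmdbd']
```

The text passed in would typically be whatever `docker exec axc-headnode supervisorctl status` prints; capturing it is left out here to keep the sketch self-contained.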

m0rfeo commented 2 years ago

This is the problem, or at least the most relevant one.

The logs show some errors. First, AccountingStoreJobComment has to be changed in slurm.conf:

    INFO exited: slurmctld (exit status 1; not expected)
    axc-headnode | slurmctld: fatal: The AccountingStoreJobComment option has been removed, please use AccountingStoreFlags=job_comment option instead.
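A possible way to apply the change the error message asks for is sketched below. It assumes the old option was written as `AccountingStoreJobComment=YES` or `=NO` on its own line, and that no `AccountingStoreFlags` line exists yet (if one does, the flag would have to be merged into it by hand). The function name `migrate_conf` and the sample config are made up for this example and are not taken from the repository.

```python
import re

# Matches the removed option, ignoring case and an optional trailing comment.
_OLD_OPTION = re.compile(
    r"^\s*AccountingStoreJobComment\s*=\s*(\w+)\s*(#.*)?$", re.IGNORECASE
)


def migrate_conf(text: str) -> str:
    """Rewrite slurm.conf text so it no longer uses AccountingStoreJobComment."""
    out = []
    for line in text.splitlines():
        match = _OLD_OPTION.match(line)
        if match is None:
            out.append(line)  # unrelated line, keep as-is
        elif match.group(1).lower() == "yes":
            # Replacement named in the slurmctld error message.
            out.append("AccountingStoreFlags=job_comment")
        # Any other value (e.g. NO): drop the removed option entirely.
    return "\n".join(out) + "\n"


# Hypothetical config fragment, for illustration only.
old_conf = "ClusterName=cluster\nAccountingStoreJobComment=YES\nSlurmctldPort=6817\n"
print(migrate_conf(old_conf))
```

Dropping the line when the value is not `YES` is a choice made for this sketch; it simply avoids leaving an option that slurmctld now refuses to start with.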

hackprime commented 2 years ago

Thanks for your comment! Actually, I found a number of issues that need to be fixed to fit Slurm 21.x. I have already pushed the changes.

m0rfeo commented 2 years ago

@hackprime Thanks! I'm testing the new version now, and everything is working properly at the moment.