rackslab / Slurm-web

Open source web dashboard for Slurm HPC clusters
https://slurm-web.com
GNU General Public License v3.0

slurm-web-agent-uwsgi not communicating with slurmrestd #380

Status: Closed (rseaman2016 closed this 1 week ago)

rseaman2016 commented 1 week ago

Hey there! When I navigate to Slurm-web, the cluster list is empty:

[Screenshot, 2024-11-08: Slurm-web with an empty cluster list]

On further investigation, slurmrestd does not appear to receive any requests from slurm-web-agent-uwsgi. It does receive requests when I run the slurmrestd troubleshooting steps as the slurm user.

slurmrestd log when the request is run locally as the slurm user:

slurmrestd[806485]: operations_router: [/run/slurmrestd/slurmrestd.socket->socket:[26252175] (fd 8)] GET /slurmdb/v0.0.39/config
slurmrestd[806485]: rest_auth/local: _auth_socket: _auth_socket: [/run/slurmrestd/slurmrestd.socket->socket:[26252175] (fd 8)] accepted user socket connection with uid:202 gid:202 pid:806573
slurmrestd[806485]: slurmrestd: operations_router: [/run/slurmrestd/slurmrestd.socket->socket:[26258018] (fd 8)] GET /slurmdb/v0.0.39/config
slurmrestd[806485]: operations_router: [/run/slurmrestd/slurmrestd.socket->socket:[26258018] (fd 8)] GET /slurmdb/v0.0.39/config
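For anyone wanting to reproduce the local check without curl, here is a minimal Python sketch of the same kind of request: a plain HTTP GET sent over the Unix socket. The socket path and URL are taken from the log lines above; the names `UnixHTTPConnection` and `query_slurmrestd` are hypothetical helpers written for illustration and are not part of Slurm-web or Slurm. It assumes slurmrestd authenticates the caller through the socket peer credentials (rest_auth/local, as the log suggests), so no token header is sent.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks over a Unix domain socket instead of TCP.

    Hypothetical helper for illustration only.
    """

    def __init__(self, socket_path, timeout=5.0):
        # The host name is only used for the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def query_slurmrestd(socket_path, url):
    """Send a GET request over the Unix socket; return (status, body bytes)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", url)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


if __name__ == "__main__":
    # Path and URL as they appear in the slurmrestd log above.
    try:
        status, body = query_slurmrestd(
            "/run/slurmrestd/slurmrestd.socket", "/slurmdb/v0.0.39/config"
        )
        print(status)
        print(body[:200])
    except OSError as err:
        # Raised when the socket is missing or the user may not connect to it.
        print(f"could not reach slurmrestd: {err}")
```

Running this as the slurm user should produce a matching `operations_router` line in the slurmrestd log if the request reaches the daemon. Running it as a different user is a quick way to see whether the socket permissions or the user check are what block the connection.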

Drop-in file /etc/systemd/system/slurmrestd.service.d/slurm-web.conf:

[Service]
# Unset vendor unit ExecStart to avoid cumulative definition
ExecStart=
Environment=
# Disable slurm user security check
Environment=SLURMRESTD_SECURITY=disable_user_check
ExecStart=/usr/sbin/slurmrestd $SLURMRESTD_OPTIONS unix:/run/slurmrestd/slurmrestd.socket
RuntimeDirectory=slurmrestd
RuntimeDirectoryMode=0755
User=slurm
Group=slurm

Service file for slurm-web-agent:

[Unit]
Description=uWSGI instance for Slurm-web agent
After=network.target

[Service]
User=slurm
Group=slurm
RuntimeDirectory=slurm-web-agent
ExecStart=/usr/sbin/uwsgi --ini /usr/share/slurm-web/wsgi/agent/slurm-web-agent.ini

[Install]
WantedBy=multi-user.target

I am running slurm-web-agent-3.2.0-1.el9.noarch and slurm-ohpc-slurmrestd-23.11.6-310.ohpc.5.1.x86_64. Any pointers would be appreciated.

Thanks!