rackslab / Slurm-web

Open source web dashboard for Slurm HPC clusters
https://slurm-web.com
GNU General Public License v3.0

Working container for Slurm 22.05 on Ubuntu 22.04 #233

Closed Caian closed 3 months ago

Caian commented 1 year ago

Hello all,

Following daverona's work porting the Docker container to a newer Slurm/system (#222), we used their repository as a base, bumped Slurm to 22.05, applied patch #219, and added a docker-compose example.

It builds libslurm and pyslurm from source on Ubuntu 22.04, using the .deb packages generated on Ubuntu 20.

This requires building through build.sh instead of calling docker build directly.

https://github.com/hpg-cepetro/slurm-web-docker-22.5-ubuntu-22.04

Patch #219 fixes several issues; unfortunately, the 2D rack view still does not work.

Best regards,

carlos-encs commented 1 year ago

Hello @Caian

I installed your slurm-web-docker, but when I open http://cluster.local:8081/slurm/ I only get a blank page with a blue arrow at the top left and a spinning circle in the middle.

I used the Firefox console to debug the page and found the following:

HTTP/1.1 404 NOT FOUND on all these entries:

http://cluster.local:8081/slurm-web-conf/config.json
http://cluster.local:8081/slurm-web-conf/clusters.config.json
http://cluster.local:8081/slurm-web-conf/2d.colors.config.json
http://cluster.local:8081/slurm-web-conf/2d.config.json

Source map error: Error: request failed with status 404
Resource URL: http://cluster.local:8081/javascript/bootstrap/js/bootstrap-tagsinput.min.js
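To confirm which of these files the web server cannot serve, each configuration URL can be requested directly and its HTTP status recorded. This is a minimal Python sketch using only the standard library; the base URL and the four file names come from the console output, while `config_urls` and `check_urls` are hypothetical helpers, not part of Slurm-web.

```python
import urllib.error
import urllib.request

# Files the browser requested under /slurm-web-conf/ (taken from the 404 errors).
CONFIG_FILES = [
    "config.json",
    "clusters.config.json",
    "2d.colors.config.json",
    "2d.config.json",
]


def config_urls(base_url):
    """Build the dashboard configuration URLs for a base URL such as http://host:8081."""
    base = base_url.rstrip("/")
    return [f"{base}/slurm-web-conf/{name}" for name in CONFIG_FILES]


def check_urls(urls, timeout=5.0):
    """Map each URL to its HTTP status code.

    4xx/5xx responses are recorded instead of raised; connection
    failures (urllib.error.URLError) still propagate.
    """
    results = {}
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[url] = resp.status
        except urllib.error.HTTPError as err:
            results[url] = err.code
    return results
```

For example, `check_urls(config_urls("http://cluster.local:8081"))` would report 404 for every file the container does not find in its configuration directory.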

Could you help me fix these issues?

Regards

Caian commented 1 year ago

Hi @carlos-encs, how did you start the container? Are you binding /etc/slurm-web to a directory on the host machine?

Before binding volumes, you should make a copy of the default configuration files that are inside the container. You could use the ones from https://github.com/edf-hpc/slurm-web/tree/master/conf, but I'm not sure if they are compatible.
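Bind-mounting a host directory over /etc/slurm-web hides whatever the image ships at that path, so an empty host directory would be consistent with the 404 responses on /slurm-web-conf/ (that the dashboard serves those files from /etc/slurm-web is an assumption here). A minimal sketch of the copy step, assuming the image tag `slurm-web` and the host directory `./slurm-web2/conf` used in this thread; it drives the standard `docker create`, `docker cp` and `docker rm` commands, and `copy_default_config` / `copy_default_config_commands` are hypothetical helper names.

```python
import os
import subprocess


def copy_default_config_commands(image, host_dir,
                                 container_dir="/etc/slurm-web",
                                 tmp_name="slurm-web-defaults"):
    """Return the docker commands that copy container_dir out of the image."""
    # A trailing "/." makes `docker cp` copy the directory's contents
    # into host_dir instead of creating a nested slurm-web/ directory.
    src = f"{tmp_name}:{container_dir.rstrip('/')}/."
    return [
        # Create (but do not start) a throwaway container from the image.
        ["docker", "create", "--name", tmp_name, image],
        ["docker", "cp", src, os.path.abspath(host_dir)],
        ["docker", "rm", tmp_name],
    ]


def copy_default_config(image, host_dir, **kwargs):
    """Populate host_dir with the image's defaults before bind-mounting it."""
    create, copy, remove = copy_default_config_commands(image, host_dir, **kwargs)
    os.makedirs(host_dir, exist_ok=True)
    subprocess.run(create, check=True)
    try:
        subprocess.run(copy, check=True)
    finally:
        # Remove the temporary container even if the copy fails.
        subprocess.run(remove, check=True)
```

Running `copy_default_config("slurm-web", "./slurm-web2/conf")` once, then starting the container with the same `-v` mount, would leave the defaults in place for editing.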

carlos-encs commented 1 year ago

Hi @Caian

This is how I start the container:

docker run -d --name slurm-web \
  -v ./slurm-web2/conf:/etc/slurm-web \
  -v /nfs/appdata/serv_slurm/munge/etc/:/etc/munge:ro \
  -v /nfs/appdata/serv_slurm/slurm-23.02.0/root/bin/:/etc/slurm-llnl:ro \
  -p 8081:80 \
  slurm-web

The munge and Slurm files are on an NFS share. Munge authentication is working fine.

Sorry for the noob question, but I have to ask: do I need slurmrestd configured and running? Could you send me your racks.xml and restapi.conf? I don't understand how to configure those files properly.

Thanks

Caian commented 1 year ago

@carlos-encs Here are our files, with sensitive information from our environment removed.

restapi.conf:

[cors]
authorized_origins = http://localhost,http://XXXXXXXX,http://XXXXXXXX

[config]
authentication = disable
cache = enable

[roles]
guests = enabled
user = @chimistes
admin = @admin
restricted_fields_for_all = command
restricted_fields_for_user = command
restricted_fields_for_admin =

[ldap]
uri = ldap://XXXXXXXX
base_people = XXXXXXXX
base_group = XXXXXXXX
reader_dn = XXXXXXXX
reader_password = XXXXXXXX
expiration = 1296000
resolve_job_users = true
cache_job_users = true

[cache]
redis_host = redis
redis_port = 6379
jobs_expiration =
global_expiration =
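The sketch below parses an excerpt of this restapi.conf with Python's `configparser`, to show how blank keys such as `jobs_expiration =` are read: as empty strings, not as missing keys. Whether Slurm-web v2 parses the file exactly this way is an assumption, and `load_restapi_conf` is a hypothetical helper.

```python
import configparser

# Excerpt of the restapi.conf posted in this thread.
RESTAPI_SAMPLE = """
[config]
authentication = disable
cache = enable

[roles]
guests = enabled
restricted_fields_for_all = command
restricted_fields_for_admin =

[cache]
redis_host = redis
redis_port = 6379
jobs_expiration =
"""


def load_restapi_conf(text):
    """Parse restapi.conf-style INI text and return a ConfigParser."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


if __name__ == "__main__":
    conf = load_restapi_conf(RESTAPI_SAMPLE)
    print(conf.get("config", "authentication"))       # disable
    print(conf.getint("cache", "redis_port"))          # 6379
    print(repr(conf.get("cache", "jobs_expiration")))  # ''
```

`redis_host = redis` presumably refers to a `redis` service in the docker-compose example (an inference); outside that setup it would need to be a resolvable host name.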
racks.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rackmap SYSTEM "/usr/share/slurm-web/restapi/schema/dtd/racks.dtd">
<rackmap>

  <nodetypes>
    <nodetype id="XXXXX" model="XXXXXXXXX" height="2" width="1" />
    <nodetype id="YYYYY" model="YYYYYYYYY" height="1" width="1" />
  </nodetypes>

  <racks posx="0" posy="0" width="2" depth="2">
    <racksrow posx="0">
      <rack id="rack1" posy="0">
        <nodes>
          <node id="AA" type="XXXXX" posx="0" posy="0" />
          <node id="BB" type="XXXXX" posx="0" posy="1" />
          <nodeset id="CC" type="YYYYY" posy="5" />
        </nodes>
      </rack>
    </racksrow>
  </racks>
</rackmap>
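A racks.xml like this one can be sanity-checked before mounting it into the container. The sketch below uses Python's `xml.etree.ElementTree` (it does not validate against racks.dtd) to list what each rack contains and to flag `node`/`nodeset` entries whose `type` does not match a declared `nodetype`; `summarize_rackmap` is a hypothetical helper, not part of Slurm-web.

```python
import xml.etree.ElementTree as ET

# The rackmap posted in this thread, without the XML prolog and DOCTYPE.
RACKS_SAMPLE = """<rackmap>
  <nodetypes>
    <nodetype id="XXXXX" model="XXXXXXXXX" height="2" width="1" />
    <nodetype id="YYYYY" model="YYYYYYYYY" height="1" width="1" />
  </nodetypes>
  <racks posx="0" posy="0" width="2" depth="2">
    <racksrow posx="0">
      <rack id="rack1" posy="0">
        <nodes>
          <node id="AA" type="XXXXX" posx="0" posy="0" />
          <node id="BB" type="XXXXX" posx="0" posy="1" />
          <nodeset id="CC" type="YYYYY" posy="5" />
        </nodes>
      </rack>
    </racksrow>
  </racks>
</rackmap>"""


def summarize_rackmap(xml_text):
    """Return {rack_id: [(tag, id, type), ...]} in document order.

    Raises ValueError if a node or nodeset references an undeclared nodetype.
    """
    root = ET.fromstring(xml_text)
    declared = {nt.get("id") for nt in root.iter("nodetype")}
    summary = {}
    for rack in root.iter("rack"):
        entries = []
        for child in rack.iter():
            if child.tag not in ("node", "nodeset"):
                continue
            if child.get("type") not in declared:
                raise ValueError(
                    f"{child.tag} {child.get('id')!r} in rack {rack.get('id')!r} "
                    f"uses undeclared type {child.get('type')!r}"
                )
            entries.append((child.tag, child.get("id"), child.get("type")))
        summary[rack.get("id")] = entries
    return summary


if __name__ == "__main__":
    print(summarize_rackmap(RACKS_SAMPLE))
    # {'rack1': [('node', 'AA', 'XXXXX'), ('node', 'BB', 'XXXXX'), ('nodeset', 'CC', 'YYYYY')]}
```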

No, you don't need slurmrestd working.

nekokani commented 9 months ago

Have you solved your problem? I ran into the same issue. :(

BigDataHealthcare commented 8 months ago

I am running the container and slurmctld on the same host, which is a virtual machine. I have set clusters.config.json to []. But I am getting the following error.


I found that http://localhost:8080/slurm-restapi returns "Internal Server Error" (status 500). I have not been able to find the root cause. Could you share any information that would help me resolve the issue?

rezib commented 4 months ago

This issue concerns Slurm-web v2, which is no longer maintained. You are highly encouraged to test the new version v3.0.0, for which the quick start guide is available online: https://docs.rackslab.io/slurm-web/install/quickstart.html

Note that Slurm-web v3.0.0 is officially supported on Slurm 24.04 LTS with deb packages. For older versions, we plan to distribute containers and this effort is tracked in #266.

Unless someone is motivated to maintain the old version of Slurm-web or you have a justified reason to keep this issue open, it will be closed in a few weeks.

rezib commented 3 months ago

For the reasons explained in the previous comment, I am now closing this issue.