rackslab / Slurm-web

Open source web dashboard for Slurm HPC clusters
https://slurm-web.com
GNU General Public License v3.0

A working slurm-web v2.4.0 on Ubuntu 20.04 announcement #222

Closed daverona closed 2 weeks ago

daverona commented 2 years ago

This is not an issue but an announcement of a working slurm-web v2.4.0 on Ubuntu 20.04. If this post causes any problems, I will gladly delete it, but I thought I might share what I experienced.

The Docker image can be pulled:

docker login registry.gitlab.com --username <your-gitlab-account-name>
docker image pull registry.gitlab.com/daverona/docker/slurm-web:v2.4.0-p3
docker logout

The package versions used in the image:

Note that these packages are available as .deb files under the /tmp directory in the image, in case you would like to have them.

Then you first need a configuration directory from https://github.com/edf-hpc/slurm-web/tree/v2.4.0/conf and please edit the files in the directory, especially:

Note: if you don't do this, you will get an empty page.

Assuming that

  1. Your slurmctld is 19.05.5 (the default, off-the-shelf package) on Ubuntu 20.04,
  2. slurmctld is running on the host on which you are about to run a container, and
  3. The configuration directory conf is under the current directory,

type the following to run a container:

docker container run \
  -d \
  -v /etc/passwd:/etc/passwd:ro \
  -v /etc/group:/etc/group:ro \
  -v /etc/munge:/etc/munge \
  -v /etc/slurm-llnl:/etc/slurm-llnl:ro \
  -v $(pwd)/conf:/etc/slurm-web:ro \
  -p 8080:80 \
  registry.gitlab.com/daverona/docker/slurm-web:v2.4.0-p3

Note that since slurm-web runs in a Docker container, your slurm.conf should not use host addresses like localhost or other loopback addresses.
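For illustration, the relevant slurm.conf fragment would use a hostname resolvable from inside the container (the hostname below is a placeholder, not from this thread):

```
# slurm.conf (fragment) -- hypothetical hostname for illustration
# Use an address resolvable from inside the container, not localhost/127.0.0.1
SlurmctldHost=headnode.example.com
```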

Then visit http://localhost:8080

Note: if you plan to run the container on a different machine from the one running slurmctld, copy four files:

to the machine on which you will run the container, and mount them. It will work. (It's been tested.)

By editing ./dashboard/clusters.config.json and ./restapi.conf properly, you can observe multiple clusters on a single website. (It's been tested.)

The Dockerfile source is here: https://gitlab.com/daverona/docker/slurm-web/-/blob/v2.4.0-p3/Dockerfile

Note that pyslurm and slurm-web have patches in the Dockerfile.

I hope this post helps someone who wishes to run slurm-web on Ubuntu 20.04. And I want to say thank you so much for this fabulous work!!

Zorlin commented 2 years ago

Fantastic work! I might try and get this working on Slurm 21.08 if I can manage it.

rezib commented 2 years ago

Thank you very much for sharing this and for the positive feedback! :)

daverona commented 2 years ago

> Thank you very much for sharing this and for the positive feedback! :)

No. Thank YOU! 😃

liu-shaobo commented 2 years ago

Hi, can I see your ./dashboard/clusters.config.json and ./restapi.conf files?

daverona commented 2 years ago

> Hi, can I see your ./dashboard/clusters.config.json and ./restapi.conf files?

I have three clusters, and the following is the clusters.config.json of slurm.watch.example.com, from which I can observe all three of them:

[
  {
    "name": "Cluster A (XXX internet data center)",
    "api": {
      "url": "http://slurm.space1.example.com:8080",
      "path": "/slurm-restapi"
    }
  },
  {
    "name": "Cluster B (Headquarter server room)",
    "api": {
      "url": "http://slurm.watch.example.com",
      "path": "/slurm-restapi"
    }
  },
  {
    "name": "Cluster C (XXX internet data center)",
    "api": {
      "url": "https://slurm.example.com",
      "path": "/slurm-restapi"
    }
  }
]

And the corresponding restapi.conf:

# configure here all parameters for the Slurm-web REST API

[cors]
# give here the list of the domains from where the origin is authorized,
# separated by commas
#
# example : authorized_origins = http://example1.com,http://example2.com
#authorized_origins = http://localhost,http://server.local:8000,http://192.168.90.121:8000
authorized_origins = http://localhost,http://slurm.watch.example.com

[config]
# Set to 'enable' to activate authentication.
# Authentication is based on LDAP, so an LDAP server and the good configuration
# for it, below in this file, are required.
#authentication = enable
authentication = disable
# Absolute path to the secret key file. This key is used to sign tokens
# on server-side and make sure they are not modified during network
# communications. The secret is required only if authentication is enabled.
# Make sure this file is readable by root and WCGI user only, and not
# world-readable.
#secret_key = /etc/slurm-web/secret.key
# Set to 'enable' to activate cache.
cache = enable
# Path for racks description (default to /etc/slurm-web/racks.xml).
racksxml = /etc/slurm-web/racks.xml

[roles]
# Allow guest users or not. Guest users have the 'all' role. Disabled by
# default.
# guests = disabled

# Trusted sources can be enabled to allow specific hosts to access the API and
# bypass the authentication. If enabled, the hosts must be declared in the
# roles members using the % prefix followed by the IP address or the hostname.
# This feature is disabled by default.
# trusted_sources = disabled

# The `all` role is for all users that do not have either the user or admin
# role, including guests. This role can be:
#
# - disabled: users with the all role cannot access the REST API
# - enabled: users with the all role can access the REST API
#
# Enabled by default.
#all = enabled

# The roles members can be defined with a list of groups (whose names
# are prefixed by @) and logins.
user = @chimistes
admin = @admin,pierre

# Give here restricted fields about jobs for each role, separated by commas
restricted_fields_for_all = command
restricted_fields_for_user = command
restricted_fields_for_admin =

[acl]
# You can set here an access control list for each view of the dashboard for
# this cluster.
# This feature requires authentication to be enabled.
# For each view, you can give a list of roles to define who can access to this
# view.
# If no ACL is provided for a view, every authenticated user can access to
# this view.
# i.e. ``jobs = admin`` implies that the jobs view will be available only for
# every user which has role of 'admin'.
# If you give an empty value for an entry (i.e. ``jobs =``), the corresponding
# view will not be accessible for anybody.
#
# Available views are : [
#   'jobs', 'jobsmap', 'partitions', 'reservations', 'qos', 'racks', '3dview',
#   'gantt', 'topology'
# ]

[ldap]
# Configure here settings to connect to your LDAP server.
uri = ldap://admin:389
base_people = ou=people,dc=cluster,dc=local
base_group = ou=group,dc=cluster,dc=local
# Optional path to CA certificate. Required only if ldaps protocol is used and
# CA certification not available in system certificates chain.
# cacert = /etc/ssl/certs/cacert.pem
expiration = 1296000

[cache]
# address for Redis server (default to host=localhost, port=6379)
redis_host = redis
redis_port = 6379
# TTL for jobs (default to 10 seconds)
jobs_expiration = 10
# TTL for other information (default to 60 seconds)
global_expiration = 60

I don't use the roles and ldap sections.

marks221b commented 2 years ago

Hi, has anyone managed to run slurm-web on CentOS 7.7 with Docker? When I build it, all I get is a white page.

daverona commented 2 years ago

> Hi, has anyone managed to run slurm-web on CentOS 7.7 with Docker? When I build it, all I get is a white page.

I don't use CentOS, so I cannot answer your question.

But the important part is to build the Docker image with a pyslurm that works with the Slurm installed on your CentOS host. pyslurm is picky about the Slurm version, so you may want to investigate first which pyslurm release works with your Slurm.
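pyslurm releases track Slurm's major.minor series, so a build made for one series generally does not load against another. As a rough illustration (not an official tool), the compatibility check boils down to comparing release series:

```python
def slurm_series(version: str) -> tuple:
    """Return the (major, minor) release series of a Slurm-style version string."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

def compatible(slurm_version: str, pyslurm_version: str) -> bool:
    """A pyslurm build is generally only usable against the matching Slurm series."""
    return slurm_series(slurm_version) == slurm_series(pyslurm_version)

print(compatible("19.05.5", "19.05.0"))  # same 19.05 series
print(compatible("19.05.5", "21.08.4"))  # different series
```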

andrewssobral commented 1 year ago

Hello @daverona, thank you for your excellent work. I tried to follow your steps, but I am facing the following issue:

[screenshot of the error]

Do you know what it is?

For info, my Slurm version is 19.05.5 and I'm running the container on the Slurm controller (head node). Thanks in advance, Andrews

andrewssobral commented 1 year ago

Related issue https://github.com/edf-hpc/slurm-web/issues/183

daverona commented 1 year ago

Hi @andrewssobral,

Thanks for your kind words.

To answer your question, let me ask you one first: is your slurm-web (web part) trying to connect to a remote slurm-web (REST part), or to the same local slurm-web (REST part)?

If it is local, I suggest checking your local slurm-web's restapi.conf and making sure that authorized_origins includes a non-localhost address. For instance, mine is: http://localhost,http://slurm.watch.example.com.

If it is remote (in this scenario, you use your local slurm-web container (web part) with a remote slurm-web container (REST part)), check the remote slurm-web's restapi.conf and make sure that authorized_origins includes your local container's reachable server URL, i.e. http://slurm.watch.example.com in the above example. The best way to test reachability is to ping your local container from the remote container.
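Conceptually, the authorized_origins check is just a comparison of the request's Origin header against the configured comma-separated list; a minimal sketch (illustrative only, not Slurm-web's actual implementation):

```python
def origin_allowed(origin: str, authorized_origins: str) -> bool:
    """Return True if the Origin header matches one of the comma-separated
    origins configured in restapi.conf's [cors] section."""
    allowed = [o.strip() for o in authorized_origins.split(",")]
    return origin in allowed

conf = "http://localhost,http://slurm.watch.example.com"
print(origin_allowed("http://slurm.watch.example.com", conf))  # authorized
print(origin_allowed("http://elsewhere.example.com", conf))    # rejected
```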

If everything is good, the second question to ask is: did you register all slurm-web containers (including your local one) in cluster.config.json? For example, here's mine:

[
  {
    "name": "Brain (Hostway internet data center)",
    "api": {
      "url": "http://slurm.space1.example.com:8080",
      "path": "/slurm-restapi"
    }   
  },  
  {
    "name": "Dev (Headquarter server room)",
    "api": {
      "url": "http://slurm.watch.example.com",
      "path": "/slurm-restapi"
    }   
  },  
  {
    "name": "Seer (Hostway internet data center)",
    "api": {
      "url": "https://slurm.example.com",
      "path": "/slurm-restapi"
    }   
  }
]

Note that even though the second one (Dev) is local to me, I didn't use http://localhost as the service URL.

Hope this helps. Good luck.

hellozzcy commented 1 year ago

I want to know where the url and path in cluster.config.json come from. I have deployed a local Slurm cluster, but I don't know what to put for the address in cluster.config.json. Are there any default listening addresses? Please help answer this question. Thank you.

hellozzcy commented 1 year ago

Related issue #183

Have you solved this problem? I encountered the same issue: I'm not sure how to fill in the url and path in the cluster.config.json file, and I couldn't find any related configuration in my Slurm cluster. Do you have any idea? Thank you.

BigDataHealthcare commented 5 months ago

Hi,

Where do I need to copy the configuration files below from the repository?

./racks.xml ./restapi.conf

daverona commented 5 months ago

> Hi,
>
> Where do I need to copy the configuration files below from the repository?
>
> ./racks.xml ./restapi.conf

The answer is $(pwd)/conf (i.e. the conf directory under the current directory). In my original post, there's a line: -v $(pwd)/conf:/etc/slurm-web:ro \

The $(pwd)/conf directory structure is here: https://github.com/rackslab/slurm-web/tree/v2.4.0/conf You can find where racks.xml and restapi.conf are placed there.

Then $(pwd)/conf is bind-mounted to /etc/slurm-web in the container.

Hope this answers your question.

daverona commented 5 months ago

> I want to know where the url and path in cluster.config.json come from. I have deployed a local Slurm cluster, but I don't know what to put for the address in cluster.config.json. Are there any default listening addresses? Please help answer this question. Thank you.

Maybe it wasn't clear in my original post. If slurmctld and the container run on the same host, you don't have a remote cluster to register, so the following cluster.config.json would suffice:

[]

If your container is running on a host that is not running slurmctld, then you need to configure cluster.config.json. To simplify the explanation, assume we have the following setup:

cluster-1.example.com
  - slurmctld running
  - slurm-web container running (*)
cluster-2.example.com
  - slurmctld running
  - slurm-web container running (*)
...
observer.example.com
  - may or may not run slurmctld  
  - slurm-web container running (**)

Each server marked with (*) is obviously not a cluster observer itself, so an empty array is enough for its cluster.config.json. Each of them does, however, need to run the slurm-web container so that the cluster observer (marked with (**)) can query it. The cluster.config.json file in the original post is for this observer (observer.example.com in our setup).

Now, to answer your question, here is an example. If you use https://slurm.cluster-1.example.com for cluster-1.example.com and http://slurm.cluster-2.example.com:8080 for cluster-2.example.com, cluster.config.json would look like this:

[
  {
    "name": "Cluster 1 (XXX internet data center)",
    "api": {
      "url": "https://slurm.cluster-1.example.com",
      "path": "/slurm-restapi"
    }
  },
  {
    "name": "Cluster 2 (YYY internet data center)",
    "api": {
      "url": "http://slurm.cluster-2.example.com:8080",
      "path": "/slurm-restapi"
    }
  }
]
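To catch mistakes before mounting such a file, it can help to sanity-check its structure. Here is a small validation sketch (illustrative only; the expected keys follow the examples above):

```python
import json

def validate_clusters_config(text: str) -> list:
    """Parse a cluster.config.json document and check that each entry has
    the fields used in the examples above: name, api.url and api.path."""
    clusters = json.loads(text)
    assert isinstance(clusters, list), "top-level value must be a JSON array"
    for entry in clusters:
        assert "name" in entry, "each cluster needs a name"
        api = entry["api"]
        assert api["url"].startswith(("http://", "https://")), "url must be HTTP(S)"
        assert api["path"].startswith("/"), "path must be absolute"
    return clusters

# An empty array is valid: no remote clusters to observe.
print(len(validate_clusters_config("[]")))  # 0
sample = ('[{"name": "Cluster 1", "api": {"url": '
          '"https://slurm.cluster-1.example.com", "path": "/slurm-restapi"}}]')
print(validate_clusters_config(sample)[0]["name"])  # Cluster 1
```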

If observer.example.com is not running slurmctld, you should omit the following lines from the docker run command in the original post:

  -v /etc/passwd:/etc/passwd:ro \
  -v /etc/group:/etc/group:ro \
  -v /etc/munge:/etc/munge \
  -v /etc/slurm-llnl:/etc/slurm-llnl:ro \

BigDataHealthcare commented 5 months ago

I am running the container and slurmctld on the same host. The host is a virtual machine. I have updated clusters.config.json with the value []. But I am getting the following error.

[screenshot of the error]

I found that http://localhost:8080/slurm-restapi is throwing an "Internal Server Error" (status 500). I am unable to find the root cause of the issue. Can you please share some information that will help me resolve it?

daverona commented 5 months ago

> I am running the container and slurmctld on the same host. The host is a virtual machine. I have updated clusters.config.json with the value []. But I am getting the following error.
>
> [screenshot of the error]
>
> I found that http://localhost:8080/slurm-restapi is throwing an "Internal Server Error" (status 500). I am unable to find the root cause of the issue. Can you please share some information that will help me resolve it?

The error is really a warning. The real problem is that your slurm-web UI part cannot reach the slurm-web REST part.

Open the restapi.conf file and make sure that authorized_origins has the entry http://localhost:8080.
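Based on the restapi.conf shown earlier in this thread, the relevant fragment would look something like this (the exact origin must match the URL your browser uses):

```
[cors]
authorized_origins = http://localhost,http://localhost:8080
```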

rezib commented 1 month ago

Note that this issue concerns Slurm-web v2, which is not maintained anymore. The new version, v3.0.0, has been released, and it is much easier to install than previous versions thanks to native system packages (RPM, deb) for the most popular distributions.

The quick start guide for v3.0.0 is available online: https://docs.rackslab.io/slurm-web/install/quickstart.html

If you prefer containers over traditional system packages, work on this is also planned in #266.

Unless you see a good reason to keep this issue open, it will be closed in a few weeks. BTW, thank you @daverona for your invaluable work on Slurm-web v2 despite its bad shape.

rezib commented 2 weeks ago

For the reasons explained in the previous comment, I am finally closing this issue.