stackhpc / ansible-slurm-appliance

A Slurm-based HPC workload management environment, driven by Ansible.

Add support for RockyLinux9 #353

Closed sjpb closed 3 months ago

sjpb commented 5 months ago

Make the appliance compatible with RockyLinux 9-based images.

Note that the CI and CaaS environments will continue to use RL8 for now. CI runs on RL9 only if the PR branch name starts with rl9, or if RL9 is selected when running CI workflows manually.
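The selection rule described above can be sketched as a small function. This is only an illustration of the rule as stated, not the appliance's actual workflow code: the function name `ci_os_version`, its parameters, and the assumption that the `rl9` prefix match is case-sensitive are all hypothetical.

```python
from typing import Optional


def ci_os_version(branch_name: str, manual_selection: Optional[str] = None) -> str:
    """Return the OS release a CI run would use under the rule stated above.

    Hypothetical helper: the name, parameters and case-sensitive prefix
    match are assumptions made for illustration only.
    """
    # Condition 1: the PR branch name starts with "rl9".
    branch_requests_rl9 = branch_name.startswith("rl9")
    # Condition 2: RL9 was explicitly selected for a manually run workflow.
    manually_selected_rl9 = manual_selection == "RL9"
    # Either condition selects RL9; otherwise CI stays on RL8.
    return "RL9" if (branch_requests_rl9 or manually_selected_rl9) else "RL8"


if __name__ == "__main__":
    for branch, selection in [("rl9-support", None), ("main", None), ("main", "RL9")]:
        print(branch, selection, "->", ci_os_version(branch, selection))
```

Under these assumptions, a branch such as `rl9-support` would run on RL9 automatically, while any other branch would need RL9 to be selected by hand.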

Additional notes:

Replaces #323

sjpb commented 5 months ago

FAILED Fat image build: https://github.com/stackhpc/ansible-slurm-appliance/actions/runs/7643335283

Edit: had the wrong image type in Glance

sjpb commented 5 months ago

Tests at 0982f41 on a "local" cluster:

ondemand exporter:

[rocky@rl9-login-0 ~]$ systemctl status ondemand_exporter.service
Jan 24 16:26:24 rl9-login-0.rl9.invalid ondemand_exporter[36589]: ts=2024-01-24T16:26:24.868Z caller=collector.go:171 level=error msg="Error collecting apache information" err="Get \"http://localhost:81/server-status\":>

[rocky@rl9-login-0 ~]$ cat /usr/lib/systemd/system/ondemand_exporter.service
Environment="APACHE_STATUS_URL=http://localhost:81/server-status"

[rocky@rl9-login-0 ~]$ curl localhost:9301/metrics
# shows this is working at least

sjpb commented 5 months ago

Fat image build: https://github.com/stackhpc/ansible-slurm-appliance/actions/runs/7653662893

Edit: currently failing due to CVMFS repo 503-ing

Edit: repo appears up, retrying

sjpb commented 5 months ago

Checked locally that e5608d9 works on both a) a cluster with existing non-system users and b) a fresh image

sjpb commented 4 months ago

NB: CI currently doesn't get past the os-manila-mount install task because the rpm-reef URL at https://download.ceph.com/ has been broken/renamed.

Edit: see https://tracker.ceph.com/issues/64718

sjpb commented 4 months ago

Repos fixed, let's try again

sjpb commented 4 months ago

Fat image build: https://github.com/stackhpc/ansible-slurm-appliance/actions/runs/8186093673

sjpb commented 3 months ago

Rebuilding fat image: https://github.com/stackhpc/ansible-slurm-appliance/actions/runs/8263123087

Built

sjpb commented 3 months ago

Tests at 43d43f2 on "local" cluster: