stackhpc / ansible-slurm-appliance

A Slurm-based HPC workload management environment, driven by Ansible.

Ensure podman copes with a hard reboot #460

Closed. sjpb closed this 1 month ago

sjpb commented 1 month ago

This PR fixes podman containers failing to start after a hard reboot. The failure shows up as an "invalid internal status" error in the journal for the affected unit.
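
To check whether a node is hitting this symptom, the unit's journal can be searched for the error string. The following is a minimal sketch, not part of this PR: the helper names `read_unit_journal` and `unit_hit_error` are hypothetical, and it assumes the container runs as a system-level systemd unit (a user unit would need `journalctl --user` instead).

```python
import subprocess

# Error string quoted in the description above.
ERROR_TEXT = "invalid internal status"


def unit_hit_error(journal_text: str) -> bool:
    """Return True if any journal line contains the error string."""
    return any(ERROR_TEXT in line for line in journal_text.splitlines())


def read_unit_journal(unit: str) -> str:
    """Return the current boot's journal output for a systemd unit.

    Assumption: `unit` is a system-level unit name supplied by the caller.
    """
    result = subprocess.run(
        ["journalctl", "-u", unit, "-b", "--no-pager"],
        capture_output=True,
        text=True,
        check=False,  # an empty or missing journal is handled by the caller
    )
    return result.stdout
```

After a hard reboot, `unit_hit_error(read_unit_journal(name))` with the relevant unit name would report whether that unit logged the error during the current boot.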

Note that the podman role originally ensured the podman temporary directories were on a tmpfs; this appeared to be problematic with later podman versions, so #351 removed it.

Fixes #459.

sjpb commented 1 month ago

Tested on RL9: without this PR, a hard reboot fails with the problem described; with it, the problem is fixed.