stackhpc / ansible-role-openhpc

Ansible role for OpenHPC
Apache License 2.0
45 stars, 15 forks

Add support for autoscaling #120

Closed: sjpb closed this 2 years ago

sjpb commented 2 years ago

Most of the autoscaling setup can be done either before running runtime.yml or by using the openhpc_config variable to pass additional parameters into slurm.conf.
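As a rough illustration of the openhpc_config route, the fragment below passes Slurm's power-saving parameters (the mechanism Slurm uses for cloud autoscaling) through to slurm.conf. It assumes openhpc_config is a mapping of slurm.conf parameter names to values, each rendered as Key=Value; the script paths and timings are hypothetical placeholders, not values from this PR.

```yaml
# Hypothetical group_vars entry: extra slurm.conf parameters for autoscaling.
# Assumes each key/value pair is rendered into slurm.conf as Key=Value.
openhpc_config:
  SuspendProgram: /opt/slurm/suspend.sh   # hypothetical script that removes idle cloud nodes
  ResumeProgram: /opt/slurm/resume.sh     # hypothetical script that creates nodes on demand
  SuspendTime: 300                        # seconds a node must be idle before it is suspended
  ResumeTimeout: 300                      # seconds allowed for a resumed node to become usable
```

Anything outside slurm.conf, such as installing those scripts on the control node, would fall under the "before running runtime.yml" option.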

This PR adds an option extra_nodes to the group/partition definitions in openhpc_partitions, allowing additional node definitions to be added to the slurm.conf node/partition definitions. As well as autoscaling (State=CLOUD) nodes, these could also be used to add normal nodes that are not controlled by this role into a cluster managed with this role.
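A sketch of how extra_nodes might be used to declare CLOUD-state nodes in a partition. It assumes each list item is a mapping of slurm.conf node parameters, with NodeName listed first since it opens the NodeName line in slurm.conf; the partition name, node names and hardware values are hypothetical.

```yaml
openhpc_partitions:
  - name: compute
    extra_nodes:
      # Hypothetical autoscaled nodes: they are not in the inventory, so the role
      # does not configure them. State=CLOUD tells Slurm they start powered down
      # and are brought up on demand via ResumeProgram.
      - NodeName: compute-cloud-[0-3]
        State: CLOUD
        CPUs: 4
        RealMemory: 7500
```

Dropping State: CLOUD would give the other case described above: ordinary nodes that already exist but are not managed by this role, still listed in the partition's node definitions.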

It also modifies the docs, as I realised they were a bit messy/confusing in places.

There are some subtleties which needed changes to the slurm.conf templating:

sjpb commented 2 years ago

Note: test13 (for openhpc_config) was not actually being run, and its verification needed fixing. I've fixed both in this PR, as we also need openhpc_config for autoscaling.

sjpb commented 2 years ago

On centos:8.2.2004, test4 failed in CI but worked OK when run locally with Molecule. Rerunning...

sjpb commented 2 years ago

Passed on 2nd attempt, ready for re-review @jovial.

sjpb commented 2 years ago

Passed tests on 2nd attempt

sjpb commented 2 years ago

@jovial, can you re-review please? I think this is ready to go.