scality / runner-manager

Service to manage GitHub Actions self-hosted runners
https://scality.github.io/runner-manager/
Apache License 2.0

run user-defined functionality in `job_started` and `job_completed` hooks #577

Closed: harryfinbow closed this issue 2 months ago

harryfinbow commented 5 months ago

We would like the ability to run some additional commands during the `job_started` and `job_completed` hooks. Currently they just run the commands defined in the startup.sh script, so any change requires rebuilding the runner-manager image.

I am happy to implement this myself; I just wanted to get opinions first, as there isn't a clear "best" implementation for me to work on, hence the issue :)

Use Case

Logging in to a private ECR repository when runners start a job.

Example

```bash
function job_started {
    ...

    aws ecr get-login-password | docker login

    echo "Done"
}
```
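For completeness, the real login command would also need the registry and credential flags, roughly like the following (the account ID is just a placeholder):

```bash
# Full form of the ECR login; the account ID below is a placeholder.
aws ecr get-login-password --region eu-west-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com
```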

Possible Solutions

Config Option

An additional config option (in `runner-manager.yaml`) that would take a string of additional commands and run them after the default commands but before the `echo "Done"` (see the sketch after the config example below).

```yaml
name: runner-manager
runner_groups:
- name: runner_group
  job_started: |-
    echo "Doing something"
    aws ecr get-login-password | docker login
    echo "Doing something else"
  backend:
    name: aws
    config:
      region: eu-west-1
    instance_config:
      image: ami-123456789
```
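Rendered into the startup script, the generated hook could then look roughly like this (a sketch only; the default commands are elided):

```bash
function job_started {
    # ...default commands currently defined in startup.sh...

    # Commands taken verbatim from the runner group's job_started config:
    echo "Doing something"
    aws ecr get-login-password | docker login
    echo "Doing something else"

    echo "Done"
}
```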

Limitations

Does not allow the existing commands in the hooks to be modified, or commands to be run before or in between the existing set.

Mount job_started/job_completed scripts

The job_started and job_completed functions could be pulled out of startup.sh into their own scripts so that they can be mounted, similar to how the runner-manager.yaml config file is, allowing full control over these hooks.
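As a rough sketch of what that could look like inside startup.sh (the hooks directory below is made up, purely to illustrate the idea):

```bash
# Hypothetical sketch: delegate to mounted hook scripts when they are present.
# /etc/runner-manager/hooks is an assumed path, not an existing location.
function job_started {
    if [ -x /etc/runner-manager/hooks/job_started.sh ]; then
        /etc/runner-manager/hooks/job_started.sh
    fi
    echo "Done"
}

function job_completed {
    if [ -x /etc/runner-manager/hooks/job_completed.sh ]; then
        /etc/runner-manager/hooks/job_completed.sh
    fi
    echo "Done"
}
```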

Mount startup.sh script

Similar to "Mount job_started/job_completed scripts", but mount the entire startup.sh script for full control over what happens during the startup of the node.
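If the manager runs as a container, this could be a bind mount next to the config file; a hedged sketch only, since the image name and in-container paths here are assumptions rather than the project's actual layout:

```bash
# Hypothetical: bind-mount a custom startup.sh over the built-in one.
# Image name and container paths are assumptions for illustration only.
docker run \
  -v "$(pwd)/runner-manager.yaml:/app/runner-manager.yaml:ro" \
  -v "$(pwd)/startup.sh:/app/templates/startup.sh:ro" \
  runner-manager:latest
```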

tcarmet commented 5 months ago

That would be a great addition indeed! In terms of structure, I like having it defined at the runner group level as you did, but I also think it may become a bit cumbersome if you have to repeat the same job_started/job_completed parameter for every runner group.

Not a deal breaker, but something to think about. I'm actually more curious to know whether you have different kinds of scripts to run depending on the runner group. And if it becomes an issue we can always implement a global setting at the root of the config plus one at the group level that takes priority.
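Something like this, purely as a sketch of the shape (the root-level field and its precedence are hypothetical):

```yaml
name: runner-manager
# Hypothetical global default applied to every runner group:
job_started: |-
  aws ecr get-login-password | docker login
runner_groups:
- name: runner_group
  # Hypothetical group-level value that would take priority over the root default:
  job_started: |-
    echo "Group-specific setup"
  backend:
    name: aws
    config:
      region: eu-west-1
```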

Now, I know I didn't do a great job with the data structure of all the models and with how the startup script itself is "templated", but I believe it is possible to achieve what you want without having to rewrite how everything is set up.

If you have the bandwidth to work on this, I will gladly take the change in. As for the values that are currently defined, we can remove them and I can implement them on the configuration side on my end (or we can keep them as default values, whatever we think is best).