npalm closed this 3 months ago.
Thanks for this interesting option. 1) What do you mean by small fleets? 2) I assume these rate limits won't apply to GHES, will they? 3) Can this also be used with idle/pooled runners, or do they have to be turned off? We have some problems at night, when we set the pool size to 0, and we would expect this approach to help us get more stability.
Description
This feature adds the capability to retry scaling up a runner when a job is still queued after a defined delay. It is intended to avoid the need for a pool of ephemeral runners.
Implementation
The module is extended with configuration to optionally enable one or more retries. Once enabled, the scale-up lambda publishes the same message it receives, extended with a retry counter, to a retry job queue with a delay. A new lambda picks up the message from this queue and checks (via the GitHub API) whether the job is still queued. If it is, the message is published again on the job queue, which is the incoming queue of the scale-up lambda.
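The retry flow described above can be sketched as two small handlers. This is a minimal, hypothetical sketch: the message fields, the `RetryConfig` shape (`enable`, `maxAttempts`, `delaySeconds`), the `Queue` interface and the `getJobStatus` callback are assumptions for illustration, not the module's actual types or variables. An in-memory queue stands in for the real queues; if those are SQS queues, note that SQS caps per-message delays at 900 seconds.

```typescript
// Hypothetical message shape; the real module's fields may differ.
interface JobMessage {
  id: number;
  repositoryOwner: string;
  repositoryName: string;
  retryCounter?: number; // added by the scale-up lambda when scheduling a retry
}

type JobStatus = 'queued' | 'in_progress' | 'completed';

// Hypothetical retry configuration (assumed names, not the module's variables).
interface RetryConfig {
  enable: boolean;
  maxAttempts: number;
  delaySeconds: number;
}

// Minimal queue abstraction standing in for the real message queues (assumption).
interface Queue {
  send(message: JobMessage, delaySeconds: number): void;
}

class InMemoryQueue implements Queue {
  messages: { message: JobMessage; delaySeconds: number }[] = [];
  send(message: JobMessage, delaySeconds: number): void {
    this.messages.push({ message, delaySeconds });
  }
}

// Scale-up side: re-publish the received message with an incremented counter
// on the delayed retry queue, unless retries are disabled or exhausted.
function scheduleRetry(message: JobMessage, config: RetryConfig, retryQueue: Queue): boolean {
  const attempt = message.retryCounter ?? 0;
  if (!config.enable || attempt >= config.maxAttempts) {
    return false;
  }
  retryQueue.send({ ...message, retryCounter: attempt + 1 }, config.delaySeconds);
  return true;
}

// Retry-lambda side: when the delayed message arrives, ask GitHub whether the job
// is still queued; if so, put the message (counter included) back on the job queue.
async function handleRetry(
  message: JobMessage,
  getJobStatus: (m: JobMessage) => Promise<JobStatus>,
  jobQueue: Queue,
): Promise<boolean> {
  const status = await getJobStatus(message);
  if (status !== 'queued') {
    return false; // the job was picked up or finished; nothing to retry
  }
  jobQueue.send(message, 0);
  return true;
}
```

Keeping the counter on the message means the retry lambda stays stateless: the scale-up lambda alone decides, from the counter, whether another retry is allowed.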
Consequences
Testing
Testing can be done as follows:
Trigger a workflow
Terminate the created instance before the job starts
Wait; after the delay, the retry job should publish the message again, which triggers the creation of a new instance.
[x] Multi runners.
[x] Default runners, not enabled; requires a configuration update.
Tasks