Closed foragerr closed 1 year ago
Quick answers
Scale down is indeed turning down instances brute force, but only instances that are not active will be turned down. When running ephemeral this should in general not be needed. But is also not causing problems. Termination for ephemral irunners is done once the job is completed by the instance it self. Hooking in on other events in complex, since we not control which runner got which job.
Since HTTP is not reliable there is no gaurantee. But all messages that are arrived and accepted are put on a queue. The scale-up lambda try to process messages. In case an error happends and the happen is in a category from which it can auto recover such as API rate limit the messages is sent back to the queue. This all in certain max time windows. Most of the stuff is configurable.
Besides that you can setup a pool, in that case ephermal instances will be created based on a cron expression. We creat so now and then a few via cron to ensure all jobs got processed.
Thank you for the detailed response!
This project looks great - especially the upcoming 2.0 changes! Kudos maintainers!
I'm currently running self-hosted runners through an ASG that scales up and down via cron triggers. It isn't exactly "auto-scaled" but meets our needs at the moment. I'm looking to move to a purely ephemeral setup, where
I have a couple of questions:
The docs (both main and v2.0.0 tag) say
Is this still true in 2.0? Is there any desire to implement termination based on
workflow_job
completed
event?I really appreciate your attention!