@galargh: not disagreeing that faster is better. How does this boot time compare to what happens with non-self-hosted runners? I'm trying to get a sense of why to prioritize this now.
With hosted runners, a runner is available within seconds. With self-hosted, it's up to 2 minutes. This is fine for long workflows, where the speedup far outweighs the boot penalty, but it becomes an issue when we want to migrate shorter workflows (which is the case in libp2p/quic-go).
I looked at a couple of instances at random and it looks like machine provisioning (from instance up to job started) is under 30s, which is pretty decent. I upped the instance count limits because I noticed in the metrics that we quite often operate above them (note to self: we need multi-select on org in the monitoring dashboard). I'm going to have a look at the job queued → lambda triggered and lambda triggered → instance up intervals. It'd be nice to have continuous insight into these, but that would make this a much bigger task.
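As a starting point for the queue-time side of this, here's a minimal sketch (not part of the current setup) that pulls job timings for a repository via the GitHub REST API and reports how long each job waited before a runner picked it up. The repository name, the token environment variable, and the reliance on the job object exposing `created_at` and `started_at` are all assumptions for illustration.

```python
import os
from datetime import datetime

import requests  # assumes the requests library is installed

# Illustrative values; adjust to the repo/org actually being measured.
REPO = "libp2p/quic-go"
TOKEN = os.environ["GITHUB_TOKEN"]
API = "https://api.github.com"
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Accept": "application/vnd.github+json"}


def parse(ts: str) -> datetime:
    # GitHub timestamps look like 2023-01-01T12:00:00Z.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def queue_times(per_page: int = 10) -> None:
    # Fetch the most recent workflow runs for the repo.
    runs = requests.get(
        f"{API}/repos/{REPO}/actions/runs",
        headers=HEADERS,
        params={"per_page": per_page},
    ).json()["workflow_runs"]

    for run in runs:
        jobs = requests.get(
            f"{API}/repos/{REPO}/actions/runs/{run['id']}/jobs",
            headers=HEADERS,
        ).json()["jobs"]
        for job in jobs:
            # created_at -> started_at approximates "job queued -> runner picked it up".
            if job.get("created_at") and job.get("started_at"):
                wait = parse(job["started_at"]) - parse(job["created_at"])
                print(f"{run['id']} {job['name']}: queued for {wait.total_seconds():.0f}s")


if __name__ == "__main__":
    queue_times()
```

This only covers the GitHub-visible part of the interval; the lambda and EC2 sides would still need the separate measurements mentioned above.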
In the normal case, job created → lambda scaling up takes < 5s, and instance requested → init starting takes < 10s. Some ideas on why we might be hitting degraded performance (compared to these numbers):
For now, I'll leave it at this: there is no obvious single place that requires optimisation; we upped the overall runner type limits; and we need to set up continuous monitoring for the self-hosted runner lifecycle (from job requested to runner deprovisioned).
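For the continuous-monitoring piece, one possible approach (a sketch, not how the current lambdas are instrumented) is to publish custom CloudWatch metrics for each lifecycle interval, so that job queued → lambda triggered and lambda triggered → instance up show up on a dashboard without ad-hoc digging. The namespace, metric name, and dimension below are made up for illustration.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")


def record_lifecycle_interval(stage: str, seconds: float, runner_type: str) -> None:
    """Publish one lifecycle interval (e.g. 'job-queued-to-lambda-triggered') as a custom metric.

    Namespace, metric name, and dimension are illustrative, not existing conventions.
    """
    cloudwatch.put_metric_data(
        Namespace="GitHubRunners/Lifecycle",
        MetricData=[
            {
                "MetricName": stage,
                "Dimensions": [{"Name": "RunnerType", "Value": runner_type}],
                "Unit": "Seconds",
                "Value": seconds,
            }
        ],
    )


# Example: a scale-up lambda could call this with the measured interval.
# record_lifecycle_interval("job-queued-to-lambda-triggered", 4.2, "linux-x64-large")
```

Emitting these from the existing lambdas would also make the multi-select-on-org dashboard request easier to satisfy, since the data would live in one metric namespace.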
We should investigate whether we can decrease the boot-up time.