Crease29 opened 1 month ago
Most likely your runner server type is too small and does not have enough resources? Can you try using a bigger server type? For example, try cpx31 instead of cpx21.
The runner powers itself off after the GitHub Actions runner process (run.sh) exits: https://github.com/testflows/TestFlows-GitHub-Hetzner-Runners/blob/main/testflows/github/hetzner/runners/scripts/startup-x64.sh#L13.
However, some messages look strange. We need to check why the used time jumped from 0d0h4m to 0d0h49m, which does not match the timestamps in the log:
00:49:28 scale_down INFO Marking unused runner server github-hetzner-runner-10802907644-29965869241 used 0d0h4m
...
00:49:46 scale_down INFO Marking unused runner server github-hetzner-runner-10802907644-29965869241 used 0d0h49m
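To make the mismatch concrete, here is a small sketch that parses the used-time strings from the two log lines and compares the jump with the wall-clock time between them. The helpers `parse_used_time` and `log_delta_seconds` are hypothetical (not part of the project), and the sketch assumes used time is always formatted as `<days>d<hours>h<minutes>m` and that both log lines fall on the same day.

```python
from datetime import datetime


def parse_used_time(text: str) -> int:
    """Convert a used-time string such as '0d0h49m' into minutes.

    Hypothetical helper; assumes the '<d>d<h>h<m>m' format seen in the log.
    """
    days, rest = text.split("d")
    hours, rest = rest.split("h")
    minutes = rest.rstrip("m")
    return int(days) * 24 * 60 + int(hours) * 60 + int(minutes)


def log_delta_seconds(start: str, end: str) -> float:
    """Seconds between two HH:MM:SS log timestamps from the same day."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Values copied from the two log lines above.
used_jump_minutes = parse_used_time("0d0h49m") - parse_used_time("0d0h4m")
wall_clock_seconds = log_delta_seconds("00:49:28", "00:49:46")

print(f"used time grew by {used_jump_minutes} min in {wall_clock_seconds:.0f} s of log time")
```

Running it prints that the used time grew by 45 minutes while only 18 seconds passed between the two log lines, which is why the jump looks suspicious.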
Most likely your runner server type is too small and does not have enough resources? Can you try using a bigger server type?
Hmm, I kinda doubt it. All they're doing is installing some yarn packages, doing a few HTTP requests, and writing something to a JSON file. Nothing very resource-intensive. I took a look at the graphs during build time and they were all just fine.
OK, thanks for checking. Then we might have a race condition. I will check how the used time is being calculated.
Thank you. If I can provide any more information that could help identify the root cause, please let me know.
@Crease29, thanks. If this occurs again, it would be helpful to see the full log, including the specific job ID that misbehaved. Of course, check the log first to make sure no private data is revealed.
It would be nice to confirm that run.sh actually started correctly.
@vzakaznikov I will try my best to catch it. Just to be sure, you mean the log output from github-hetzner-runners cloud log, right?
Yes, if possible. That would be very useful.
It has been pretty stable today, so I wasn't able to gather any further logs so far.
Hi @Crease29, have there been any more occurrences of the same issue?
Every now and then, but not as often as when I reported the issue. I'm currently on holiday, so I'm not at my PC much, but when I'm back and it happens again I will comment here.
Thanks @Crease29 for the update. One suggestion I can make is to try to increase the following timeouts:
--max-unused-runner-time sec: maximum time after which an unused runner is removed and its server deleted (default: 180 sec)
--max-runner-registration-time: maximum time after which the server will be deleted if its runner is not registered with GitHub (default: 180 sec)
Looking again at your log messages, it looks like the runner gets created but is then marked as "unused" and powered off to be recycled. This can happen either if, for some reason, the GitHub API is not updating the runner status (https://github.com/testflows/TestFlows-GitHub-Hetzner-Runners/blob/main/testflows/github/hetzner/runners/scale_down.py#L347), or if each scale-down loop iteration is slow and takes longer than max-unused-runner-time.
I would try setting these timeouts to something like 600 seconds and check whether this helps.
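The two failure modes above can be illustrated with a simplified model of timeout-based recycling. This is a hypothetical sketch, not the logic in scale_down.py: the `Runner` fields, the `should_recycle` function, and the use of plain seconds instead of wall-clock time are assumptions made for illustration. It shows how a runner that GitHub still reports as idle (status lag), or that is first checked late because a loop iteration is slow, can cross the default 180 sec threshold and be recycled, while a 600 sec timeout leaves more margin.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_TIMEOUT_SEC = 180  # default of both options, per the help text above


@dataclass
class Runner:
    # Hypothetical, simplified view of a runner server (times in seconds).
    created_at: float
    registered: bool  # has the runner registered with GitHub?
    busy: bool  # busy status as last reported by the GitHub API
    idle_since: Optional[float] = None  # when it was last seen going idle


def should_recycle(
    runner: Runner,
    now: float,
    max_unused_runner_time: float = DEFAULT_TIMEOUT_SEC,
    max_runner_registration_time: float = DEFAULT_TIMEOUT_SEC,
) -> Optional[str]:
    """Return the reason a runner would be removed, or None to keep it."""
    if not runner.registered:
        # Never registered with GitHub: delete once the registration window passes.
        if now - runner.created_at > max_runner_registration_time:
            return "not registered in time"
        return None
    if runner.busy:
        # GitHub reports a job running on it, so it is in use.
        return None
    # Registered but reported idle: measure how long it has been unused.
    idle_start = runner.idle_since if runner.idle_since is not None else runner.created_at
    if now - idle_start > max_unused_runner_time:
        return "unused"
    return None


# A runner still reported as idle, first checked 200 sec after creation.
runner = Runner(created_at=0.0, registered=True, busy=False)
print(should_recycle(runner, now=200.0))  # recycled with the 180 sec default
print(should_recycle(runner, now=200.0, max_unused_runner_time=600))  # kept
```

With the default, the first line prints `unused`; with a 600 sec timeout, the same runner is kept (`None`), which is the effect the suggested setting aims for.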
@Crease29, could you try https://pypi.org/project/testflows.github.hetzner.runners/1.7.240926.1135125/?
Heya,
I've been using this project for about a week now and it has been working very well for me. Thank you for your work and for making it public! Since today, I'm getting more and more failed jobs for the following reason:
I haven't touched any of the default values.
Any idea what the issue could be here? It sounds like the runner informed GitHub that it would take over the job but then shut down?
Logs that I have found for one of these jobs: