Open JulianGro opened 2 months ago
Hi @JulianGro, thanks for filing the issue. I will take a look at why we are not retrying more.
Are you running it as a service? https://github.com/testflows/TestFlows-GitHub-Hetzner-Runners/blob/main/testflows/github/hetzner/runners/service.py#L112 should always restart the service if it fails.
Probably. I just use github-hetzner-runners to set everything up.
github-hetzner-runners -c config.yaml cloud redeploy
Here is my config:
root@Hetzner-Runner-Scaler:~# cat config.yaml
config:
  github_token: REDACTED
  github_repository: overte-org/overte
  hetzner_token: REDACTED
  ssh_key: "~/.ssh/id_rsa.pub"
  max_runners: 30
  recycle: true
  with_label:
    - "self_hosted"
  default_image: "x86:system:ubuntu-22.04"
  default_server_type: cx22
  # Server for deploying Runners
  cloud:
    server_name: "GitHub-Runner-Deployer"
    deploy:
      server_type: cx22
      image: "x86:system:ubuntu-22.04"
      #location:
      #setup_script:
root@Hetzner-Runner-Scaler:~#
It didn't exit though, so I don't think it would be restarted. It was still running; it didn't crash.
Apparently, when GitHub returns something unexpected, the software gives up after just a couple of retries. Here is part of a log of it happening: https://bin.linux.pizza/?176246c3d091f2e1#Fv39vgCY2S5C2wt4Zu6HV7wSd2CLqE1CaoCqGmgQWNHA If you look at the log, you will notice that it only retried for a couple of seconds before giving up for good. (It is August 31st and there haven't been any new log messages since August 14th.)
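To illustrate the behavior I would expect instead, here is a rough sketch of a retry loop that backs off but never gives up permanently. It is not the project's actual code: `call_with_backoff`, `fetch`, and the delay values are hypothetical names and numbers I picked for the example.

```python
import random
import time


def call_with_backoff(fetch, base_delay=1.0, max_delay=300.0, sleep=time.sleep):
    """Call fetch() until it succeeds, backing off between attempts.

    Hypothetical helper for illustration only; it is not part of
    github-hetzner-runners.
    """
    attempt = 0
    while True:
        try:
            return fetch()
        except Exception as exc:  # e.g. an unexpected error page from the API
            # Exponential backoff, capped at max_delay. The exponent is
            # clamped so the power never grows without bound in a long outage.
            delay = min(max_delay, base_delay * (2 ** min(attempt, 16)))
            # Jitter so several scalers do not retry in lockstep.
            delay *= random.uniform(0.5, 1.0)
            print(f"attempt {attempt + 1} failed: {exc!r}; retrying in {delay:.1f}s")
            sleep(delay)
            attempt += 1
```

With these assumed defaults the wait grows from about a second to at most five minutes, so a long outage costs only a few requests per hour, and the loop recovers on its own once GitHub responds normally again.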
While the error page is pretty full of crap, it does include: