testflows / TestFlows-GitHub-Hetzner-Runners

Autoscaling Self-Hosted GitHub Actions Runners on Hetzner Cloud.
https://testflows.com

Recover from GitHub API errors #23

Open JulianGro opened 2 months ago

JulianGro commented 2 months ago

Apparently, when GitHub returns something unexpected, the software gives up after just a couple of retries. Here is part of a log of it happening: https://bin.linux.pizza/?176246c3d091f2e1#Fv39vgCY2S5C2wt4Zu6HV7wSd2CLqE1CaoCqGmgQWNHA If you look at the log, you will notice that it only retried for a couple of seconds before giving up forever. (It is August 31st and there haven't been any new log messages since August 14th.)

While the error page is pretty full of crap, it does include:

<strong>No server is currently available to service your request.</strong></p>\r\n      <p>Sorry about that. Please try refreshing and contact us if the problem persists.</p>
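For illustration, a check like the following could classify such a response as a temporary outage rather than a fatal error. This is only a sketch: `is_transient_github_error` and `UNAVAILABLE_MARKER` are hypothetical names, not part of github-hetzner-runners, and it assumes the caller has the HTTP status code and the response body as a string.

```python
import json

# Hypothetical marker, copied from the error page quoted above.
UNAVAILABLE_MARKER = "No server is currently available to service your request."


def is_transient_github_error(status_code: int, body: str) -> bool:
    """Return True if the response looks like a temporary outage worth retrying."""
    # Server-side errors and rate limiting are usually temporary.
    if status_code >= 500 or status_code == 429:
        return True
    # The outage page is HTML, so the known marker is a strong hint.
    if UNAVAILABLE_MARKER in body:
        return True
    # An empty body (for example 204 No Content) is not an error by itself.
    if not body.strip():
        return False
    # The API normally answers with JSON; anything else is unexpected,
    # and this sketch assumes it is safer to retry than to give up.
    try:
        json.loads(body)
    except ValueError:
        return True
    return False
```

Treating any non-JSON body as transient is a deliberate assumption here: it errs on the side of retrying when GitHub serves an HTML error page instead of an API response.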

vzakaznikov commented 2 months ago

Hi @JulianGro, thanks for filing the issue. I will take a look at why we are not retrying more.

vzakaznikov commented 2 months ago

Are you running it as a service? https://github.com/testflows/TestFlows-GitHub-Hetzner-Runners/blob/main/testflows/github/hetzner/runners/service.py#L112 should always restart the service if it fails.

JulianGro commented 2 months ago

Probably. I just use github-hetzner-runners to set everything up.

github-hetzner-runners -c config.yaml cloud redeploy

Here is my config:

root@Hetzner-Runner-Scaler:~# cat config.yaml 
config:
    github_token: REDACTED
    github_repository: overte-org/overte
    hetzner_token: REDACTED
    ssh_key: "~/.ssh/id_rsa.pub"
    max_runners: 30
    recycle: true
    with_label:
      - "self_hosted"
    default_image: "x86:system:ubuntu-22.04"
    default_server_type: cx22
    # Server for deploying Runners
    cloud:
        server_name: "GitHub-Runner-Deployer"
        deploy:
            server_type: cx22
            image: "x86:system:ubuntu-22.04"
            #location:
            #setup_script:

root@Hetzner-Runner-Scaler:~#

It didn't exit, though, so I don't think it would be restarted. It was still running; it didn't crash.
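Since the process stays alive, a service-level restart would not help; the retrying would have to continue inside the process. A minimal sketch of that idea, assuming transient failures are signalled by a hypothetical `TransientError` exception (neither `TransientError` nor `retry_forever` exists in github-hetzner-runners):

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical exception for failures that should be retried."""


def retry_forever(
    call: Callable[[], T],
    base_delay: float = 1.0,
    max_delay: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds, backing off between transient failures."""
    delay = base_delay
    while True:
        try:
            return call()
        except TransientError:
            # Add up to 10% jitter so several instances do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
            # Double the delay after each failure, capped at max_delay.
            delay = min(max_delay, delay * 2)
```

With the defaults above, the wait grows from about one second to a cap of about five minutes, so a long GitHub outage leads to slow polling instead of a permanent stop. Any exception other than `TransientError` still propagates, so real bugs are not hidden.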