testflows / TestFlows-GitHub-Hetzner-Runners

Autoscaling Self-Hosted GitHub Actions Runners on Hetzner Cloud.
https://testflows.com

Errors when job is picked up by GitHub hosted runner #24

Closed JulianGro closed 2 months ago

JulianGro commented 2 months ago

The software doesn't seem to cleanly handle the case where a GitHub-hosted runner picks up a job. This appears to stall the software until that job completes, meaning that no self-hosted runners are started until the GitHub-hosted runner finishes its job.

12:18:25 🍀 Finding labels for the job from which WorkflowJob(url="https://api.github.com/repos/overte-org/overte/actions/jobs/29725492779", id=29725492779) stole the runner
12:18:25 ❌ UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/actions/self-hosted-runners#get-a-self-hosted-runner-for-a-repository", "status": "404"}
12:18:25 ❌ UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/actions/self-hosted-runners#get-a-self-hosted-runner-for-a-repository", "status": "404"}
12:18:25 ❗ Error: UnknownObjectException 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/actions/self-hosted-runners#get-a-self-hosted-runner-for-a-repository", "status": "404"}

In our case, we host our own Runners for Linux, but use GitHub's Runners for Windows builds.

vzakaznikov commented 2 months ago

@JulianGro, not sure I understand the issue. If you want to skip some jobs from being monitored then check https://github.com/testflows/TestFlows-GitHub-Hetzner-Runners/wiki/Skipping-Jobs. You can use custom labels for Linux and Windows runners.

JulianGro commented 2 months ago

My understanding is that I am already doing that:

config:
    github_token: REDACTED
    github_repository: overte-org/overte
    hetzner_token: REDACTED
    ssh_key: "~/.ssh/id_rsa.pub"
    max_runners: 30
    recycle: true
    with_label:
      - "self_hosted"
    default_image: "x86:system:ubuntu-22.04"
    default_server_type: cx22
    # Server for deploying Runners
    cloud:
        server_name: "GitHub-Runner-Deployer"
        deploy:
            server_type: cx22
            image: "x86:system:ubuntu-22.04"
            #location:
            #setup_script:

#

The Windows builds don't have the self_hosted label. In fact, no extra label is specified at all, so as far as I understand the only label they carry is windows-2019. This is the offending workflow: https://github.com/overte-org/overte/blob/master/.github/workflows/pr_build.yml. It looks complicated, but only the windows-2019 entry of the matrix include should matter.
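To make the expectation above concrete, here is a minimal sketch of label-based filtering. It assumes that a job is only monitored when it carries every label listed under with_label, and that labels compare case-insensitively. The function name job_is_monitored is hypothetical and is not taken from the project's code.

```python
from typing import Iterable


def job_is_monitored(job_labels: Iterable[str], with_label: Iterable[str]) -> bool:
    """Return True if the job carries every required label.

    Assumption: all labels in `with_label` must be present on the job,
    compared case-insensitively. This is an illustration only, not the
    project's actual matching logic.
    """
    required = {label.lower() for label in with_label}
    present = {label.lower() for label in job_labels}
    return required.issubset(present)


# A Windows job that only has the windows-2019 label would be skipped,
# while a Linux job tagged self_hosted would be monitored.
windows_job = ["windows-2019"]
linux_job = ["self_hosted", "ubuntu-22.04"]
print(job_is_monitored(windows_job, ["self_hosted"]))
print(job_is_monitored(linux_job, ["self_hosted"]))
```

Under these assumptions, the Windows job should never be considered by the autoscaler, which is why the errors in the log were unexpected.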

vzakaznikov commented 2 months ago

I think the bug is in https://github.com/testflows/TestFlows-GitHub-Hetzner-Runners/blob/main/testflows/github/hetzner/runners/scale_up.py#L799 where we don't check if the runner_name is one of ours or not.
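The missing check could look roughly like the sketch below. It is not the code from scale_up.py: the Job class, the RUNNER_NAME_PREFIX value, and both helper functions are hypothetical, and it assumes that self-hosted runners created by the tool can be recognized by a known name prefix.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical prefix used only for this illustration; the real project
# may name its runners differently.
RUNNER_NAME_PREFIX = "github-hetzner-runner"


@dataclass
class Job:
    """Simplified stand-in for a workflow job (hypothetical)."""
    id: int
    runner_name: Optional[str]


def is_our_runner(runner_name: Optional[str], prefix: str = RUNNER_NAME_PREFIX) -> bool:
    """Return True only for runners whose name starts with our prefix."""
    return bool(runner_name) and runner_name.startswith(prefix)


def jobs_needing_label_lookup(jobs: List[Job]) -> List[Job]:
    """Keep only jobs that run on one of our own runners.

    Jobs picked up by GitHub-hosted runners are dropped, so no
    self-hosted runner lookup (and no 404) is attempted for them.
    """
    return [job for job in jobs if is_our_runner(job.runner_name)]


jobs = [
    Job(id=1, runner_name="github-hetzner-runner-123-456"),
    Job(id=2, runner_name="GitHub Actions 12"),  # example hosted runner name
    Job(id=3, runner_name=None),  # job not yet assigned to a runner
]
print([job.id for job in jobs_needing_label_lookup(jobs)])
```

Filtering before the lookup avoids calling the self-hosted runners endpoint for a runner that the repository does not own, which is the request that returned 404 in the log above.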

vzakaznikov commented 2 months ago

Commit https://github.com/testflows/TestFlows-GitHub-Hetzner-Runners/commit/efe4e49ebd892d17d56f28971c0c81c63ba6d733 should address this issue. We were trying to pull labels for a runner that is not self-hosted.

vzakaznikov commented 2 months ago

Fixed in https://pypi.org/project/testflows.github.hetzner.runners/1.7.240908.1140002/