redhat-actions / openshift-tools-installer

Download, install and cache OpenShift binaries into your GitHub Actions runners.
https://github.com/marketplace/actions/openshift-tools-installer
MIT License

[BUG] Task executes but step seems to hang for 3 minutes before it completes #105

Closed (komish closed this 3 months ago)

komish commented 4 months ago

Version

Download action repository 'redhat-actions/openshift-tools-installer@v1' (SHA:2de9a80cf012ad0601021515481d433b91ef8fd5)

Describe the bug

Around 2024-07-04, I started seeing a consistent 3 minute runtime for openshift-tools-installer. This happens in two separate calls, each installing a different tool. These steps previously took anywhere from 1-15 seconds, but now they consistently take 3 minutes.

From the logs, I can see that all of the actual "work" being executed by the task completes almost immediately (within the 1-15 seconds), but the actions step just ... hangs until the 3m timer completes. Then it halts (green check, successful).

This is causing some of our e2e workloads to hit GitHub Actions timeouts.

Steps to reproduce, workflow links, screenshots

Here's a test repo demonstrating the problem. https://github.com/komish/actions-workflow-call-test/actions/runs/9847203226

It installs oc and chart-verifier. In both cases, it's skipping the cache. oc comes from the mirror, and chart-verifier comes from GitHub.

Here's the same happening against all active Ubuntu images available in GHA. Ignore the failed one (running on 20.04). It's unrelated.

https://github.com/komish/actions-workflow-call-test/actions/runs/9847235069

Here's the last time we saw a short runtime. Both steps completed in under 15s.

https://github.com/openshift-helm-charts/sandbox/actions/runs/9788116992/job/27025670858#step:14:1 (oc)
https://github.com/openshift-helm-charts/sandbox/actions/runs/9788116992/job/27025670858#step:12:1 (chart-verifier)

komish commented 4 months ago

cc @divyansh42 let me know what information I can provide. I don't think I see any changes in this repo in that time frame, but I'm having trouble debugging further.

I see some kind of image update for ubuntu22.04 (which is what I use) in https://github.com/actions/runner-images/releases/tag/ubuntu22%2F20240630.1 - but I also tested against other ubuntu images to observe the same result. Their release time frame is close, but not exact - they released on 2024-07-02 and we saw a successful run on 2024-07-03. Could be close enough to attribute this issue to the image. Hard to tell.

The only other thing I can think of that may be in scope (other than this repo) is the Actions runner code itself https://github.com/actions/runner/releases/tag/v2.317.0 - but it hasn't had a release in this time frame either.

komish commented 4 months ago

Opened an issue against actions/runner-images to see if their recent releases have some kind of issue causing this: https://github.com/actions/runner-images/issues/10211

komish commented 4 months ago

Folks, I think the underlying problem is here: https://github.blog/changelog/2024-03-07-github-actions-all-actions-will-run-on-node20-instead-of-node16-by-default/

I don't see this problem on Node16, just Node20. GitHub Actions is now replacing Node16 with Node20 by default.

I tried to debug, and could consistently reproduce this with Node20 on my workstation running your dev-test script. I applied a couple of workarounds (e.g. adding in a .finally block that exits the process), but I didn't really feel like that's the right way to go about this.
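For reference, the .finally-style workaround described above could look roughly like the sketch below. This is not the action's actual code: `runThenExit`, `main`, and `exit` are hypothetical names, and `exit` is passed in as a parameter only so the wrapper can be exercised without ending the process. It assumes the hang comes from something keeping the Node process alive after the work finishes, which this thread has not confirmed.

```typescript
// Sketch only: `main` stands in for the action's real async entry point,
// and `exit` is injected so the wrapper can be tested without killing the process.
type ExitFn = (code: number) => void;

async function runThenExit(main: () => Promise<void>, exit: ExitFn): Promise<number> {
    let code = 0;
    try {
        await main();
    } catch {
        // Any rejection from main() maps to a non-zero exit code.
        code = 1;
    } finally {
        // Exit explicitly so anything still pending (open sockets, timers)
        // cannot keep the process running after the work is done.
        exit(code);
    }
    return code;
}
```

In a real entry point this would be called as `runThenExit(run, (code) => process.exit(code))`, where `run` is whatever async function performs the install. Note that this only masks the symptom; it does not identify what is keeping the process alive on Node20.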

I'm guessing there's some promise not being resolved properly somewhere.