n2ygk opened this issue 9 months ago
You are correct in how you describe the behavior. We are probably also being throttled a bit since we run such an intense set of jobs. A runner is allocated for every build in the matrix. Maybe explicitly selecting a different runner class for the `success` job would get it allocated more quickly.
Yeah, presumably these runners are all counted against the Jazzband org. Can we try this without having to bug @jezdez?
The reason we added the `success` job to the build process was so we wouldn't need @jezdez to intercede to change the success criteria of our builds, since we don't have settings access. We should be able to select the machine class by changing `runs-on` for the `success` job. Maybe we can get away without specifying it? I'm not sure what the default is...
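For reference, a minimal sketch of what that could look like (the job ids and runner label here are illustrative, not taken from our actual workflow). Note that, as far as I know, GitHub Actions requires `runs-on` on every job that defines steps, so we can't simply omit it:

```yaml
jobs:
  # ... the existing build job with its matrix ...
  success:
    # Explicit runner class; ubuntu-latest is just a guess at a label
    # that might get picked up from the queue more quickly.
    runs-on: ubuntu-latest
    needs: build  # wait for all build matrix jobs to finish
    steps:
      - run: echo "all build matrix jobs passed"
```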
I think this is something we could maybe raise with GitHub support?
I assume we're waiting on the backlog of Jazzband jobs and being slowed down by the concurrent job limit: https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration
Another option may be to go ahead and reduce our matrix by dropping Django 4.0 and 4.1, since they're no longer supported upstream. That should shrink the matrix by 10 jobs. The `success` job still won't be enqueued until the rest are complete...
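Roughly, assuming the matrix enumerates versions directly (the real axis names and values in our workflow may differ), the change would look something like:

```yaml
strategy:
  matrix:
    # Dropping "4.0" and "4.1" removes 2 Django versions x 5 Python
    # versions = 10 jobs from the matrix. Values here are illustrative.
    django-version: ["3.2", "4.2"]
    python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
```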
Alternatively, if @jezdez would give you, @n2ygk, or someone else on the team settings access to this repo, then we could manage the branch protections ourselves and wouldn't need the `success` job, since we could update the required checks when needed.
@jezdez @n2ygk I fired off a request to GH support to increase the concurrent build limit for the Jazzband organization.
Describe the bug
In watching multiple PRs after I've approved them, it appears to take a long time for the `success` job to start after the last step of the `build` job has finished. See #1219, where the separate `success` job was added to make it easier to update the matrix and to only ever depend on `build` finishing for tests to succeed.
To Reproduce
Cause a PR to run tests.
Expected behavior
I didn't expect anything in particular, but was hoping the wait for the `success` job wouldn't happen.
Version
current master branch
Additional context
@dopry I'm guessing that GH is allocating a separate runner for each job, so after the `build` job finishes, we wait for another runner to become available before the `success` job can start, and that takes a while. See the timestamps below. So a second job that depends on the first has to wait for a new runner to free up. Sometimes correlation is indicative of causation.
Mon, 18 Dec 2023 17:59:18 GMT: last matrix step of `build` job finished
Mon, 18 Dec 2023 18:31:45 GMT: `success` job starts
While watching the PR, the `success` job status is "waiting on a runner". Here's some raw log showing the 30-minute wait for a runner: