riverqueue / river

Fast and reliable background jobs in Go
https://riverqueue.com
Mozilla Public License 2.0
2.86k stars 68 forks source link

Fix two badly intermittent tests for periodic job enqueuer #416

Closed brandur closed 1 week ago

brandur commented 2 weeks ago

We're getting some really bad intermittency from a pair of tests in the periodic job enqueuer [1] [2] that need fixing ASAP since they're failing at least one matrix entry in almost every build.

Looking at the tests, they add a pair of jobs after starting the client, each of which operates on a 500 ms interval, and one with RunOnStart. The test case waits for the initial RunOnStart job to appear, then waits again for the next job that wasn't RunOnStart.

I think what's happening is that since 500ms is fairly long for a computer, it's not crazy long, so it's actually possible for the tests in CI to be operating slowly enough that by the time the enqueuer gets around to its first run, it actually has been 500ms already, and therefore both jobs have been enqueued, even the non-RunOnStart one.

Another thing that I realize looking closer at these test cases is that because of the way they're structured, they're slower than hell. Each one requires a wait of 500 ms because it's waiting for the non-RunOnStart job, so running them at high -count iterations is a total non-starter.

Here, I'm suggesting that we simplify things by:

The fact that the RunOnStart job has been confirmed is a pretty good chance that both jobs were indeed added properly, so I don't think we have to wait for the other one too. This resolves the intermittency problem, but also makes each test case probably on the order of 10,000 times faster.

Fixes #409.

[1] https://github.com/riverqueue/river/actions/runs/9770889034/job/26972789182?pr=413 [2] https://github.com/riverqueue/river/actions/runs/9770889034/job/26972789334?pr=413

brandur commented 2 weeks ago

@bgentry Ran the test matrix 4x times and although I got a different intermittent error (will try to fix that separately), this pull seems to fix its target.

brandur commented 1 week ago

+1. Thanks!