riverqueue / river

Fast and reliable background jobs in Go
https://riverqueue.com
Mozilla Public License 2.0
2.86k stars 68 forks source link

Flaky unique inserter test #392

Closed bgentry closed 3 weeks ago

bgentry commented 1 month ago

From this run

--- FAIL: TestUniqueInserter_JobInsert (0.00s)
    --- FAIL: TestUniqueInserter_JobInsert/UniqueJobContention (0.13s)
        db_unique_test.go:491: 
                Error Trace:    /home/runner/work/river/river/internal/dbunique/db_unique_test.go:491
                Error:          Not equal: 
                                expected: 219
                                actual  : 221
                Test:           TestUniqueInserter_JobInsert/UniqueJobContention
FAIL
FAIL    github.com/riverqueue/river/internal/dbunique   0.184s

It seems like something has gotten worse on GitHub Actions in recent weeks. Lots of new flaky tests appearing on stuff that hasn’t changed in awhile.

brandur commented 3 weeks ago

@bgentry Can you take a look at this one?

It got a little more complicated than I hoped. I believe the intermittency problem is what I described above in the pull request description, but I found when I went to push a fix that I was getting a massively failing test matrix.

Looking into it, that turned out to be because although our time stubs work to a certain extent, they're ignored completely when jobs are being inserted.

I was able to get this fixed up by adding created_at to inserts with a fall back to now(), but doing so makes the time stub system a little more elaborate. I tried a number of approaches, and this is the best one I could come up with, but it does change the time stub from a simple function to an interface that can be (1) stubbed, and (2) checked to see if it is stubbed.

It does add a little more code, but all the alternatives I can think of are worse. It's fairly important in specific cases that time be stubbable, and if job insertion always ignores a stub, then it's not really stubbable. I could possibly fix the test by inserting the job and the updating its created_at with a different query, but that seems pretty bad.

The good news is that we take three different time stub systems (one in limiter_test.go, the func approach in riverinternaltest, and the StubTime helper in riverinternaltest), and unify them into a single system.

brandur commented 3 weeks ago

Oops, posted that commented on the issue instead of the pull request. Going to delete and post again.