riverqueue / river

Fast and reliable background jobs in Go
https://riverqueue.com
Mozilla Public License 2.0

Allow I/O timeout errors that fall out of stress tests #417

Closed brandur closed 1 week ago

brandur commented 2 weeks ago

An intermittent error that we observe reasonably frequently is an I/O timeout that occurs while a stress test starts and stops a service hundreds of times. The error is returned while a Postgres connection is being established:

logger.go:257: time=2024-07-03T04:52:56.081Z level=INFO msg="Notifier: Listener connecting"
logger.go:257: time=2024-07-03T04:52:56.082Z level=INFO msg="Notifier: Listener connecting"
logger.go:257: time=2024-07-03T04:52:56.082Z level=ERROR msg="Notifier: Error connecting listener" err="write failed: write tcp 127.0.0.1:60976->127.0.0.1:5432: i/o timeout"
startstoptest.go:37:
        Error Trace:    /home/runner/work/river/river/internal/riverinternaltest/startstoptest/startstoptest.go:37
                                                /opt/hostedtoolcache/go/1.22.4/x64/src/runtime/asm_amd64.s:1695
        Error:          Received unexpected error:
                        write failed: write tcp 127.0.0.1:60976->127.0.0.1:5432: i/o timeout
        Test:           TestElector_WithNotifier/StartStopStress
logger.go:257: time=2024-07-03T04:52:56.083Z level=INFO msg="Notifier: Listener connecting"

Example failing run here:

https://github.com/riverqueue/river/actions/runs/9771997514/job/26975810964?pr=416

Here, try to address the problem by special-casing an "i/o timeout" error so that it's allowed if one comes back during a stress test. The approach is a little on the hacky side, but given these tests are a little extravagant anyway in the amount of churn they produce on purpose, I think it's probably okay.

brandur commented 2 weeks ago

@bgentry Reran the matrix 5 times ... I think this'll do the trick.

brandur commented 1 week ago

Yeah, just given these tests are looking for raciness rather than anything else, seems okay to be a little hacky like this. Thanks!