stretchr / testify

A toolkit with common assertions and mocks that plays nicely with the standard library
MIT License
23.37k stars 1.59k forks source link

Eventually can fail having never run condition #1652

Open brackendawson opened 3 weeks ago

brackendawson commented 3 weeks ago

Description

Witnessed a test flake on one of the go versions (1.21) in CI from the master branch because the assert package took too long: https://github.com/stretchr/testify/actions/runs/11182446108/job/31088748678

Step To Reproduce

Run CI.

Expected behavior

CI passes.

Actual behavior

Tests timed out.

brackendawson commented 3 weeks ago

It took significantly longer than usual (10 minutes vs 45 seconds) which suggests that TestEventuallyTimeout can lock:

=== RUN   TestNeverFalse
--- PASS: TestNeverFalse (0.10s)
=== RUN   TestNeverTrue
--- PASS: TestNeverTrue (0.04s)
=== RUN   TestEventuallyTimeout
panic: test timed out after 10m0s
running tests:
    TestEventuallyTimeout (9m59s)

goroutine 1107 [running]:
testing.(*M).startAlarm.func1()
    /opt/hostedtoolcache/go/1.21.13/x64/src/testing/testing.go:2259 +0x259
created by time.goFunc
    /opt/hostedtoolcache/go/1.21.13/x64/src/time/sleep.go:176 +0x45

goroutine 1 [chan receive, 9 minutes]:
testing.(*T).Run(0xc00015c340, {0x7d9bfc, 0x15}, 0x8056c0)
    /opt/hostedtoolcache/go/1.21.13/x64/src/testing/testing.go:1649 +0x871
testing.runTests.func1(0x0?)
    /opt/hostedtoolcache/go/1.21.13/x64/src/testing/testing.go:2054 +0x85
testing.tRunner(0xc00015c340, 0xc00014daf8)
    /opt/hostedtoolcache/go/1.21.13/x64/src/testing/testing.go:1595 +0x262
testing.runTests(0xc00007ebe0?, {0xa1c440, 0xb9, 0xb9}, {0x1c?, 0x1e?, 0xa1ea00?})
    /opt/hostedtoolcache/go/1.21.13/x64/src/testing/testing.go:2052 +0x8ae
testing.(*M).Run(0xc00007ebe0)
    /opt/hostedtoolcache/go/1.21.13/x64/src/testing/testing.go:1925 +0xcd8
main.main()
    _testmain.go:419 +0x2be

goroutine 320 [chan receive, 9 minutes]:
testing.(*T).Parallel(0xc000503ba0)
    /opt/hostedtoolcache/go/1.21.13/x64/src/testing/testing.go:1403 +0x590
github.com/stretchr/testify/assert.TestDiffRace(0x0?)
    /home/runner/work/testify/testify/assert/assertions_test.go:2538 +0x3c
testing.tRunner(0xc000503ba0, 0x805620)
    /opt/hostedtoolcache/go/1.21.13/x64/src/testing/testing.go:1595 +0x262
created by testing.(*T).Run in goroutine 1
    /opt/hostedtoolcache/go/1.21.13/x64/src/testing/testing.go:1648 +0x846

goroutine 1106 [chan receive, 9 minutes]:
github.com/stretchr/testify/assert.TestEventuallyTimeout.func1()
    /home/runner/work/testify/testify/assert/assertions_test.go:3029 +0x155
github.com/stretchr/testify/assert.didPanic(0xc0002ae798)
    /home/runner/work/testify/testify/assert/assertions.go:1240 +0xdd
github.com/stretchr/testify/assert.NotPanics({0x85db20, 0xc00027e820}, 0xc0002ae798, {0x0, 0x0, 0x0})
    /home/runner/work/testify/testify/assert/assertions.go:1311 +0x8f
github.com/stretchr/testify/assert.TestEventuallyTimeout(0xc00027e820)
    /home/runner/work/testify/testify/assert/assertions_test.go:3015 +0xf6
testing.tRunner(0xc00027e820, 0x8056c0)
    /opt/hostedtoolcache/go/1.21.13/x64/src/testing/testing.go:1595 +0x262
created by testing.(*T).Run in goroutine 1
    /opt/hostedtoolcache/go/1.21.13/x64/src/testing/testing.go:1648 +0x846
FAIL    github.com/stretchr/testify/assert  600.024s
brackendawson commented 3 weeks ago

Related to #805 but not a regression or duplicate.

I think this is a classic race caused by allocations slowing down the sequential starting of timers.

Eventually creates a timer which starts immediately for the waitFor arg. It then creates a ticker which starts immediately for the tick arg. Then there is a select statement including both the timer and ticker. If the ticker fires we run the condition function, and if the timer fires we fail the test.

The unit test TestEventuallyTimeout test uses waitFor=time.Millisecond and tick=time.Microsecond. If more than a millisecond elapsed between the timer starting and the ticker starting (allocations are required in this gap), then it is possible for Eventually to fail the test having never run the condition function.

I propose reversing the order the timer and ticker are created, though this does not resolve the issue. And running condition once before the select statement.