nats-io / nats.go

Golang client for NATS, the cloud native messaging system.
https://nats.io
Apache License 2.0
5.55k stars 696 forks source link

Data race and negative `WaitGroup` counter in `OrderedConsumer` tests #1523

Closed mdawar closed 9 months ago

mdawar commented 10 months ago

Observed behavior

While checking a bug in my code and running the tests, I found that the ordered consumer tests are flaky.

The tests are passing with a simple test run, but when I run the tests repeatedly using the -count flag I get data race errors and negative WaitGroup counter errors.

Test output ```sh $ go test -modfile=go_test.mod -race -p=1 ./... --failfast -vet=off -run OrderedConsumer -v -count 10 testing: warning: no tests to run PASS ok github.com/nats-io/nats.go 1.007s [no tests to run] testing: warning: no tests to run PASS ok github.com/nats-io/nats.go/bench 1.007s [no tests to run] ? github.com/nats-io/nats.go/encoders/builtin [no test files] ? github.com/nats-io/nats.go/encoders/protobuf [no test files] ? github.com/nats-io/nats.go/examples/jetstream/js-consume [no test files] ? github.com/nats-io/nats.go/examples/jetstream/js-fetch [no test files] ? github.com/nats-io/nats.go/examples/jetstream/js-messages [no test files] ? github.com/nats-io/nats.go/examples/jetstream/js-next [no test files] ? github.com/nats-io/nats.go/examples/jetstream/js-ordered-consume [no test files] ? github.com/nats-io/nats.go/examples/jetstream/js-ordered-fetch [no test files] ? github.com/nats-io/nats.go/examples/jetstream/js-ordered-messages [no test files] ? github.com/nats-io/nats.go/examples/jetstream/js-parallel-consume [no test files] ? github.com/nats-io/nats.go/examples/nats-bench [no test files] ? github.com/nats-io/nats.go/examples/nats-echo [no test files] ? github.com/nats-io/nats.go/examples/nats-pub [no test files] ? github.com/nats-io/nats.go/examples/nats-qsub [no test files] ? github.com/nats-io/nats.go/examples/nats-req [no test files] ? github.com/nats-io/nats.go/examples/nats-rply [no test files] ? github.com/nats-io/nats.go/examples/nats-sub [no test files] testing: warning: no tests to run PASS ok github.com/nats-io/nats.go/internal/parser 1.006s [no tests to run] testing: warning: no tests to run PASS ok github.com/nats-io/nats.go/jetstream 1.007s [no tests to run] === RUN TestOrderedConsumerConsume === RUN TestOrderedConsumerConsume/base_usage,_delete_consumer ================== WARNING: DATA RACE Write at 0x00c0000d6568 by goroutine 8: github.com/nats-io/nats%2ego.(*Conn).close() /home/pierre/files/github/nats.go/worktree/ordered-consumer/nats.go:5057 +0xb5 github.com/nats-io/nats%2ego.(*Conn).Close() /home/pierre/files/github/nats.go/worktree/ordered-consumer/nats.go:5149 +0x64 github.com/nats-io/nats.go/jetstream/test.TestOrderedConsumerConsume.func2.3() /home/pierre/files/github/nats.go/worktree/ordered-consumer/jetstream/test/ordered_test.go:50 +0x33 runtime.gopanic() /usr/local/go/src/runtime/panic.go:920 +0x26f github.com/nats-io/nats.go/jetstream/test.TestOrderedConsumerConsume.func2() /home/pierre/files/github/nats.go/worktree/ordered-consumer/jetstream/test/ordered_test.go:75 +0x7fa testing.tRunner() /usr/local/go/src/testing/testing.go:1595 +0x261 testing.(*T).Run.func1() /usr/local/go/src/testing/testing.go:1648 +0x44 Previous read at 0x00c0000d6568 by goroutine 27: github.com/nats-io/nats%2ego.(*Conn).isConnected() /home/pierre/files/github/nats.go/worktree/ordered-consumer/nats.go:5367 +0x207 github.com/nats-io/nats%2ego.(*Conn).flusher() /home/pierre/files/github/nats.go/worktree/ordered-consumer/nats.go:3354 +0x286 github.com/nats-io/nats%2ego.(*Conn).processConnectInit.func3() /home/pierre/files/github/nats.go/worktree/ordered-consumer/nats.go:2298 +0x33 Goroutine 8 (running) created at: testing.(*T).Run() /usr/local/go/src/testing/testing.go:1648 +0x845 github.com/nats-io/nats.go/jetstream/test.TestOrderedConsumerConsume() /home/pierre/files/github/nats.go/worktree/ordered-consumer/jetstream/test/ordered_test.go:38 +0x2c4 testing.tRunner() /usr/local/go/src/testing/testing.go:1595 +0x261 testing.(*T).Run.func1() /usr/local/go/src/testing/testing.go:1648 +0x44 Goroutine 27 (running) created at: github.com/nats-io/nats%2ego.(*Conn).processConnectInit() /home/pierre/files/github/nats.go/worktree/ordered-consumer/nats.go:2298 +0x5a4 github.com/nats-io/nats%2ego.(*Conn).connect() /home/pierre/files/github/nats.go/worktree/ordered-consumer/nats.go:2327 +0x4b1 github.com/nats-io/nats%2ego.Options.Connect() /home/pierre/files/github/nats.go/worktree/ordered-consumer/nats.go:1524 +0x82d github.com/nats-io/nats%2ego.Connect() /home/pierre/files/github/nats.go/worktree/ordered-consumer/nats.go:828 +0x2d7 github.com/nats-io/nats.go/jetstream/test.TestOrderedConsumerConsume.func2() /home/pierre/files/github/nats.go/worktree/ordered-consumer/jetstream/test/ordered_test.go:41 +0x104 testing.tRunner() /usr/local/go/src/testing/testing.go:1595 +0x261 testing.(*T).Run.func1() /usr/local/go/src/testing/testing.go:1648 +0x44 ================== panic: sync: negative WaitGroup counter goroutine 66 [running]: sync.(*WaitGroup).Add(0xc0004a2920, 0xffffffffffffffff) /usr/local/go/src/sync/waitgroup.go:62 +0x1cd sync.(*WaitGroup).Done(0xc0000921b7?) /usr/local/go/src/sync/waitgroup.go:87 +0x2e github.com/nats-io/nats.go/jetstream/test.TestOrderedConsumerConsume.func2.1({0x1323918?, 0xc00035e0e0}) /home/pierre/files/github/nats.go/worktree/ordered-consumer/jetstream/test/ordered_test.go:68 +0x145 github.com/nats-io/nats.go/jetstream.(*orderedConsumer).Consume.(*orderedConsumer).Consume.func1.func6({0x1323918, 0xc00035e0e0}) /home/pierre/files/github/nats.go/worktree/ordered-consumer/jetstream/ordered.go:129 +0x25d github.com/nats-io/nats.go/jetstream.(*pullConsumer).Consume.func1(0xc0003682a0) /home/pierre/files/github/nats.go/worktree/ordered-consumer/jetstream/pull.go:247 +0x43a github.com/nats-io/nats%2ego.(*Conn).waitForMsgs(0xc0000d6000, 0xc0004a6410) /home/pierre/files/github/nats.go/worktree/ordered-consumer/nats.go:3045 +0x77f created by github.com/nats-io/nats%2ego.(*Conn).subscribeLocked in goroutine 20 /home/pierre/files/github/nats.go/worktree/ordered-consumer/nats.go:4256 +0x7d0 FAIL github.com/nats-io/nats.go/jetstream/test 0.063s testing: warning: no tests to run PASS ok github.com/nats-io/nats.go/micro 1.007s [no tests to run] testing: warning: no tests to run PASS ok github.com/nats-io/nats.go/micro/test 1.009s [no tests to run] === RUN TestJetStreamOrderedConsumerRecreateAfterReconnect js_test.go:9402: Invalid msg value; want: 'msg 1'; got: "msg" --- FAIL: TestJetStreamOrderedConsumerRecreateAfterReconnect (0.04s) FAIL FAIL github.com/nats-io/nats.go/test 0.048s ? github.com/nats-io/nats.go/util [no test files] FAIL ```
Another test output ```sh WARNING: DATA RACE Write at 0x00c0000b8868 by goroutine 25407: github.com/nats-io/nats.go/jetstream.(*orderedConsumer).getConsumerConfigForSeq() /home/pierre/files/github/nats.go/jetstream/ordered.go:491 +0x84 github.com/nats-io/nats.go/jetstream.(*orderedConsumer).reset() /home/pierre/files/github/nats.go/jetstream/ordered.go:461 +0x5c4 github.com/nats-io/nats.go/jetstream.(*orderedConsumer).Consume.func2() /home/pierre/files/github/nats.go/jetstream/ordered.go:152 +0x270 Previous read at 0x00c0000b8868 by goroutine 25404: github.com/nats-io/nats.go/jetstream.(*orderedConsumer).Consume.(*orderedConsumer).errHandler.func3() /home/pierre/files/github/nats.go/jetstream/ordered.go:201 +0x1e4 github.com/nats-io/nats.go/jetstream.(*pullSubscription).handleStatusMsg() /home/pierre/files/github/nats.go/jetstream/pull.go:661 +0x157 github.com/nats-io/nats.go/jetstream.(*pullConsumer).Consume.func1() /home/pierre/files/github/nats.go/jetstream/pull.go:233 +0x1c6 github.com/nats-io/nats%2ego.(*Conn).waitForMsgs() /home/pierre/files/github/nats.go/nats.go:3045 +0x77e github.com/nats-io/nats%2ego.(*Conn).subscribeLocked.func1() /home/pierre/files/github/nats.go/nats.go:4256 +0x44 Goroutine 25407 (running) created at: github.com/nats-io/nats.go/jetstream.(*orderedConsumer).Consume() /home/pierre/files/github/nats.go/jetstream/ordered.go:138 +0xd75 github.com/nats-io/nats.go/jetstream/test.TestOrderedConsumerConsume.func8() /home/pierre/files/github/nats.go/jetstream/test/ordered_test.go:375 +0x964 testing.tRunner() /usr/local/go/src/testing/testing.go:1595 +0x261 testing.(*T).Run.func1() /usr/local/go/src/testing/testing.go:1648 +0x44 Goroutine 25404 (finished) created at: github.com/nats-io/nats%2ego.(*Conn).subscribeLocked() /home/pierre/files/github/nats.go/nats.go:4256 +0x7cf github.com/nats-io/nats%2ego.(*Conn).subscribe() /home/pierre/files/github/nats.go/nats.go:4192 +0x152 github.com/nats-io/nats%2ego.(*Conn).Subscribe() /home/pierre/files/github/nats.go/nats.go:4108 +0xb48 github.com/nats-io/nats.go/jetstream.(*pullConsumer).Consume() /home/pierre/files/github/nats.go/jetstream/pull.go:258 +0xad2 github.com/nats-io/nats.go/jetstream.(*orderedConsumer).Consume() /home/pierre/files/github/nats.go/jetstream/ordered.go:133 +0xc10 github.com/nats-io/nats.go/jetstream/test.TestOrderedConsumerConsume.func8() /home/pierre/files/github/nats.go/jetstream/test/ordered_test.go:375 +0x964 testing.tRunner() /usr/local/go/src/testing/testing.go:1595 +0x261 testing.(*T).Run.func1() /usr/local/go/src/testing/testing.go:1648 +0x44 ================== panic: sync: negative WaitGroup counter goroutine 25462 [running]: sync.(*WaitGroup).Add(0xc002e0b4f0, 0xffffffffffffffff) /usr/local/go/src/sync/waitgroup.go:62 +0x1cd sync.(*WaitGroup).Done(0xc002722397?) /usr/local/go/src/sync/waitgroup.go:87 +0x2e github.com/nats-io/nats.go/jetstream/test.TestOrderedConsumerConsume.func8.1({0x1323918?, 0xc000090700}) /home/pierre/files/github/nats.go/jetstream/test/ordered_test.go:378 +0x154 github.com/nats-io/nats.go/jetstream.(*orderedConsumer).Consume.func1.1({0x1323918, 0xc000090700}) /home/pierre/files/github/nats.go/jetstream/ordered.go:129 +0x315 github.com/nats-io/nats.go/jetstream.(*pullConsumer).Consume.func1(0xc00151cee0) /home/pierre/files/github/nats.go/jetstream/pull.go:247 +0x43a github.com/nats-io/nats%2ego.(*Conn).waitForMsgs(0xc003147100, 0xc00388ad00) /home/pierre/files/github/nats.go/nats.go:3045 +0x77f created by github.com/nats-io/nats%2ego.(*Conn).subscribeLocked in goroutine 25491 /home/pierre/files/github/nats.go/nats.go:4256 +0x7d0 FAIL github.com/nats-io/nats.go/jetstream/test 94.295s testing: warning: no tests to run PASS ok github.com/nats-io/nats.go/micro 1.015s [no tests to run] testing: warning: no tests to run PASS ok github.com/nats-io/nats.go/micro/test 1.018s [no tests to run] === RUN TestJetStreamOrderedConsumerRecreateAfterReconnect js_test.go:9402: Invalid msg value; want: 'msg 1'; got: "msg" --- FAIL: TestJetStreamOrderedConsumerRecreateAfterReconnect (0.04s) FAIL FAIL github.com/nats-io/nats.go/test 0.069s ? github.com/nats-io/nats.go/util [no test files] FAIL ```

Expected behavior

The tests are expected to pass.

Server and client version

nats-server: v2.10.9 nats.go: v1.32.0

Host environment

OS: Linux (Debian 11.8 and Fedora 39) Go version: go1.21.6 linux/amd64

Steps to reproduce

go test -modfile=go_test.mod -race -p=1 ./... --failfast -vet=off -run OrderedConsumer -v -count 10
piotrpio commented 10 months ago

Thank you for creating the issue, we'll look at this and #1524 and probably do a patch release with fixes.