nats-io / nats.go

Golang client for NATS, the cloud native messaging system.
https://nats.io
Apache License 2.0

Data Race in Jetstream Consume #1470

Closed. princjef closed this issue 10 months ago.

princjef commented 12 months ago

Observed behavior

After upgrading from nats.go 1.30.2 to 1.31.0, we started seeing consistent data races detected in tests when calling the Consume() API on a jetstream.Consumer. The race detector output is included below:

Write at 0x00c000a81110 by goroutine 102:
  runtime.mapdelete_faststr()
      /opt/hostedtoolcache/go/1.20.3/x64/src/runtime/map_faststr.go:301 +0x0
  github.com/nats-io/nats.go/jetstream.(*pullSubscription).cleanup()
      /home/cloudtest/go/pkg/mod/github.com/nats-io/nats.go@v1.31.0/jetstream/pull.go:846 +0x228
  github.com/nats-io/nats.go/jetstream.(*pullSubscription).pullMessages()
      /home/cloudtest/go/pkg/mod/github.com/nats-io/nats.go@v1.31.0/jetstream/pull.go:810 +0x224
  github.com/nats-io/nats.go/jetstream.(*pullConsumer).Consume.func3()
      /home/cloudtest/go/pkg/mod/github.com/nats-io/nats.go@v1.31.0/jetstream/pull.go:359 +0x58

Previous read at 0x00c000a81110 by goroutine 103:
  runtime.mapaccess1_faststr()
      /opt/hostedtoolcache/go/1.20.3/x64/src/runtime/map_faststr.go:13 +0x0
  github.com/nats-io/nats.go/jetstream.(*orderedConsumer).Consume.func2()
      /home/cloudtest/go/pkg/mod/github.com/nats-io/nats.go@v1.31.0/jetstream/ordered.go:144 +0x304

Goroutine 102 (running) created at:
  github.com/nats-io/nats.go/jetstream.(*pullConsumer).Consume()
      /home/cloudtest/go/pkg/mod/github.com/nats-io/nats.go@v1.31.0/jetstream/pull.go:359 +0x10ea
  github.com/nats-io/nats.go/jetstream.(*orderedConsumer).Consume()
      /home/cloudtest/go/pkg/mod/github.com/nats-io/nats.go@v1.31.0/jetstream/ordered.go:124 +0xa70
  <product code>

Goroutine 103 (running) created at:
  github.com/nats-io/nats.go/jetstream.(*orderedConsumer).Consume()
      /home/cloudtest/go/pkg/mod/github.com/nats-io/nats.go@v1.31.0/jetstream/ordered.go:129 +0xbe5
  <product code>

The race appears to be on the consumer's subscriptions map. The ordered consumer's Consume() makes a side call into the underlying pull consumer's Consume() (ordered.go:124) and then starts a goroutine that reads the subscriptions map on that consumer (ordered.go:144). Meanwhile, the underlying pull consumer spins off a background goroutine (pull.go:359) whose pullMessages() eventually runs cleanup(), which deletes an entry from the same subscriptions map (pull.go:846). The map should probably be protected by a mutex if it is going to be accessed from multiple goroutines like this.
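
As an illustration of the kind of guard this suggests, here is a minimal sketch. The type, field, and method names are hypothetical stand-ins, not the actual nats.go internals or the change made in the eventual fix:

    package sketch

    import "sync"

    // pullSubscription stands in for the real type; only the locking pattern matters.
    type pullSubscription struct{}

    // pullConsumer guards its subscriptions map with a mutex so the cleanup
    // goroutine's delete and the ordered consumer's read cannot race.
    type pullConsumer struct {
        mu            sync.Mutex
        subscriptions map[string]*pullSubscription
    }

    // removeSubscription mirrors the racing write in cleanup() (pull.go:846).
    func (c *pullConsumer) removeSubscription(id string) {
        c.mu.Lock()
        defer c.mu.Unlock()
        delete(c.subscriptions, id)
    }

    // subscription mirrors the racing read in the ordered consumer's goroutine (ordered.go:144).
    func (c *pullConsumer) subscription(id string) (*pullSubscription, bool) {
        c.mu.Lock()
        defer c.mu.Unlock()
        s, ok := c.subscriptions[id]
        return s, ok
    }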

Expected behavior

No data races are observed

Server and client version

nats.go 1.31.0

Host environment

Steps to reproduce

  1. Enable the Go race detector (e.g. go test -race)
  2. Create a new JetStream instance with jetstream.New()
  3. Call Stream() on that JetStream
  4. Call OrderedConsumer() on that Stream
  5. Call Consume() on that ordered consumer (see the sketch below)
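
A minimal, self-contained sketch of these steps, assuming a local NATS server with JetStream enabled and an existing stream named EVENTS (the connection URL and stream name are assumptions, not part of the report):

    package main

    import (
        "context"
        "log"
        "time"

        "github.com/nats-io/nats.go"
        "github.com/nats-io/nats.go/jetstream"
    )

    func main() {
        // Connect to a local NATS server with JetStream enabled.
        nc, err := nats.Connect(nats.DefaultURL)
        if err != nil {
            log.Fatal(err)
        }
        defer nc.Close()

        // Step 2: create a new JetStream instance.
        js, err := jetstream.New(nc)
        if err != nil {
            log.Fatal(err)
        }
        ctx := context.Background()

        // Step 3: look up an existing stream (the name "EVENTS" is assumed).
        stream, err := js.Stream(ctx, "EVENTS")
        if err != nil {
            log.Fatal(err)
        }

        // Step 4: create an ordered consumer on that stream.
        cons, err := stream.OrderedConsumer(ctx, jetstream.OrderedConsumerConfig{})
        if err != nil {
            log.Fatal(err)
        }

        // Step 5: start consuming. Run the program with -race; this is where the
        // race between the ordered consumer's goroutine and the pull consumer's
        // cleanup goroutine was reported.
        cc, err := cons.Consume(func(msg jetstream.Msg) {
            _ = msg.Ack()
        })
        if err != nil {
            log.Fatal(err)
        }
        defer cc.Stop()

        // Keep the process running long enough for the race to surface.
        time.Sleep(10 * time.Second)
    }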
Jarema commented 12 months ago

Hey, I'm taking a look at this. Thanks for the report.

piotrpio commented 10 months ago

Fixed in https://github.com/nats-io/nats.go/pull/1497 and available in the latest release (nats.go@v1.32.0).
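
To pick up the fix, the dependency can be bumped to that release:

    go get github.com/nats-io/nats.go@v1.32.0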