There are several data race conditions in computation.go. These are detectable as of v1.7.6, with the existing tests using go test -race. Here is one test:
Click to expand test output
```
~/signalfx-go/signalflow$ go test -race -run TestBuffersDataMessages
==================
WARNING: DATA RACE
Write at 0x00c0000a9230 by goroutine 9:
runtime.mapassign_fast64()
/usr/local/Cellar/go@1.12/1.12.17/libexec/src/runtime/map_fast64.go:92 +0x0
github.com/signalfx/signalfx-go/signalflow.(*Computation).processMessage()
/Users/kerby.hughes/signalfx-go/signalflow/computation.go:271 +0x12e7
github.com/signalfx/signalfx-go/signalflow.(*Computation).watchMessages()
/Users/kerby.hughes/signalfx-go/signalflow/computation.go:219 +0x1a1
Previous read at 0x00c0000a9230 by goroutine 6:
runtime.mapaccess1_fast64()
/usr/local/Cellar/go@1.12/1.12.17/libexec/src/runtime/map_fast64.go:12 +0x0
github.com/signalfx/signalfx-go/signalflow.(*Computation).TSIDMetadata.func1()
/Users/kerby.hughes/signalfx-go/signalflow/computation.go:190 +0x87
github.com/signalfx/signalfx-go/signalflow.(*Computation).waitForMetadata()
/Users/kerby.hughes/signalfx-go/signalflow/computation.go:106 +0xa9
github.com/signalfx/signalfx-go/signalflow.(*Computation).TSIDMetadata()
/Users/kerby.hughes/signalfx-go/signalflow/computation.go:190 +0x72
github.com/signalfx/signalfx-go/signalflow.TestBuffersDataMessages()
/Users/kerby.hughes/signalfx-go/signalflow/computation_test.go:52 +0x44a
testing.tRunner()
/usr/local/Cellar/go@1.12/1.12.17/libexec/src/testing/testing.go:865 +0x163
Goroutine 9 (running) created at:
github.com/signalfx/signalfx-go/signalflow.newComputation()
/Users/kerby.hughes/signalfx-go/signalflow/computation.go:83 +0x3e5
github.com/signalfx/signalfx-go/signalflow.TestBuffersDataMessages()
/Users/kerby.hughes/signalfx-go/signalflow/computation_test.go:37 +0x248
testing.tRunner()
/usr/local/Cellar/go@1.12/1.12.17/libexec/src/testing/testing.go:865 +0x163
Goroutine 6 (running) created at:
testing.(*T).Run()
/usr/local/Cellar/go@1.12/1.12.17/libexec/src/testing/testing.go:916 +0x65a
testing.runTests.func1()
/usr/local/Cellar/go@1.12/1.12.17/libexec/src/testing/testing.go:1157 +0xa8
testing.tRunner()
/usr/local/Cellar/go@1.12/1.12.17/libexec/src/testing/testing.go:865 +0x163
testing.runTests()
/usr/local/Cellar/go@1.12/1.12.17/libexec/src/testing/testing.go:1155 +0x523
testing.(*M).Run()
/usr/local/Cellar/go@1.12/1.12.17/libexec/src/testing/testing.go:1072 +0x2eb
main.main()
_testmain.go:76 +0x222
==================
--- FAIL: TestBuffersDataMessages (0.00s)
testing.go:809: race detected during execution of test
FAIL
exit status 1
FAIL github.com/signalfx/signalfx-go/signalflow 0.195s
~/signalfx-go/signalflow$
```
When using this library as in the example, the computation data channel returns messages whose payloads include a value and a tsid, which is an internal identifier. In order to get the metadata for a value, the user of this library calls the TSIDMetadata function which looks up the tsid in an internal map. However, the computation may still be writing data to this internal map, and the map is not thread-safe, causing the runtime to panic. The calls to waitForMetadata which are used only use a lock on the first read, not the read in the return statement, and there are no write locks to these shared values in processMessage
The example code seems to avoid thse conditions because of the blocking print calls to comp.Resolution(), comp.MaxDelay(), and comp.Lag(). This ensures that by the time the example code reads from the internal metadata map, the computation is done writing. All of the calls to shared data on the computation are data races, however since they only read struct values and not map lookups, they do not cause the runtime to panic.
The panic can be seen with the example code by commenting out the informational metadata calls, and providing a signalflow program that immediately starts returning data:
One possible solution is to use the existing locking struct updateSignal to wrap all reads and writes to these shared values on the Computation struct. I will submit a PR with these changes.
Alternatives:
Return metadata over blocking channels, so that the caller is safe to wait for these in any order.
There are several data race conditions in
computation.go
. These are detectable as ofv1.7.6
, with the existing tests usinggo test -race
. Here is one test:Click to expand test output
``` ~/signalfx-go/signalflow$ go test -race -run TestBuffersDataMessages ================== WARNING: DATA RACE Write at 0x00c0000a9230 by goroutine 9: runtime.mapassign_fast64() /usr/local/Cellar/go@1.12/1.12.17/libexec/src/runtime/map_fast64.go:92 +0x0 github.com/signalfx/signalfx-go/signalflow.(*Computation).processMessage() /Users/kerby.hughes/signalfx-go/signalflow/computation.go:271 +0x12e7 github.com/signalfx/signalfx-go/signalflow.(*Computation).watchMessages() /Users/kerby.hughes/signalfx-go/signalflow/computation.go:219 +0x1a1 Previous read at 0x00c0000a9230 by goroutine 6: runtime.mapaccess1_fast64() /usr/local/Cellar/go@1.12/1.12.17/libexec/src/runtime/map_fast64.go:12 +0x0 github.com/signalfx/signalfx-go/signalflow.(*Computation).TSIDMetadata.func1() /Users/kerby.hughes/signalfx-go/signalflow/computation.go:190 +0x87 github.com/signalfx/signalfx-go/signalflow.(*Computation).waitForMetadata() /Users/kerby.hughes/signalfx-go/signalflow/computation.go:106 +0xa9 github.com/signalfx/signalfx-go/signalflow.(*Computation).TSIDMetadata() /Users/kerby.hughes/signalfx-go/signalflow/computation.go:190 +0x72 github.com/signalfx/signalfx-go/signalflow.TestBuffersDataMessages() /Users/kerby.hughes/signalfx-go/signalflow/computation_test.go:52 +0x44a testing.tRunner() /usr/local/Cellar/go@1.12/1.12.17/libexec/src/testing/testing.go:865 +0x163 Goroutine 9 (running) created at: github.com/signalfx/signalfx-go/signalflow.newComputation() /Users/kerby.hughes/signalfx-go/signalflow/computation.go:83 +0x3e5 github.com/signalfx/signalfx-go/signalflow.TestBuffersDataMessages() /Users/kerby.hughes/signalfx-go/signalflow/computation_test.go:37 +0x248 testing.tRunner() /usr/local/Cellar/go@1.12/1.12.17/libexec/src/testing/testing.go:865 +0x163 Goroutine 6 (running) created at: testing.(*T).Run() /usr/local/Cellar/go@1.12/1.12.17/libexec/src/testing/testing.go:916 +0x65a testing.runTests.func1() /usr/local/Cellar/go@1.12/1.12.17/libexec/src/testing/testing.go:1157 +0xa8 testing.tRunner() /usr/local/Cellar/go@1.12/1.12.17/libexec/src/testing/testing.go:865 +0x163 testing.runTests() /usr/local/Cellar/go@1.12/1.12.17/libexec/src/testing/testing.go:1155 +0x523 testing.(*M).Run() /usr/local/Cellar/go@1.12/1.12.17/libexec/src/testing/testing.go:1072 +0x2eb main.main() _testmain.go:76 +0x222 ================== --- FAIL: TestBuffersDataMessages (0.00s) testing.go:809: race detected during execution of test FAIL exit status 1 FAIL github.com/signalfx/signalfx-go/signalflow 0.195s ~/signalfx-go/signalflow$ ```When using this library as in the example, the computation data channel returns messages whose payloads include a value and a tsid, which is an internal identifier. In order to get the metadata for a value, the user of this library calls the TSIDMetadata function which looks up the tsid in an internal map. However, the computation may still be writing data to this internal map, and the map is not thread-safe, causing the runtime to panic. The calls to waitForMetadata which are used only use a lock on the first read, not the read in the return statement, and there are no write locks to these shared values in
processMessage
The example code seems to avoid thse conditions because of the blocking print calls to
comp.Resolution()
,comp.MaxDelay()
, andcomp.Lag()
. This ensures that by the time the example code reads from the internal metadata map, the computation is done writing. All of the calls to shared data on the computation are data races, however since they only read struct values and not map lookups, they do not cause the runtime to panic.The panic can be seen with the example code by commenting out the informational metadata calls, and providing a signalflow program that immediately starts returning data:
Fixes:
One possible solution is to use the existing locking struct
updateSignal
to wrap all reads and writes to these shared values on theComputation
struct. I will submit a PR with these changes.Alternatives:
Return metadata over blocking channels, so that the caller is safe to wait for these in any order.
Thanks