Closed torbenschinke closed 2 months ago
How can I reproduce this?
I was able to reproduce it, using a minimal example and it seems to happen when closing the connection.
func main() {
client, err := knx.NewGroupTunnel("10.6.6.140:3671", knx.DefaultTunnelConfig)
if err != nil {
panic(err)
}
go func() {
for msg := range client.Inbound() {
fmt.Println(msg.Command.String(), msg.Destination.String())
}
}()
time.Sleep(20 * time.Second)
client.Close()
}
Run it with enabled race detector:
go run -race testmod/knx/cmd
I can't reproduce this. I also struggle to understand what exactly the data race is in this case. It seems like it's a write paired with a read, that seems perfectly fine - especially since the read is allowed to fail.
I don't think I can help you without a lot more information and/or a MCVE.
Did you really tried to reproduce it with the race detector enabled (the -race
flag)? A race condition requires at least one groroutine who writes and another who reads. Without any kind of synchronization this will always result in a data race, so that is by definition never fine.
As far as I can see now is that this always happens if one go-routine writes to a channel and another go-routine closes it. This is exactly what the race detector traces show: in handleConnStateRes
a goroutine is spawned to write into the heartbeat channel, however that channel is closed from a different goroutine by process
.
See also this SO discussion.
What you're describing is not a race condition to me unless closing a channel is not atomic.
Also see this comment, we don't care about writing to closed channels: https://github.com/vapourismo/knx-go/blob/3b325e3f5dcf040dbe30c3ae657e004267b96ce6/knx/tunnel.go#L448
Sorry, I don't want to be rude, but it is obviously a race condition. Its existence has been proofed alone by the fact, that the Go race detector has found it. There are no known false-positive cases. We now have the task to understand and explain the situation. Your recover logic and comment is just a sympton of a wrong channel usage. The issue is not about sending to a closed channel. The issue is, that you are using the channels in way, that they can race internally. What you are essentially doing is the following:
func main() {
heartbeat := make(chan bool)
defer close(heartbeat)
defer func() {
if r := recover(); r != nil {
fmt.Println("Recovered in f", r)
go func() {
heartbeat <- true
}()
}
}()
for {
select {
case <-heartbeat:
default:
panic("panic")
}
}
}
When you run with the race detector enabled, you see the equal stack trace of the racing read- and writes in the channels code in the runtime itself:
Recovered in f panic
==================
WARNING: DATA RACE
Read at 0x00c00007e0d0 by goroutine 7:
runtime.chansend1()
/opt/homebrew/opt/go/libexec/src/runtime/chan.go:146 +0x2c
....
Previous write at 0x00c00007e0d0 by main goroutine:
runtime.recvDirect()
/opt/homebrew/opt/go/libexec/src/runtime/chan.go:348 +0x7c
....
I can just recommend that you take the time, to refresh your knowledge about race conditions and how the Go race detector works (see https://go.dev/blog/race-detector). It has no false-positive findings. Every data race it finds, is a real race. However, it also tells us that it cannot not find each race, but that is not the point here. I cannot tell, if this channel behavior is intentional by the Go language designers or not. But we can be sure, that this behavior is technically wrong, proofed by the observations of the race detector.
Very good readings specific about the Go Memory model has been created by one of the core developers: https://research.swtch.com/mm.
Regarding
unless closing a channel is not atomic.
It is documented at the end of the article https://go.dev/doc/articles/race_detector that this is indeed not "atomic" in your understanding:
Unsynchronized send and close operations ... As this example demonstrates, unsynchronized send and close operations on the same channel can also be a race condition ...
Note also similar discussions in https://github.com/golang/go/issues/27769 or https://github.com/golang/go/issues/30372
During development while having the go race detector turned on, I noticed the following data race warning:
I'm not sure, if I'm doing something unexpected, but I don't believe so.