sruehl closed this issue 7 months ago.
> Effect: At the moment, when you redirect the logger to `testing.T.Log`, you get data races when run with `-race`.

IMHO this could be a problem of `testing.T.Log`, not of the client.
> Rationale: When calling Stop you really want the client to stop. Stopped means that no more goroutine is running.

There is no state like "Stopped", but there is `ClientClosed`, which is defined as:

```go
// ClientClosed
//
// The Client is not able to send and receive messages anymore and has to be started again to be able to.
```

Did you observe that this was not the case after calling `Stop`?
> IMHO this could be a problem of `testing.T.Log`, not of the client.

Yes, it is a problem of `T`, but it is also a symptom of goroutines still running.

> Did you observe that the client didn't stop?

Yes, by having goroutines still logging.

> Did you observe that this was not the case after calling Stop?

Yes, because the client just sets the state to `ClientClosed` when calling `Stop`.

Therefore I created #188 to fix the problem. In the original version the test works, but with the fix attempt the test fails there (so the test itself is good to be used upstream).
`Stop()` can't wait for all goroutines to end, because the goroutines involved in the client are not built that way. They all run on a `select` with a `case <-ctx.Done():` if they are idle (which they inevitably will become). So they will end after the runtime has checked all the `select` statements. But this does not happen in zero time, especially when the goroutine where `Stop()` was called is still running.
It looks like #188 just delays the return of the function. It ends when `WaitForState` returns. But the `WaitForState` mechanism involves goroutines and channels, so the necessary context switches can take place and the goroutines end.
I don't see any benefit in making `Stop()` synchronous. If it is called, the client refrains from executing any user requests, and all resources used by it are freed after a time. This is all you can expect in a garbage-collected runtime like Go.

It would be possible to add a special `sync.WaitGroup` to all started goroutines to make it safely synchronous, but as I said, I don't see the benefit over a simple `<-time.After(50*time.Microsecond)` after calling `Stop()`.
> `Stop()` can't wait for all goroutines to end, because the goroutines involved in the client are not built that way. They all run on a `select` with a `case <-ctx.Done():` if they are idle (which they inevitably will become). So they will end after the runtime has checked all the `select` statements. But this does not happen in zero time, especially when the goroutine where `Stop()` was called is still running.
Yes, and it would be good if we had a `sync.WaitGroup` like you describe below.
> It looks like #188 just delays the return of the function. It ends when `WaitForState` returns. But the `WaitForState` mechanism involves goroutines and channels, so the necessary context switches can take place and the goroutines end.
I thought so, hence the PR is still a WIP, as we indeed need a `sync.WaitGroup` to check that all spawned goroutines are gone.
> I don't see any benefit in making `Stop()` synchronous. If it is called, the client refrains from executing any user requests, and all resources used by it are freed after a time. This is all you can expect in a garbage-collected runtime like Go.
One benefit would be that there are no goroutines running anymore after you call `Stop()`. In certain use cases this can leak goroutines, and the simpler case is that `*testing.T` dies when something writes to `t.Log` after the test is done. This makes sense and is the case when you redirect logs like I wrote above.
> It would be possible to add a special `sync.WaitGroup` to all started goroutines to make it safely synchronous, but as I said, I don't see the benefit over a simple `<-time.After(50*time.Microsecond)` after calling `Stop()`.
I listed some benefits above (no leaking goroutines, `*testing.T` log redirect works). I consider it bad to work with sleeps of any kind, as especially across different machines and load scenarios it is not deterministic whether shutdown takes more or less than 50ms, for example. While it is doable to just sleep 5 seconds after calling `Stop()` or cancelling the context, I don't consider that a good way to check whether something has properly stopped.
> One benefit would be that there are no goroutines running anymore after you call `Stop()`. In certain use cases this can leak goroutines
There are no leaking goroutines. They all come to an end when the Go runtime scheduler checks them the next time, as they are all blocked on a `select` statement when no SignalR protocol work has to be done. But the order in which the scheduler does this, and therefore the time when all of them have ended, cannot be determined exactly.

What remains is the panic or race in testing. If you look at the code in `testing.go`, you will see that both outcomes are in there to help detect goroutines that are not appropriately synchronized. But for signalr this is a false positive.
If you like, you can try to implement the `sync.WaitGroup` idea. But keep in mind that client and server share some code.
I will add the `sync.WaitGroup` in the upcoming days.
I updated #188 and added `WaitGroup` registrations sparingly, only where it made direct sense. For other goroutines that are spawned as a direct result of a client action (the client should not really trigger those after `Stop`), I left them out. There were some places where I was unsure. I added a test to check that at least we don't get some log messages (it might be incomplete, as it doesn't test the full lifecycle).
Looking at the `Stop` method in `client.go:220`, I was wondering if the logic is wrong. At the moment the state is set to `ClientClosed` immediately when `Stop` is called, before the goroutines have actually ended; it should rather be set only once the client has really stopped.

Rationale: When calling Stop you really want the client to stop. Stopped means that no more goroutine is running.

Effect: At the moment, when you redirect the logger to `testing.T.Log`, you get data races when run with `-race`.

I created a PR for that to show the issue with a proper test case: https://github.com/philippseith/signalr/pull/188. (Even with the updated code I run into a timeout.)