philippseith / signalr

SignalR server and client in go
MIT License
131 stars 39 forks source link

Increase of goroutines after reconnect #151

Closed x0y0z0tn closed 1 year ago

x0y0z0tn commented 1 year ago

Hi, I am using WithConnector to reconnect, after reconnection, I have seen how the goroutines increase.

I used with dlv to identify which goroutines increased and I have seen how the goroutine websocket.(*Conn).timeoutLoop increased in every reconnection.

...
  Goroutine 49 - User: /Users/user/.asdf/installs/golang/
1.19.3/packages/pkg/mod/nhooyr.io/websocket@v1.8.7/conn_notjs.go:153 nhooyr.io/websocket.(*Conn).timeoutLoop (0x100694f54) [select]
  Goroutine 50 - User: /Users/user/.asdf/installs/golang/
1.19.3/go/src/runtime/sigqueue.go:149 os/signal.signal_recv (0x10037896c) (thread 23879806)
  Goroutine 51 - User: /Users/user/.asdf/installs/golang/
1.19.3/packages/pkg/mod/nhooyr.io/websocket@v1.8.7/conn_notjs.go:153 nhooyr.io/websocket.(*Conn).timeoutLoop (0x100694f54) [select]
  Goroutine 138 - User: /Users/user/.asdf/installs/golang/
1.19.3/packages/pkg/mod/github.com/philippseith/signalr@v0.6.0/client.go:168 github.com/philippseith/signalr.(*client).Start.func1.1 (0x1006c6648) [chan receive]
  Goroutine 146 - User: /Users/user/.asdf/installs/golang/
1.19.3/packages/pkg/mod/nhooyr.io/websocket@v1.8.7/conn_notjs.go:153 nhooyr.io/websocket.(*Conn).timeoutLoop (0x100694f54) [select]
  Goroutine 158 - User: /Users/user/.asdf/installs/golang/
1.19.3/packages/pkg/mod/nhooyr.io/websocket@v1.8.7/conn_notjs.go:153 nhooyr.io/websocket.(*Conn).timeoutLoop (0x100694f54) [select]
...

I am using this code:

    clntCtx, _ := context.WithCancel(context.Background())

    client, err := signalr.NewClient(
        clntCtx, nil,
        signalr.WithReceiver(rcv),
        signalr.WithConnector(func() (signalr.Connection, error) {
            var err error
            var conn signalr.Connection

            for i := 0; i < 10; i++ {
                log.Println("connecting...")

                //nolint
                creationCtx, _ := context.WithTimeout(context.Background(), 10*time.Second)
                conn, err = signalr.NewHTTPConnection(creationCtx, address)
                if err != nil {
                    log.Println(err.Error())

                    time.Sleep(30 * time.Second)

                    continue
                }

                break
            }

            log.Println("connected")

            fmt.Println("current gourtines", runtime.NumGoroutine())

            reInvoke <- 1

            return conn, err
        }),
        signalr.MaximumReceiveMessageSize(maxMessageSize),
        signalr.Logger(kitlog.NewLogfmtLogger(os.Stderr), false))
    if err != nil {
        log.Println(err.Error())
    }

Am I reconnecting in the wrong way?

Thanks in advance for your help.

x0y0z0tn commented 1 year ago

The WebSocket may not be closed after the reconnection.

The only way I found to close the Websocket was to cast the Connection to WebsocketConnection

                ws, ok := currConn.(*signalr.WebSocketConnection)
                if ok {
                    ws.Close()
                }

and to do that I added a Close method to the webSocketConnection and I changed to uppercase the first letter of the struct name: fromwebsocketConnection to WebsocketConnection.

diff --git a/client.go b/client.go
index 0ddb3e4..8b04ac0 100644
--- a/client.go
+++ b/client.go
@@ -270 +270 @@ func (c *client) setupConnectionAndProtocol() (hubProtocol, error) {
-               if wsConn, ok := c.conn.(*webSocketConnection); ok {
+               if wsConn, ok := c.conn.(*WebSocketConnection); ok {
diff --git a/websocketconnection.go b/websocketconnection.go
index 4d079e9..3d141d1 100644
--- a/websocketconnection.go
+++ b/websocketconnection.go
@@ -11 +11 @@ import (
-type webSocketConnection struct {
+type WebSocketConnection struct {
@@ -17,2 +17,2 @@ type webSocketConnection struct {
-func newWebSocketConnection(ctx context.Context, connectionID string, conn *websocket.Conn) *webSocketConnection {
-       w := &webSocketConnection{
+func newWebSocketConnection(ctx context.Context, connectionID string, conn *websocket.Conn) *WebSocketConnection {
+       w := &WebSocketConnection{
@@ -25 +25,6 @@ func newWebSocketConnection(ctx context.Context, connectionID string, conn *webs
-func (w *webSocketConnection) Write(p []byte) (n int, err error) {
+// Close closes the connection.
+func (w *WebSocketConnection) Close() error {
+       return w.conn.Close(websocket.StatusNormalClosure, "")
+}
+
+func (w *WebSocketConnection) Write(p []byte) (n int, err error) {
@@ -45 +50 @@ func (w *webSocketConnection) Write(p []byte) (n int, err error) {
-func (w *webSocketConnection) Read(p []byte) (n int, err error) {
+func (w *WebSocketConnection) Read(p []byte) (n int, err error) {
@@ -61 +66 @@ func (w *webSocketConnection) Read(p []byte) (n int, err error) {
-func (w *webSocketConnection) TransferMode() TransferMode {
+func (w *WebSocketConnection) TransferMode() TransferMode {
@@ -65 +70 @@ func (w *webSocketConnection) TransferMode() TransferMode {
-func (w *webSocketConnection) SetTransferMode(transferMode TransferMode) {
+func (w *WebSocketConnection) SetTransferMode(transferMode TransferMode) {

I don't know if I am in the correct way.

philippseith commented 1 year ago

Your are provoking the reconnect deliberately with creationCtx, _ := context.WithTimeout(context.Background(), 10*time.Second), right? In the end, this context goes into https://github.com/philippseith/signalr/blob/7237db33c21b454ced6c212f290cde3f5aaf87d7/websocketconnection.go#L48 (or into the corresponding Write) and cancels it. The websocket docs at https://pkg.go.dev/nhooyr.io/websocket@v1.8.7#Conn state that any error on any method will close the websocket connection. But this seems not to be the case when the context is canceled. For such cases, the ReadWriteWithContext wrapper has an unblock method parameter, which in case of the webSocketConnection is empty. https://github.com/philippseith/signalr/blob/7237db33c21b454ced6c212f290cde3f5aaf87d7/websocketconnection.go#L54 Can you try what happens if you pass func() { _ = w.conn.Close(1000, "") } instead?

x0y0z0tn commented 1 year ago

thanks for your answer.

Your are provoking the reconnect deliberately with creationCtx, _ := context.WithTimeout(context.Background(), 10*time.Second), right?

No, I am not provoking the reconnect deliberately, sometimes the server disconnects, and the reconnection starts (and don't have control over the server, I'm subscribed to the server with the client).

In the end, this context goes into

https://github.com/philippseith/signalr/blob/7237db33c21b454ced6c212f290cde3f5aaf87d7/websocketconnection.go#L48

I am not sure about that because, when you create a NewHTTPConnection you use one context (the creation of the connection context)

...
creationCtx, _ := context.WithTimeout(context.Background(), 10*time.Second)
conn, err = signalr.NewHTTPConnection(creationCtx, address)
...

and the WebSocket is created with another context

https://github.com/philippseith/signalr/blob/7237db33c21b454ced6c212f290cde3f5aaf87d7/httpconnection.go#L133

if I understood the code correctly the connection creation context is not related to this line

https://github.com/philippseith/signalr/blob/7237db33c21b454ced6c212f290cde3f5aaf87d7/websocketconnection.go#L48

philippseith commented 1 year ago

Upps, https://github.com/philippseith/signalr/blob/7237db33c21b454ced6c212f290cde3f5aaf87d7/httpconnection.go#L133 might be a bug, I think ctx should be passed. But that does not solve the problem. Because the reconnect is caused by a serverside problem, the websocket documentations statement "any error on any method closes the connection" seems not to be fully true and adding _ = w.conn.Close(1000, err.Error()) after https://github.com/philippseith/signalr/blob/7237db33c21b454ced6c212f290cde3f5aaf87d7/websocketconnection.go#L40 and https://github.com/philippseith/signalr/blob/7237db33c21b454ced6c212f290cde3f5aaf87d7/websocketconnection.go#L56 should be reasonable.

x0y0z0tn commented 1 year ago

thanks for your answer.

Yes, with your suggestion _ = w.conn.Close(1000, err.Error()) the goroutines don't increase after the reconnection.

Please tell me when you upload the change to apply it and monitor it during the next days.

Thanks again.

x0y0z0tn commented 1 year ago

I'm going to close the issue. With the change, locally everything looks fine, if during the next days I find anything weird related to it, I'm going to reopen the issue.

Thanks a lot!!