philippseith / signalr

SignalR server and client in go
MIT License
140 stars 40 forks source link

Client disconnects if different KeepAliveInterval is set #179

Closed renato-kristic closed 1 year ago

renato-kristic commented 1 year ago

I'm using the latest release (0.6.3) for my client and server. I defined the following values for both, server and client:

signalr.KeepAliveInterval(15*time.Second),
signalr.TimeoutInterval(30*time.Second),

With these parameters I can see that client disconnects/reconnects every 30 seconds.

This is client log:

level=debug ts=2023-09-28T07:15:31.230320993Z class=Client connection="qIYFdt3w6zclS30fjUC2Qw==" hub=test.Client ts=2023-09-28T07:15:31.230321394Z caller=loop.go:111 event="message received" message="\"Sent ping\""
level=debug ts=2023-09-28T07:15:31.230359326Z protocol=JSON ts=2023-09-28T07:15:31.230359676Z caller=jsonhubprotocol.go:211 event=write message="{\"type\":6}\u001e"
level=debug ts=2023-09-28T07:15:36.231017864Z class=Client connection="qIYFdt3w6zclS30fjUC2Qw==" hub=test.Client ts=2023-09-28T07:15:36.231018636Z caller=loop.go:109 event="message received" message="\"Is lastWriteTimestamp 5.000653559s > keepAliveInterval 5s\""
level=debug ts=2023-09-28T07:15:36.231082206Z class=Client connection="qIYFdt3w6zclS30fjUC2Qw==" hub=test.Client ts=2023-09-28T07:15:36.231082627Z caller=loop.go:111 event="message received" message="\"Sent ping\""
level=debug ts=2023-09-28T07:15:36.231139054Z protocol=JSON ts=2023-09-28T07:15:36.231139464Z caller=jsonhubprotocol.go:211 event=write message="{\"type\":6}\u001e"
level=debug ts=2023-09-28T07:15:41.23189374Z class=Client connection="qIYFdt3w6zclS30fjUC2Qw==" hub=test.Client ts=2023-09-28T07:15:41.231894482Z caller=loop.go:109 event="message received" message="\"Is lastWriteTimestamp 5.000765026s > keepAliveInterval 5s\""
level=debug ts=2023-09-28T07:15:41.231947713Z class=Client connection="qIYFdt3w6zclS30fjUC2Qw==" hub=test.Client ts=2023-09-28T07:15:41.231948103Z caller=loop.go:111 event="message received" message="\"Sent ping\""
level=debug ts=2023-09-28T07:15:41.231991586Z protocol=JSON ts=2023-09-28T07:15:41.231991926Z caller=jsonhubprotocol.go:211 event=write message="{\"type\":6}\u001e"
level=debug ts=2023-09-28T07:15:46.227611738Z protocol=JSON ts=2023-09-28T07:15:46.22761234Z caller=jsonhubprotocol.go:211 event=write message="{\"type\":7,\"error\":\"timeout interval elapsed (30s)\",\"allowReconnect\":false}\u001e"
level=debug ts=2023-09-28T07:15:46.227792982Z class=Client connection="qIYFdt3w6zclS30fjUC2Qw==" hub=test.Client ts=2023-09-28T07:15:46.227793453Z caller=loop.go:135 event="message loop ended"
level=info ts=2023-09-28T07:15:46.227858125Z connect="timeout interval elapsed (30s)"
level=debug ts=2023-09-28T07:15:46.780683047Z caller=client.go:318 state=1
level=debug ts=2023-09-28T07:15:46.782063894Z class=Client connection="RRa4hqt-650YUQ5r5QlVDw==" hub=test.Client ts=2023-09-28T07:15:46.782064435Z caller=client.go:586 event="handshake sent" msg="{\"protocol\":\"json\",\"version\":1}\u001e"
level=debug ts=2023-09-28T07:15:46.782313036Z class=Client connection="RRa4hqt-650YUQ5r5QlVDw==" hub=test.Client ts=2023-09-28T07:15:46.782313707Z caller=client.go:616 event="handshake received" msg="signalr.handshakeResponse{Error:\"\"}"

This is server log:

Sep 28 09:21:15 nsoft-ubuntu test[15604]: level=debug ts=2023-09-28T07:21:15.53896606Z protocol=JSON caller=jsonhubprotocol.go:76 event=read message="{\"type\":6}"
Sep 28 09:21:15 nsoft-ubuntu test[15604]: level=debug ts=2023-09-28T07:21:15.539036614Z class=Server connection="T5_fdA7vxsj3N-hY190zcw==" hub=test.TestHub caller=loop.go:285 event="message received" message=signalr.hubMessage{Type:6}
Sep 28 09:21:20 nsoft-ubuntu test[15604]: level=debug ts=2023-09-28T07:21:20.539399179Z protocol=JSON caller=jsonhubprotocol.go:76 event=read message="{\"type\":6}"
Sep 28 09:21:20 nsoft-ubuntu test[15604]: level=debug ts=2023-09-28T07:21:20.539524676Z class=Server connection="T5_fdA7vxsj3N-hY190zcw==" hub=test.TestHub caller=loop.go:285 event="message received" message=signalr.hubMessage{Type:6}
Sep 28 09:21:25 nsoft-ubuntu test[15604]: level=debug ts=2023-09-28T07:21:25.539990055Z protocol=JSON caller=jsonhubprotocol.go:76 event=read message="{\"type\":6}"
Sep 28 09:21:25 nsoft-ubuntu test[15604]: level=debug ts=2023-09-28T07:21:25.540103991Z class=Server connection="T5_fdA7vxsj3N-hY190zcw==" hub=test.TestHub caller=loop.go:285 event="message received" message=signalr.hubMessage{Type:6}
Sep 28 09:21:30 nsoft-ubuntu test[15604]: level=debug ts=2023-09-28T07:21:30.537202984Z protocol=JSON caller=jsonhubprotocol.go:76 event=read message="{\"type\":7,\"error\":\"timeout interval elapsed (30s)\",\"allowReconnect\":false}"
Sep 28 09:21:30 nsoft-ubuntu test[15604]: level=debug ts=2023-09-28T07:21:30.537295519Z class=Server connection="T5_fdA7vxsj3N-hY190zcw==" hub=test.TestHub caller=loop.go:93 message="signalr.closeMessage{Type:7, Error:\"timeout interval elapsed (30s)\", AllowReconnect:false}"
Sep 28 09:21:30 nsoft-ubuntu test[15604]: level=debug ts=2023-09-28T07:21:30.537335675Z protocol=JSON caller=jsonhubprotocol.go:211 event=write message="{\"type\":7,\"error\":\"timeout interval elapsed (30s)\",\"allowReconnect\":true}\u001e"
Sep 28 09:21:30 nsoft-ubuntu test[15604]: level=debug ts=2023-09-28T07:21:30.537429824Z class=Server connection="T5_fdA7vxsj3N-hY190zcw==" hub=test.TestHub caller=loop.go:135 event="message loop ended"
Sep 28 09:21:31 nsoft-ubuntu test[15604]: level=debug ts=2023-09-28T07:21:31.090958238Z class=Server connection="zP1kWbUFuLmrwccs4Ou7Zg==" hub=test.TestHub caller=server.go:219 event="handshake received" msg="{\"protocol\":\"json\",\"version\":1}"
Sep 28 09:21:31 nsoft-ubuntu test[15604]: level=debug ts=2023-09-28T07:21:31.091105638Z class=Server connection="zP1kWbUFuLmrwccs4Ou7Zg==" hub=test.TestHub caller=server.go:240 event="handshake sent" msg="{}\u001e"

The only solution I found is to set KeepAliveInterval=5s on the server. If I try to set it to 4s or 6s it will still cause client to disconnect.

Any idea what is causing these issues?

btw, in comments I see this https://github.com/philippseith/signalr/blob/e607000614fa7d709688494ec3918d0a5428fec1/options.go#L39 But in newPartyBasemethod I see this hardcoded: https://github.com/philippseith/signalr/blob/e607000614fa7d709688494ec3918d0a5428fec1/party.go#L62 Could this cause any issues?

renato-kristic commented 1 year ago

Hmmm, something is wrong with ping mechanism in 0.6.3. I removed signalr.KeepAliveInterval(15*time.Second) and signalr.TimeoutInterval(30*time.Second) from constructor and default values are now used for my server and client.

If there is only one client everything is fine, I cannot see any timeout interval elapsed (30s) messages in log. But if there are two clients ping mechanism starts behaving strange.

If both clients are not sending any messages (only ping messages are exchanging) everything is fine. But when one client is sending messages other client will disconnect every 30s.

For example: Client1 -> Sending message every 2 seconds. Client2 -> Not sending anything, just listening. Client2 will be disconnected and will try to connect again.

Client2 log:

level=debug ts=2023-10-03T07:11:31.76561318Z protocol=JSON ts=2023-10-03T07:11:31.765614282Z caller=jsonhubprotocol.go:76 event=read message="{\"type\":7,\"error\":\"timeout interval elapsed (30s)\",\"allowReconnect\":true}"
level=debug ts=2023-10-03T07:11:31.765838136Z class=Client connection="QwWocpbJqbh16ZWETdzfhA==" hub=test_client.Client ts=2023-10-03T07:11:31.765838427Z caller=loop.go:96 message="signalr.closeMessage{Type:7, Error:\"timeout interval elapsed (30s)\", AllowReconnect:true}"
level=debug ts=2023-10-03T07:11:31.765872922Z protocol=JSON ts=2023-10-03T07:11:31.765873233Z caller=jsonhubprotocol.go:211 event=write message="{\"type\":7,\"error\":\"timeout interval elapsed (30s)\",\"allowReconnect\":false}\u001e"
level=debug ts=2023-10-03T07:11:31.765917526Z class=Client connection="QwWocpbJqbh16ZWETdzfhA==" hub=test_client.Client ts=2023-10-03T07:11:31.765917847Z caller=loop.go:139 event="message loop ended"
level=info ts=2023-10-03T07:11:31.765962732Z connect="timeout interval elapsed (30s)"

level=debug ts=2023-10-03T07:11:32.48676235Z caller=client.go:318 state=1
level=debug ts=2023-10-03T07:11:32.487957522Z class=Client connection="BcARsZdZzjSzPJV4fbuUpA==" hub=test_client.Client ts=2023-10-03T07:11:32.487958243Z caller=client.go:586 event="handshake sent" msg="{\"protocol\":\"json\",\"version\":1}\u001e"
level=debug ts=2023-10-03T07:11:32.488239947Z class=Client connection="BcARsZdZzjSzPJV4fbuUpA==" hub=test_client.Client ts=2023-10-03T07:11:32.488240528Z caller=client.go:616 event="handshake received" msg="signalr.handshakeResponse{Error:\"\"}"
level=debug ts=2023-10-03T07:11:32.488323906Z caller=client.go:318 state=2

If I start third client (Python client 0.9.5) it will behave correctly and will not disconnect.

Any idea what is wrong with ping mechanism?

renato-kristic commented 1 year ago

I tried to find a source of this problem and found out the following: Server resets timeoutTickerevery time new message arrives (I can see this happening every few seconds as there is flow of messages): https://github.com/philippseith/signalr/blob/e607000614fa7d709688494ec3918d0a5428fec1/loop.go#L80

But no matter if timeoutTickeris reset just few seconds ago, this will be executed: https://github.com/philippseith/signalr/blob/e607000614fa7d709688494ec3918d0a5428fec1/loop.go#L113

But it should not execute.... right?

renato-kristic commented 1 year ago

I found solution for this problem: https://github.com/philippseith/signalr/pull/180

If Client1 is sending messages and Client2 is just listening for messages, Client2 will not send ping after keepAliveInterval. By using NewTicker created outside of the loop that takes care of firing at keepAliveInterval it solved this problem.

gfrancois-sandbox commented 1 year ago

Hello,

I agree I had the same issue and apply the same fix.

philippseith commented 1 year ago

I tried to find a source of this problem and found out the following: Server resets timeoutTickerevery time new message arrives (I can see this happening every few seconds as there is flow of messages):

https://github.com/philippseith/signalr/blob/e607000614fa7d709688494ec3918d0a5428fec1/loop.go#L80

But no matter if timeoutTickeris reset just few seconds ago, this will be executed:

https://github.com/philippseith/signalr/blob/e607000614fa7d709688494ec3918d0a5428fec1/loop.go#L113

But it should not execute.... right?

It seems to me that these are different timeoutTicker instances. Both server and client are using them.

I found solution for this problem: #180

If Client1 is sending messages and Client2 is just listening for messages, Client2 will not send ping after keepAliveInterval. By using NewTicker created outside of the loop that takes care of firing at keepAliveInterval it solved this problem.

This "solves" the problem, yes. But only because the select case for keepAliveTicker is executing even the timeoutTicker has elapsed.

IMHO, the real problem is that the client does not reset its own timeout when it sends a Ping. You may have noticed that it is not the server which is closing the connection. It is the client! This roots in my sloppy reading of the spec: https://github.com/dotnet/aspnetcore/blob/main/src/SignalR/docs/specs/HubProtocol.md#ping-aka-keep-alive says: "Most implementations will want to reset a timeout used to determine if the other party is present." This is currently not done in this implementation. See #181

renato-kristic commented 1 year ago

It seems to me that these are different timeoutTicker instances. Both server and client are using them. This "solves" the problem, yes. But only because the select case for keepAliveTicker is executing even the timeoutTicker has elapsed. IMHO, the real problem is that the client does not reset its own timeout when it sends a Ping. You may have noticed that it is not the server which is closing the connection. It is the client! From what I can see in logs, it is the server that breaks pingLoop as timeoutTicker elapses.

Yes, these are different timeoutTicker instances. There is a new instance for every new client connected. During a test I was not aware of it. This is what was happening:

Client1 - Sends message every 2s Client2 - Just listens for messages

Client2 successfully receives messages that Client1 is sending but Client2 will not send pings if Client1 is emitting messages. Server stops receiving ping messages from Client2 and timeoutTicker instance for this client never gets reset. When timeoutTicker elapses it will break pingLoop and client will be disconnected.

The main source of this problem is that Client2 stops sending ping messages. But if I implement timer as described in https://github.com/philippseith/signalr/pull/180 everything starts behaving normally. Client2 will continue sending pings event if Client1 is emitting messages. For me this looks like good flow of messages.

This roots in my sloppy reading of the spec: https://github.com/dotnet/aspnetcore/blob/main/src/SignalR/docs/specs/HubProtocol.md#ping-aka-keep-alive says: "Most implementations will want to reset a timeout used to determine if the other party is present." This is currently not done in this implementation. See https://github.com/philippseith/signalr/pull/181

I tried implementing server and clients using your fix and I can confirm that it works too. Client2 will not get disconnected if Client1 is sending messages. I can see that server sends ping every 5s and resets timeout timer successfully: https://github.com/philippseith/signalr/blob/729f03be0de1dfc1bb80e17f3c06be2dfffacba2/loop.go#L115 Client stops sending pings but client will not be disconnected by server as server resets timeoutTicker when sending ping to the client.

For me this approach is acceptable. I hope there are no cons for using this ping mechanism.

philippseith commented 1 year ago

Fixed by #181