When an established connection is breaking for some reason (because the TCP Ack does not come within the TCP User Timeout, for example) re-establishing the connection is bound by the connection timeout - it just cannot happen faster. Consider the following scenario:
The connection is breaking at A side and A.1 sends the connect request to B.
B.2 receives the request, but the remote endpoint is still valid there, so it replies with the ConnectionRequestCrossed (code) and tries to probe the existing connection.
The probing fails and B.1 sends its connection request to A.
A.1 gets ConnectionRequestCrossed reply and gets stuck on the "resolved" MVar at findRemoteEndPoint until the connection timeout fires.
A.2 receives the connection request and replies with ConnectionRequestCrossed.
B.3 receives the ConnectionRequestCrossed reply and puts the "crossed" MVar.
The connection timeout fires on A.1, it releases the "resolved" MVar and removes the stale endpoint. The new connection request is sent to B.
B.2 immediately accepts the request (because the "crossed" MVar was already set).
Here is the log of the described scenario:
ssu1: May 07 11:27:05 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=188193 sent=100951
ssu1: May 07 11:27:05 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=87242 sent=65160
ssu1: May 07 11:27:06 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=44503 sent=26171
ssu1: May 07 11:27:06 ssu1 p[6344]: runScheduledAction: exception=ProcessLinkException pid://172.28.128.18:9070:0:485043 DiedNormal remote=172.28.128.19:9070:0 stack=["Network.Transport.TCP.runScheduledAction (src/Network/Transport/TCP.hs:(1946,1)-(1977,38))","Network.Transport.TCP.withScheduledAction.\\ (src/Network/Transport/TCP.hs:1985:34-71)","Network.Transport.TCP.withScheduledAction (src/Network/Transport/TCP.hs:(1983,1)-(1986,94))","Network.Transport.Internal.mapIOException (src/Network/Transport/Internal.hs:149:1-42)","Network.Transport.TCP.apiSend (src/Network/Transport/TCP.hs:(819,1)-(849,49))"]
ssu1: May 07 11:27:07 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=125734 sent=72780
ssu1: May 07 11:27:07 ssu1 p[6344]: handleIncomingMessages.prematureExit: err=user error (recvExact: Socket closed) st=RemoteEndPointValid
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: exception=Network.Socket.ByteString.sendMany: resource vanished (Broken pipe) remote=172.28.128.19:9070:0 stack=["Network.Transport.TCP.runScheduledAction (src/Network/Transport/TCP.hs:(1946,1)-(1977,38))","Network.Transport.TCP.withScheduledAction.\\ (src/Network/Transport/TCP.hs:1985:34-71)","Network.Transport.TCP.withScheduledAction (src/Network/Transport/TCP.hs:(1983,1)-(1986,94))","Network.Transport.TCP.apiSend (src/Network/Transport/TCP.hs:(819,1)-(849,49))","Network.Transport.TCP.apiConnect (src/Network/Transport/TCP.hs:(765,1)-(779,11))"]
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: remoteState=RemoteEndPointFailed: user error (recvExact: Socket closed)
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: exception=Network.Socket.ByteString.sendMany: timeout (Connection timed out) remote=172.28.128.19:9070:0 stack=["Network.Transport.TCP.runScheduledAction (src/Network/Transport/TCP.hs:(1946,1)-(1977,38))","Network.Transport.TCP.withScheduledAction.\\ (src/Network/Transport/TCP.hs:1985:34-71)","Network.Transport.TCP.withScheduledAction (src/Network/Transport/TCP.hs:(1983,1)-(1986,94))","Network.Transport.TCP.apiSend (src/Network/Transport/TCP.hs:(819,1)-(849,49))","Network.Transport.TCP.apiConnect (src/Network/Transport/TCP.hs:(765,1)-(779,11))"]
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: remoteState=RemoteEndPointFailed: Network.Socket.ByteString.sendMany: resource vanished (Broken pipe)
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: exception=Network.Socket.ByteString.sendMany: invalid argument (Bad file descriptor) remote=172.28.128.19:9070:0 stack=["Network.Transport.TCP.runScheduledAction (src/Network/Transport/TCP.hs:(1946,1)-(1977,38))","Network.Transport.TCP.withScheduledAction.\\ (src/Network/Transport/TCP.hs:1985:34-71)","Network.Transport.TCP.withScheduledAction (src/Network/Transport/TCP.hs:(1983,1)-(1986,94))","Network.Transport.TCP.apiSend (src/Network/Transport/TCP.hs:(819,1)-(849,49))","Network.Transport.TCP.apiConnect (src/Network/Transport/TCP.hs:(765,1)-(779,11))"]
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: remoteState=RemoteEndPointFailed: Network.Socket.ByteString.sendMany: timeout (Connection timed out)
ssu1: May 07 11:27:07 ssu1 p[6344]: resetIfBroken: remoteEP failed
ssu1: May 07 11:27:07 ssu1 p[6344]: removeRemoteEndPoint: 172.28.128.19:9070:0
ssu1: May 07 11:27:07 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:07 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,True,18)
ssu1: May 07 11:27:07 ssu1 p[6344]: socketToEndPoint: AddrInfo {addrFlags = [AI_ADDRCONFIG,AI_V4MAPPED], addrFamily = AF_INET, addrSocketType = Stream, addrProtocol = 6, addrAddress = 172.28.128.19:9070, addrCanonName = Nothing}
ssu1: May 07 11:27:07 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:07 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:07 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:07 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu5: May 07 11:27:08 ssu5 p[6375]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.18:9070:0,False,2)
ssu5: May 07 11:27:08 ssu5 p[6375]: handleConnectionRequest.go: ConnectionRequestCrossed: 172.28.128.18:60992
ssu5: May 07 11:27:08 ssu5 p[6375]: ProbeSocket: 172.28.128.18:9070:0
ssu5: May 07 11:27:08 ssu5 p[6375]: handleIncomingMessages.prematureExit: err=Network.Socket.recvBuf: invalid argument (Bad file descriptor) st=RemoteEndPointValid
ssu5: May 07 11:27:08 ssu5 p[6375]: resetIfBroken: remoteEP failed
ssu5: May 07 11:27:08 ssu5 p[6375]: removeRemoteEndPoint: 172.28.128.18:9070:0
ssu5: May 07 11:27:08 ssu5 p[6375]: createConnectionTo.go: ourAddr=172.28.128.19:9070:0 theirAddr=172.28.128.18:9070:0
ssu5: May 07 11:27:08 ssu5 p[6375]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.18:9070:0,True,17)
ssu5: May 07 11:27:08 ssu5 p[6375]: socketToEndPoint: AddrInfo {addrFlags = [AI_ADDRCONFIG,AI_V4MAPPED], addrFamily = AF_INET, addrSocketType = Stream, addrProtocol = 6, addrAddress = 172.28.128.18:9070, addrCanonName = Nothing}
ssu5: May 07 11:27:08 ssu5 p[6375]: createConnectionTo.go: ourAddr=172.28.128.19:9070:0 theirAddr=172.28.128.18:9070:0
ssu5: May 07 11:27:08 ssu5 p[6375]: createConnectionTo.go: ourAddr=172.28.128.19:9070:0 theirAddr=172.28.128.18:9070:0
ssu1: May 07 11:27:08 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:08 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:08 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,False,18)
ssu1: May 07 11:27:08 ssu1 p[6344]: handleConnectionRequest.go: ConnectionRequestCrossed: 172.28.128.19:36454
-------------------- connectTimeout is fired !!!
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: removeRemoteEndPoint 172.28.128.19:9070:0
ssu1: May 07 11:27:09 ssu1 p[6344]: removeRemoteEndPoint: 172.28.128.19:9070:0
ssu1: May 07 11:27:09 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,True,19)
ssu1: May 07 11:27:09 ssu1 p[6344]: socketToEndPoint: AddrInfo {addrFlags = [AI_ADDRCONFIG,AI_V4MAPPED], addrFamily = AF_INET, addrSocketType = Stream, addrProtocol = 6, addrAddress = 172.28.128.19:9070, addrCanonName = Nothing}
ssu1: May 07 11:27:09 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,False,19)
ssu1: May 07 11:27:09 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,False,19)
ssu1: May 07 11:27:09 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,False,19)
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: 172.28.128.19:9070:0 connId=1024
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: 172.28.128.19:9070:0 connId=1025
ssu1: May 07 11:27:09 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,False,19)
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: 172.28.128.19:9070:0 connId=1026
ssu1: May 07 11:27:09 ssu1 p[6344]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.19:9070:0
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: 172.28.128.19:9070:0 connId=1027
ssu1: May 07 11:27:09 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=125734 sent=82536
ssu1: May 07 11:27:09 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=43198 sent=10136
ssu1: May 07 11:27:09 ssu1 p[6344]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.19:9070:0
ssu5: May 07 11:27:09 ssu5 p[6375]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.18:9070:0,True,17)
ssu5: May 07 11:27:09 ssu5 p[6375]: handleConnectionRequest.go: ConnectionRequestAccepted: 172.28.128.18:60994
ssu5: May 07 11:27:09 ssu5 p[6375]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.18:9070:0,False,17)
ssu5: May 07 11:27:09 ssu5 p[6375]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.18:9070:0,False,17)
ssu5: May 07 11:27:09 ssu5 p[6375]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.18:9070:0
ssu5: May 07 11:27:09 ssu5 p[6375]: createConnectionTo.go: 172.28.128.18:9070:0 connId=1024
ssu5: May 07 11:27:09 ssu5 p[6375]: createConnectionTo.go: 172.28.128.18:9070:0 connId=1025
ssu5: May 07 11:27:09 ssu5 p[6375]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.18:9070:0
ssu5: May 07 11:27:09 ssu5 p[6375]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.18:9070:0
ssu5: May 07 11:27:09 ssu5 p[6375]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.18:9070:0
ssu5: May 07 11:27:09 ssu5 p[6375]: Network.Socket.ByteString.sendMany: total=125661 sent=86880
ssu1: May 07 11:27:11 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=188257 sent=86880
ssu1: May 07 11:27:11 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=188193 sent=153488
ssu1: May 07 11:27:11 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=188257 sent=157832
ssu1: May 07 11:27:14 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=273038 sent=218648
In this case, the connection timeout (connectTimeout) was 2 seconds.
When an established connection is breaking for some reason (because the TCP Ack does not come within the TCP User Timeout, for example) re-establishing the connection is bound by the connection timeout - it just cannot happen faster. Consider the following scenario:
Here is the log of the described scenario:
In this case, the connection timeout (connectTimeout) was 2 seconds.
Is this a bug or it's how it is supposed to work?
Originally posted by @andriytk in https://github.com/haskell-distributed/network-transport-tcp/issues/74#issuecomment-490480703