Open aienabled opened 4 years ago
Hi, we are also investigating the connection timeout. Looking at the code, I noticed that the deadline is extended only when it receives the PONG from its PING (client). In my opinion it should also be extended in the following cases:
1) I get a PING from the server 2) I receive any packet from the server
What do you think?
P.S aienabled yes we have extended the deadline also within this if if ((byte) pongNumber! = (byte) m_sentPingNumber)
@kenfalco
BTW, since reporting the issue I have not tried any changes in the ReceivedPong
method. I'm considering trying them now but there is one observation that makes me confused. We have few users with ideal connections (ping around 30 ms and no packet loss, as indicated by WinMTR tool) which are getting disconnects consistently—often to all the servers we have around the world! So it doesn't seem to be relevant to the ping result handling as ping packets are sent relatively rarely and should be arriving in the correct order due to a low ping and no packet loss.
Hi, I checked and I have an old version of Lidgren (2014) and I didn't have this line of code.
m_connection.ResetTimeout (now);
I have analyzed the differences between my code and new Lidgren and there is this thing that I don't like:
lock (m_connections) { foreach (NetConnection conn in m_connections) { conn.Heartbeat(now, m_frameCounter); if (conn.m_status == NetConnectionStatus.Disconnected) { // // remove connection // m_connections.Remove(conn); m_connectionLookup.Remove(conn.RemoteEndPoint); break; // can't continue iteration here } } }
Why break if you remove a connection? For example, if you find a connection halfway through each time, it may take a long time to process some connections and maybe can time out.
This is the code in my version:
lock (m_connections) { for (int i = m_connections.Count - 1; i >= 0; i--) { var conn = m_connections[i]; conn.Heartbeat(now, m_frameCounter); if (conn.m_status == NetConnectionStatus.Disconnected) { // // remove connection // m_connections.RemoveAt(i); m_connectionLookup.Remove(conn.RemoteEndPoint); } } }
It seems more correct to me.
P.S We are investigating why with 11,000+ users at the same time we have group disconnections of about 1000-2000 users. we don't know if can be a hardware or software problem.
@kenfalco I've checked the master branch in this repository and there is no break
in the connections heartbeat foreach loop https://github.com/lidgren/lidgren-network-gen3/blob/master/Lidgren.Network/NetPeer.Internal.cs#L374
I see, so your issue is different. Not some random user disconnecting but massive disconnects. I think this chart alone (online stats) is not enough to make an educated guess on what could be a culprit. The connections drop could be related to:
It's best to gather more metrics in your application (+GC events), OS, and hardware (CPU/RAM/network).
Yes you are right I was looking at another version of Lidgren.
We have analyzed lidgren's logs and the only message that often appears in the most busy hours is:
Socket exception: System.Net.Sockets.SocketException (0x80004005): The connection has been broken due to keep-alive activity detecting a failure while the operation was in progress at System.Net.Sockets.Socket.ReceiveFrom (Byte [] buffer, Int32 offset, Int32 size, SocketFlags socketFlags, EndPoint & remoteEP) at Lidgren.Network.NetPeer.Heartbeat () in D: \ Workswc \ CardGames \ trunk \ Online Payment Server \ Lidgren.Network \ NetPeer.Internal.cs: line 432
As soon as possible we will update Lidgren to the latest version.
@kenfalco I've found a mention that it's related to the packet TTL expiration (even in the case of UDP packets):
For a datagram socket, this error indicates that the time to live has expired.
You can try increasing packet TTL—for example m_socket.Ttl = 255;
put after this line:
https://github.com/lidgren/lidgren-network-gen3/blob/3ab2d8d36867ccd9e10d7b1c61ed4ccc8ce5172f/Lidgren.Network/NetPeer.Internal.cs#L133
will increase it to max possible.
I would also recommend temporarily relocating to a new hardware in another datacenter just to ensure the issue is not related to the hardware/network failure.
Hi, thanks for the advice i will try it. P.S I tried to download the latest version of Lidgren but it doesn't compile on IOS. First he gave this error:
https://github.com/lidgren/lidgren-network-gen3/issues/114
And I fixed it as they say at the end of the post.
Then he started giving this other ...
NullReferenceException: Object reference not set to an instance of an object. at Lidgren.Network.NetPeer.InitializeNetwork () [0x00000] in <00000000000000000000000000000000>: 0 at Lidgren.Network.NetPeer.Start () [0x00000] in <00000000000000000000000000000000>: 0 at Menu.Update () [0x00000] in <00000000000000000000000000000000>: 0
But is Unity + IOS no longer supported?
@kenfalco I cannot provide a response as I'm not the library developer and not developing any applications for iOS. My contributor bage is only because I've submitted a pull requested and it was accepted.
Hello!
While investigating random timeout connection issues (on perfect connections) I've noticed that
ReceivedPong
could handle only the latest ping request (m_sentPingNumber
), which is certainly not perfect in some cases https://github.com/lidgren/lidgren-network-gen3/blob/c18c8b96329772f49ec43362d4a219c1820870d6/Lidgren.Network/NetConnection.Latency.cs#L90I've also noticed that
NetConnection.ResetTimeout()
call in theNetReliableSenderChannel
to not working as expected...the client is dropping connection by timeout (set as 8 seconds) in some rare but 100% reproducible (for some clients) cases. I will keep investigating this issue.Regards!