space-wizards / SpaceWizards.Lidgren.Network

Lidgren Network Library (Space Wizards edition)
https://discord.ss14.io/
MIT License
50 stars 14 forks source link

Incorrect handling of SendTo() return value #6

Closed Partmedia closed 2 years ago

Partmedia commented 2 years ago

Several SS14 players have reported on Discord and elsewhere that they sometimes get stuck when connecting to the official servers at the "Connection Established" window, where the client seems to hang. Because this Heisenbug is difficult to reproduce, it seems to have evaded detection and squashing. I believe I have finally chased down this bug.

Steps to Reproduce

  1. Get a server with a very slow upstream connection.
  2. Have a friend connect to your server.
  3. Sometimes observe this error on the server console:
[ERRO] net: "0.0.0.0": "Failed to send packet: System.Net.Sockets.SocketException (55): No buffer space available
   at System.Net.Sockets.Socket.UpdateStatusAfterSocketErrorAndThrowException(SocketError error, String callerName)
   at System.Net.Sockets.Socket.SendTo(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags, EndPoint remoteEP)
   at Lidgren.Network.NetPeer.ActuallySendPacket(Byte[] data, Int32 numBytes, IPEndPoint target, Boolean& connectionReset) in /space-station-14/RobustToolbox/Lidgren.Network/Lidgren.Network/Lidgren.Network/NetPeer.LatencySimulation.cs:line 212"

What's happening here is that SendTo() is returning less bytes than the caller wanted it to send, which, for very good reasons, can happen on Windows and Unix.

Specifically, the OS guesstimates a good size for the transmit buffer, and if this buffer gets filled, the OS tells the application that it has only done a partial send. The bug is that it is up to the caller, i.e. the application, to attempt to re-send those bits.

Workaround

Fix Basically, the problem is this:

int bytesSent = NetFastSocket.SendTo(m_socket, data, 0, numBytes, SocketFlags.None, realTarget);
if (numBytes != bytesSent)
    LogWarning("Failed to send the full " + numBytes + "; only " + bytesSent + " bytes sent in packet!");

Lidgren must attempt to re-send any bytes that didn't get sent as a result of an incomplete SendTo().

PJB3005 commented 2 years ago

After a discussion in Discord we weren't sure what the deal with this ENOBUFS is. After pouring over man pages and POSIX specs we basically came to the conclusion it's just a random code the OS can throw in some undefined circumstances, so I don't know we can really do anything about it.

The actual connection problem mentioned was finally diagnosed to be an MTU issue, fixed in 39f967e1cd4115b316fa2cc4d0668e60b231dba9.