lsalzman / enet

ENet reliable UDP networking library
MIT License
2.72k stars 670 forks source link

Catastrophic performance drop for long reliable queues #218

Closed mirasrael closed 1 year ago

mirasrael commented 1 year ago

Problem:

We have the issue in our project with catastrophic ENet performance drop when clients with poor connection enters the world. It gets even worse when server is modded and client downloads big mod files via ENet. I know ENet isn't intended for that, but this situation happens even if you use lot of small packets, but they doesn't deliver fast enough.

The issue with enet_protocol_check_outgoing_commands and commands exceeding windows. Assume you have have 32Mb data size (max supported by ENet), it will split it to about 32 000 fragments and with windows size 64 Kb it may only keep 64 fragments in fly, all other packets it have to skip, doing about 32 000 skips every time when it call enet_protocol_check_outgoing_commands. If you have another peer with active data transfer it will call enet_protocol_check_outgoing_commands on all peers as long as it continue sending data (if it has data more than for one MTU). I.e. if you need to make 5 MTU deliveries then it will make 5 * 32 000 worthless checks on packets skip for the peer exceeding it's window. It slow down all transfers and increases chance to overlap with another client who also may start downloading big packet and then it will be 64 000 worthless scan for every continueSending iteration etc. All that end up in ENet.Service doing almost nothing else than worthless wrap checks.

I implemented solution to that problem in our fork of ENet-CSharp repository - https://github.com/StrangeLoopGames/ENet-CSharp/pull/4. It have too many differences with enet main repo, so I can't just open PR. Also on second thought I decided it will be better to implement it in slightly different way.

How I fixed it:

I added another list of pendingOutgoingCommands for commands which fails window check. If outgoingCommand fails the check it moves to that queue and then we assume all commands in that queue will also fail the check and it is safe to not scan rest of items. For delivery we first check pendingOutgoingCommands until first fail and then continue with main queue. Instead of 32 000 checks we usually can do just one in that case reducing complexity from O(N) to O(1).

But this solution need to check these two queues everywhere.

This is how Enet.Service thread looks before/after optimization for just a single user, for multiple users it much worse: image

How it may be fixed better way:

For user reliable fragments they may go to pendingOutgoingCommands queue first (probably even without reliableSequenceNumber to not mess up with service messages). check_outgoing_commands scans main queue first and when it ends it pulls from pendingOutgoingCommands (and assigns reliableSequenceNumber) until window exceed. If window is full then it won't even try to get anything else. Then you have basically same logic everywhere with just main queue, but only have to process pending queue in check outgoing packets (and for diconnect later of course).

Another optimization may be done for continue sending. Instead of scanning all peers you may have an array of peer indices (which only need to rebuild when peer connect/disconnect). Then you have range of active peers in that array (0..N), when peer continue sending then it may exchange it's index in array with first peer which doesn't continue sending (like quick sort works). You have write index and swap peer index value at write index. After the pass writeIndex will be a number of peers which have payload, so you will need to scan just few of them instead of repeating scan of hundreds of peers.

lsalzman commented 1 year ago

I implemented my own strategy for maintaining a separate queue of send reliable packets that should theoretically alleviate this.

On Fri, Feb 3, 2023 at 2:55 AM Alexander Bondarev @.***> wrote:

Problem:

We have the issue in our project with catastrophic ENet performance drop when clients with poor connection enters the world. It gets even worse when server is modded and client downloads big mod files via ENet. I know ENet isn't intended for that, but this situation happens even if you use lot of small packets, but they doesn't deliver fast enough.

The issue with enet_protocol_check_outgoing_commands and commands exceeding windows. Assume you have have 32Mb data size (max supported by ENet), it will split it to about 32 000 fragments and with windows size 64 Kb it may only keep 64 fragments in fly, all other packets it have to skip, doing about 32 000 skips every time when it call enet_protocol_check_outgoing_commands. If you have another peer with active data transfer it will call enet_protocol_check_outgoing_commands on all peers as long as it continue sending data (if it has data more than for one MTU). I.e. if you need to make 5 MTU deliveries then it will make 5 * 32 000 worthless checks on packets skip for the peer exceeding it's window. It slow down all transfers and increases chance to overlap with another client who also may start downloading big packet and then it will be 64 000 worthless scan for every continueSending iteration etc. All that end up in ENet.Service doing almost nothing else than worthless wrap checks.

I implemented solution to that problem in our fork of ENet-CSharp repository - StrangeLoopGames/ENet-CSharp#4 https://github.com/StrangeLoopGames/ENet-CSharp/pull/4. It have too many differences with enet main repo, so I can't just open PR. Also on second thought I decided it will be better to implement it in slightly different way.

How I fixed it:

I added another list of pendingOutgoingCommands for commands which fails window check. If outgoingCommand fails the check it moves to that queue and then we assume all commands in that queue will also fail the check and it is safe to not scan rest of items. For delivery we first check pendingOutgoingCommands until first fail and then continue with main queue. Instead of 32 000 checks we usually can do just one in that case reducing complexity from O(N) to O(1).

But this solution need to check these two queues everywhere.

This is how Enet.Service thread looks before/after optimization for just a single user, for multiple users it much worse: [image: image] https://user-images.githubusercontent.com/475445/216537900-72a0388d-033a-4463-81e9-c751edaebccf.png

How it may be fixed better way:

For user reliable fragments they may go to pendingOutgoingCommands queue first (probably even without reliableSequenceNumber to not mess up with service messages). check_outgoing_commands scans main queue first and when it ends it pulls from pendingOutgoingCommands (and assigns reliableSequenceNumber) until window exceed. If window is full then it won't even try to get anything else. Then you have basically same logic everywhere with just main queue, but only have to process pending queue in check outgoing packets (and for diconnect later of course).

Another optimization may be done for continue sending. Instead of scanning all peers you may have an array of peer indices (which only need to rebuild when peer connect/disconnect). Then you have range of active peers in that array (0..N), when peer continue sending then it may exchange it's index in array with first peer which doesn't continue sending (like quick sort works). You have write index and swap peer index value at write index. After the pass writeIndex will be a number of peers which have payload, so you will need to scan just few of them instead of repeating scan of hundreds of peers.

— Reply to this email directly, view it on GitHub https://github.com/lsalzman/enet/issues/218, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALDVUMUDKM3HLTYUUVN2PTWVS2YBANCNFSM6AAAAAAUP6MEW4 . You are receiving this because you are subscribed to this thread.Message ID: @.***>