willardf / Hazel-Networking

Hazel Networking is a low level networking library for C# providing connection-oriented, message-based communication via RUDP.
MIT License
385 stars 60 forks source link

Fix deadlock caused by DTLS-multithreading fix #30

Closed Mukikaizoku closed 2 years ago

Mukikaizoku commented 2 years ago

A previous PR (https://github.com/willardf/Hazel-Networking/pull/29) introduced a scenario where, while a thread had locked a peer object, the message processing would leave the DTLS layer and eventually reenter the DTLS layer, attempting to lock other peer objects for sending messages. With multiple threads doing the same thing a deadlock occurs pretty readily.

The fix ensures that if a receive message requires processing after the DTLS layer (ie, application data), it will leave the peer lock before doing so.

willardf commented 2 years ago

I was bored yesterday and found an exception while trying to stress test this. So I fixed that and improved the allocations like I mentioned.

Mukikaizoku commented 2 years ago

Curious, what was the exception that you found?

For the update---how about we switch to a ConcurrentQueue? A ConcurrentBag doesn't have any ordering guarantees and while it doesn't seem this layer is concerned about message ordering, I'd be concerned there would be rare edge cases where an added message gets pushed behind many other messages

willardf commented 2 years ago

I'm okay with ConcurrentQueue, but you can't guarantee any ordering even with the queue. If the client sends three packets, the peer lock will stop each one and multiple records will be sequential, but the three packets will still race. If you removed the peer lock, then the race shifts to the queue add race the same way and then even multiple records can be interleaved. For this reason, the Bag is higher performance because it prefers thread local adds/takes. We're mostly assuming that all writers will also become readers for this queue, so it should almost always be in a drained state.

Mukikaizoku commented 2 years ago

Yup--my suggestion for the queue isn't for perfect ordering, rather imagining a message getting stuck in the bag for some significant amount of time. I believe this could happen if there are times the bag is behaving closer to LIFO.

If our server is performant, the rate of processing should be higher than that of an incoming message rate for a peer, so the bag should always clear under good conditions, so my concern might not be a problem