willardf / Hazel-Networking

Hazel Networking is a low level networking library for C# providing connection-oriented, message-based communication via RUDP.
MIT License
385 stars 60 forks source link

Intercept app data for DTLS earlier #39

Closed willardf closed 2 years ago

willardf commented 2 years ago

We were intercepting bytes at WriteBytesToConnection, which is too late to avoid starting resend timers for reliable messages. Also I'm wondering if the messages might accidentally be encrypted wrong? Like using the previous cypher if we have to renegotiate in the middle of a connection.

This is a draft because there is one failing test that needs to be fixed, but I wanted to get some feedback on what's happening and the logic of said failing test.

TestMalformedConnectionData is the only failing test because the DTLS was relying on the UDP hello packet timing out to fail the Connect. While it's important that after Connect times out, we need to stop sending packets and set the connection to NotConnecting (failure to connect). It seems... double bad? that we were relying on that kind of failure when the DTLS handshake itself has nothing to do with the underlying hello packets.

In short, I'm going to continue to poke around for a good way to have DTLS bubble up a failure to connect, but I wanted to flag this as kind of a weird thing. It's very elegant that DTLS sits on top of the UDP layer and doesn't care about the underlying send or retry mechanisms, but it also means that it needs to recreate some of those mechanisms, which it might not be doing right now.

Edit + Ready for Review: Okay, so adding timeout checks on the handshake resends seems pretty adequate. The one test is fixed, but I'm mostly left with the feeling that the testing is inadequate, but things are okay other than my anxiety.

willardf commented 2 years ago

As I'm trying to figure out where to insert a timeout, I'm realizing that it's unlikely a connection can properly disconnect due to epoch negotiation ever? So figuring this out might fix a bug with established connections as well. Still trying to sort that bit out, though.

willardf commented 2 years ago

Okay, I have a fix that seems valid, but I'm still not sure if the epoch can actually ever be renegotiated in the middle of a connection. It's a good thing if no because it means fewer cornercases and I suspect there's no really good reason to. It's just a hole in my understanding. Edit: As far as I can see, any handshake packets the server sends down that we aren't expecting are dropped and once we go into "Established" state, we can't come out. So I'm pretty sure I imagined renegotiation. That isn't a thing.

Mukikaizoku commented 2 years ago

TestMalformedConnectionData is the only failing test because the DTLS was relying on the UDP hello packet timing out to fail the Connect. While it's important that after Connect times out, we need to stop sending packets and set the connection to NotConnecting (failure to connect). It seems... double bad? that we were relying on that kind of failure when the DTLS handshake itself has nothing to do with the underlying hello packets.

Yeah, going through the code today, had noticed that the DTLS layer was relying on the SendHello timeout to timeout an infinite DTLS connection attempt. Agreed that the logic of the layers shouldn't be intertwined---so nice that this PR fixes that here.

In short, I'm going to continue to poke around for a good way to have DTLS bubble up a failure to connect, but I wanted to flag this as kind of a weird thing. It's very elegant that DTLS sits on top of the UDP layer and doesn't care about the underlying send or retry mechanisms, but it also means that it needs to recreate some of those mechanisms, which it might not be doing right now.

The decorator setup for the DTLS layer is indeed elegant here and nicely isolates the concerns. While not 100% sure it was implied, I do agree that having to recreate the same types of logic (timeouts, reliability/resends) in this layer seems not ideal?

I think the changes at hand are great, but for food for thought, an alternative approach may be to insert DTLS in the "middle":

[Application calls Connect/Send] -> [Connection layer] -> [DTLS layer] -> [Reliability layer] -> [Send layer]

Here, DTLS can send out packets and they are handled like any reliable packets. The Reliability layer would detect timeouts and bubble that back to whatever called it (DTLS or Connection , and DTLS will clean its state and bubble back to Connection layer)---and ultimately Connection layer handles the disconnect report.

The equivalent arrows drawn for the current setup, with the inheritance hierarchy, would likely weave in and out of each of the layers, and I think that's why it is difficult to reuse the reliability and timeout logic.

willardf commented 2 years ago

We talked offline about how the organization of Hazel is very heavily inherited and how that makes it difficult to follow the data and test code units. It is possible in the future Hazel will undergo a large refactor to componentize sections like DTLS or Reliability for these reasons, but it'll be a large undertaking that will be hard to prioritize.

willardf commented 2 years ago

So one of the tests was consistently failing, but I couldn't figure out if it was UDP or DTLS, and when I split things to figure it out, the test started passing. Never repro'd locally, so I guess I'm good to go?

Mukikaizoku commented 2 years ago

Hahaha yeah that is strange---it wasn't obvious from the code either. And yeah, I think you're good to go!