From an API design perspective, I see a few alternatives:

1. Keep initial_max_udp_payload_size as the single knob and document that if you get it wrong, you can cause an unrecoverable black hole. This means that we are free to use initial_max_udp_payload_size as a reset value when a black hole is detected.
2. Keep initial_max_udp_payload_size and introduce a minimum_max_udp_payload_size (or something with a better name), meant to be used as the MTU when a black hole is detected. Obviously, initial_max_udp_payload_size should be >= minimum_max_udp_payload_size.

I feel inclined towards 2, because IMO it separates concerns better (e.g. I can imagine someone might want to start with a high initial MTU and fall back to a lower one in case of a black hole).
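To make the semantics of option 2 concrete, here is a tiny standalone sketch (illustration only, not Quinn code; the field names just mirror this discussion):

```rust
/// Illustration of alternative 2: two independent knobs.
struct MtuSettings {
    /// Where the connection starts and where MTU discovery begins probing.
    initial_max_udp_payload_size: u16,
    /// Floor that black hole detection is allowed to fall back to.
    minimum_max_udp_payload_size: u16,
}

impl MtuSettings {
    fn new(initial: u16, minimum: u16) -> Self {
        // Invariant from above: initial >= minimum.
        assert!(initial >= minimum);
        Self {
            initial_max_udp_payload_size: initial,
            minimum_max_udp_payload_size: minimum,
        }
    }

    /// MTU used when the connection starts.
    fn initial_value(&self) -> u16 {
        self.initial_max_udp_payload_size
    }

    /// Value the MTU would be reset to when a black hole is detected.
    fn black_hole_reset_value(&self) -> u16 {
        self.minimum_max_udp_payload_size
    }
}
```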
Avoiding unrecoverable black holes definitely sounds preferable to having them. I'm still not completely sure what the shortcoming is here. Why does resetting the MTU to 1200 cause throughput to go to 0 without discovery increasing it back to reasonable levels?
Because our application does not pass the SendDatagramError::TooLarge error back to the TUN interface, the operating system's PMTUD cannot lower the TUN MTU by itself. We may add that later, but it is not supported for now.
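If we do add it, the handling would look roughly like this (a minimal sketch: only the Quinn calls are real, TunHandle and set_mtu are placeholders for our TUN layer):

```rust
use bytes::Bytes;
use quinn::{Connection, SendDatagramError};

/// Placeholder for the TUN layer; `set_mtu` stands in for a platform-specific call.
struct TunHandle;
impl TunHandle {
    fn set_mtu(&self, _mtu: usize) { /* ioctl / netlink call omitted */ }
}

/// Forward one TUN packet as a single QUIC datagram; if Quinn says it no longer
/// fits (e.g. after a black-hole reset), propagate the new limit to the TUN MTU.
fn forward_packet(conn: &Connection, tun: &TunHandle, packet: Bytes) {
    match conn.send_datagram(packet) {
        Ok(()) => {}
        Err(SendDatagramError::TooLarge) => {
            if let Some(max) = conn.max_datagram_size() {
                tun.set_mtu(max);
            }
        }
        Err(err) => eprintln!("datagram send failed: {err}"),
    }
}
```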
So basically you want us to implement a workaround for missing functionality in your TUN layer? I'm not entirely sure this is reasonable.
Indeed, our use case is particular, but the question is still valid: if the user sets initial_max_udp_payload_size explicitly, is it reasonable for the black hole detector to reset the MTU to a value lower than that?
Given Quinn's large user base, there is a chance other users will want an escape hatch now that black hole detection has been introduced (e.g. similar to how rustls enforces certificate validation by default, yet you can go out of your way to disable that using the dangerous_configuration feature, knowing you might cause security problems by doing so).
In an analogous case, someone in an "I know what I'm doing" scenario might want to tell Quinn that there is a guaranteed MTU that can be relied upon, even when the black hole detector thinks otherwise (this used to exist in the form of initial_max_udp_payload_size but was effectively removed by my changes).
> without discovery increasing it back to reasonable levels?
It is because MTU discovery is disabled, so when the black hole is detected, the MTU is reset to 1200 and never changes again.
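For reference, keeping discovery enabled would look roughly like this (a sketch; exact method names may differ between Quinn versions):

```rust
use quinn::{MtuDiscoveryConfig, TransportConfig};

// With PLPMTUD left enabled, the black-hole reset to 1200 would only be
// temporary: discovery keeps probing and can raise the MTU back up later.
fn transport_with_mtu_discovery() -> TransportConfig {
    let mut transport = TransportConfig::default();
    let mut mtud = MtuDiscoveryConfig::default();
    // Example cap: the largest UDP payload the path is known to carry.
    mtud.upper_bound(1452);
    transport.mtu_discovery_config(Some(mtud));
    transport
}
```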
> In an analogous case, someone in an "I know what I'm doing" scenario might want to tell Quinn that there is a guaranteed MTU that can be relied upon, even when the black hole detector thinks otherwise (this used to exist in the form of initial_max_udp_payload_size but was effectively removed by my changes).
Okay, I guess that's a more reasonable use case.
> Indeed, our use case is particular, but the question is still valid: if the user sets initial_max_udp_payload_size explicitly, is it reasonable for the black hole detector to reset the MTU to a value lower than that?
I mean, that's how we defined initial_max_udp_payload_size right now -- as an initial starting point, not as a minimum. Falling back to 1200 is the safest option.
> Why does resetting the MTU to 1200 cause throughput to go to 0 without discovery increasing it back to reasonable levels?
Still thinking about this... Maybe I'm being dense but I think the main effect of Quinn having a smaller defined MTU for the connection is that it submits smaller packets, right, or is there something else that it does? And if this is the only effect, why does bandwidth go to 0? Submitting 1200-sized packets over a 1412-MTU tunnel should be fine?
> Still thinking about this... Maybe I'm being dense but I think the main effect of Quinn having a smaller defined MTU for the connection is that it submits smaller packets, right, or is there something else that it does? And if this is the only effect, why does bandwidth go to 0? Submitting 1200-sized packets over a 1412-MTU tunnel should be fine?
I was also confused by this at first, because I've not worked with TUN interfaces in the past. This is my current understanding: the TUN's MTU is the MTU supported by the OS networking stack, whereas Quinn is used as the underlying layer, which will transport the packets passed to the TUN interface. In that case, the TUN interface will pass 1412-sized packets to Quinn, and they will never get through.
@stormshield-damiend Is the diagram below accurate? (I guess the arrows should go both ways, but the point is clear)
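Roughly, a simplified version of the layering in question:

```text
OS networking stack
      |  IP packets up to the TUN MTU (1412 bytes)
      v
TUN interface
      |  each packet handed to the application
      v
Quinn connection   (datagram size limited by current_mtu, 1200 after a black-hole reset)
      |  UDP packets
      v
physical network
```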
So you're implementing something where a TUN packet must result in exactly one UDP packet generated by Quinn? That is also not really something Quinn is designed for...
One TUN packet is sent as one QUIC datagram.
Okay, I guess it might make sense to add a TransportConfig::min_udp_payload_size.
I concur. Setting a hard lower bound for the search is gracefully symmetrical to the upper bound we already expose. Maybe place it alongside that in MtuDiscoveryConfig?
Fixed by #1529
We (Stormshield) found an issue with the black hole detection. First, some context:

The bug was found during an iperf3 bench between two peers. After an unknown amount of time, the upload traffic drops to 0 Mbit/s and never goes up again. After some digging, we found this code, which resets the MTU to 1200 when a black hole is detected, whether PLPMTUD is enabled or not.

Using #1529 fixes the issue because no black hole is detected during the iperf3 bench. We think an issue remains because, if a black hole is detected in another situation, the MTU is still reset. See line 148 of mtud.rs in #1529: self.current_mtu = BASE_UDP_PAYLOAD_SIZE;

It would be nice to control the reset value. If you need any additional information, feel free to ask.
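For illustration, controlling the reset value could look roughly like this at that site (a standalone sketch with a hypothetical configured floor, not the actual mtud.rs code):

```rust
/// QUIC's minimum datagram size, matching the constant referenced above.
const BASE_UDP_PAYLOAD_SIZE: u16 = 1200;

/// Value to reset `current_mtu` to when a black hole is detected, honoring a
/// user-configured floor (hypothetical knob) while never going below 1200.
fn black_hole_reset_mtu(configured_floor: Option<u16>) -> u16 {
    configured_floor
        .unwrap_or(BASE_UDP_PAYLOAD_SIZE)
        .max(BASE_UDP_PAYLOAD_SIZE)
}
```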