I am tracking ACKs for MTU probes manually (e.g. without using PackageBuilder::finalize_and_track), because otherwise things like automatic retransmission kick in.
Per RFC 9000 § Retransmission of Information, QUIC retransmission operates at frame granularity, not packet granularity. In particular note:
PING and PADDING frames contain no information, so lost PING or PADDING frames do not require repair.
So, there should be no need to bypass the normal loss detection machinery. However, note that there is a subtle interaction with congestion control:
Loss of a QUIC packet that is carried in a PMTU probe is therefore not a reliable indication of congestion and SHOULD NOT trigger a congestion control reaction; see Item 7 in Section 3 of [DPLPMTUD]. However, PMTU probes consume congestion window, which could delay subsequent transmission by an application.
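To illustrate (hypothetical names, not Quinn's actual loss-detection API): the normal machinery can still declare the probe lost, and only the congestion reaction is skipped when the lost packet is the in-flight MTU probe.

```rust
/// Hypothetical bookkeeping: the packet number of the MTU probe currently in
/// flight, if any. Field and function names are illustrative only.
struct MtuProbeState {
    in_flight_probe: Option<u64>,
}

impl MtuProbeState {
    /// Called from the regular loss-detection path for every lost packet.
    fn on_packet_lost(&mut self, packet_number: u64, congestion_event: &mut dyn FnMut()) {
        if self.in_flight_probe == Some(packet_number) {
            // RFC 9000 §14.4: loss of a PMTU probe is not a reliable signal of
            // congestion, so only MTU discovery reacts (e.g. by lowering the
            // search's upper bound) and the congestion event is skipped.
            self.in_flight_probe = None;
        } else {
            congestion_event();
        }
    }
}
```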
@Ralith thanks for the feedback! I think I finally figured out how to use finalize_and_track :). The current status of the PR is that it replicates the original changes by @djc, plus black hole detection and tests (check out, for instance, blackhole_after_mtu_change_repairs_itself). Any suggestions so far?
Next week I plan to hunt down the test failure (unfortunately, the test runner doesn't give info about which test actually failed, other than thread panicked while panicking. aborting.)
unfortunately, the test runner doesn't give info about which test actually failed
Hmm, we should probably set RUST_BACKTRACE=1 in the test environment.
Thanks for the suggestions! Next week I'll have time to go through them ;)
By the way, we'll probably want to tweak a few things (please let me know if you think differently):
Also interesting: how do you think this change should affect versioning? Once this is merged, endpoints will be sending a few more packets than usual, which might be unexpected for the crate's users. Related to this, do you think it is worthwhile to provide a way to disable PLPMTUD?
Also interesting: how do you think this change should affect versioning? Once this is merged, endpoints will be sending a few more packets than usual, which might be unexpected for the crate's users. Related to this, do you think it is worthwhile to provide a way to disable PLPMTUD?
I don't think this is reason to do a semver-incompatible version bump, if that's what you are referring to, as long as we don't remove any API that people might currently be referring to. That said, I think we'll want to release 0.10 in the not-too-distant future (see #1515), so it might make sense to release this at the same time.
do you think it is worthwhile to provide a way to disable PLPMTUD
I think it's fine to silently enable it by default, but I can imagine use cases where it might be unnecessary (known networks) or harmful (endpoints which want to sleep for very long periods). A good approach might be to make the interval configurable through an Option<Duration> or similar.
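A minimal sketch of that knob, assuming it lives in a dedicated config struct (names and the default value are illustrative, not the final API):

```rust
use std::time::Duration;

/// Hypothetical configuration for path MTU discovery.
pub struct MtuDiscoverySettings {
    /// How often to re-run discovery on an established path.
    /// `None` disables periodic probing.
    pub interval: Option<Duration>,
}

impl Default for MtuDiscoverySettings {
    fn default() -> Self {
        // Placeholder default; the appropriate value is part of the discussion above.
        Self { interval: Some(Duration::from_secs(600)) }
    }
}
```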
Today I addressed @Ralith's feedback, found the test that was crashing cargo test, and implemented binary search.
Here are some interesting bits and a few questions:
- The binary search uses TransportConfig::initial_max_udp_payload_size as the initial lower bound. The initial upper bound is the minimum of the local EndpointConfig::max_udp_payload_size and the peer's max_udp_payload_size. I have a question regarding this: technically, the local EndpointConfig::max_udp_payload_size is about which sizes we are willing to receive, and not about the sizes we are willing to send, so I'm not totally sure it makes sense to reuse it here (i.e. maybe we should have a separate way of configuring the upper bound for our binary search). Do you have an opinion on this? (A sketch of these bounds follows below.)
- tests::cid_retirement is failing after my changes. I don't see how that is related to PLPMTUD, so if you have a hunch I'm all ears. Tomorrow I plan to get a diff between the traces with the feature enabled / disabled, to investigate further. Hopefully that will get me closer to finding the cause of the failure.
- The probing interval is configurable: None for cases where MTUD should run only once and Some(_) for cases in which a regular interval is desired.
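Roughly, the bounds and search steps described in the first bullet look like this (illustrative names only; the real MtuDiscovery keeps more state, e.g. for probe retries and black hole detection):

```rust
/// Illustrative-only sketch of the binary search over UDP payload sizes.
struct MtuSearch {
    /// Largest payload size known to be deliverable (starts at the configured
    /// initial_max_udp_payload_size).
    lower: u16,
    /// Smallest size assumed or known to be undeliverable.
    upper: u16,
}

impl MtuSearch {
    fn new(initial: u16, local_max: u16, peer_max: u16) -> Self {
        // Upper bound: the minimum of our own and the peer's max_udp_payload_size.
        Self { lower: initial, upper: local_max.min(peer_max) }
    }

    /// Size of the next probe to send, or `None` once the search has converged.
    fn next_probe_size(&self) -> Option<u16> {
        let candidate = self.lower + (self.upper - self.lower) / 2;
        (candidate > self.lower).then_some(candidate)
    }

    fn on_probe_acked(&mut self, size: u16) {
        self.lower = size; // a probe of this size got through
    }

    fn on_probe_lost(&mut self, size: u16) {
        self.upper = size; // treat this size as unreachable and search below it
    }
}
```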
None for cases where MTUD should run only once
Why once rather than never?
None for cases where MTUD should run only once
Why once rather than never?
No strong preference here, I just thought you meant once. I'll go with never when the interval is None, then ;)
Just pushed an updated (and rebased) version of the PR, with updated docs, passing tests and all the features we discussed above.
There are still a few things I want to look into (tomorrow), before marking this PR as "ready for review" (instead of it being a draft):
- I'm not entirely happy with the shape MtuDiscovery has taken. I'll see if I can refactor it a bit to make it more readable.
- Making sure code that calls mtud.current_mtu() doesn't mind that the returned value of the function is allowed (and expected!) to change.

Also, there is this comment in the old PR, mentioning that we should only enable PLPMTUD on platforms where we are setting the DF IP bit on. Is that still relevant @Ralith?
Finally, I added a commit to make Pair::drive more robust (otherwise it would consider a connection idle even though there were packets waiting to be sent, as I discovered after thoroughly debugging the cid_retirement test).
technically, the local EndpointConfig::max_udp_payload_size is about which sizes we are willing to receive, and not about the sizes we are willing to send, so I'm not totally sure it makes sense to reuse it here (i.e. maybe we should have a separate way of configuring the upper bound for our binary search). Do you have an opinion on this?
Do we strictly need an upper bound? We could do open-ended "binary search" by growing the probe size exponentially until it starts failing. That'd be cool for Just Working on localhost, LANs with arbitrarily large jumbo frames, and so forth. Though, we probably should take care not to overly sacrifice rate of convergence in a typical environment for these niche cases. I'm not strongly opposed to a dedicated knob, but it's nice to have universal logic when possible. We might want to tune it carefully to test typical internet limits first regardless.
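For illustration, the open-ended variant could grow probes geometrically from the last acknowledged size until one fails, then binary-search between the last success and the first failure; a rough sketch (my own, with an arbitrary 3/2 growth factor):

```rust
/// Next probe size for an open-ended (unbounded) search: grow by ~50% per
/// step, capped at the largest UDP payload a QUIC packet may carry.
fn next_unbounded_probe(last_acked: u16) -> u16 {
    const MAX_QUIC_UDP_PAYLOAD: u32 = 65_527;
    let grown = (last_acked as u32) * 3 / 2;
    grown.min(MAX_QUIC_UDP_PAYLOAD) as u16
}
```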
As mentioned on Matrix, the docs of initial_max_udp_payload_size mention that "effective max UDP payload size (...) will never fall below [the provided] value". I think we actually should fall below the provided value in the case of blackholes (that is what the current implementation does). Or am I missing something?
Capturing our discussion in chat: I'm happy to turn this into an initial guess rather than a hard floor.
Also, there is https://github.com/quinn-rs/quinn/pull/804#issuecomment-1172950107 in the old PR, mentioning that we should only enable PLPMTUD on platforms where we are setting the DF IP bit on. Is that still relevant @Ralith?
Yes; I believe we currently only set that bit on Linux and Windows.
Do we strictly need an upper bound? We could do open-ended "binary search" by growing the probe size exponentially until it starts failing. (...). We might want to tune it carefully to test typical internet limits first regardless.
The upper bound is not strictly needed, but since the peer and the local endpoint have a configured max_udp_payload_size transport parameter, I think we can take that as a hint. To give you an idea of the performance difference on a typical internet path (MTU = 1500), here is the number of probes sent before converging, depending on the upper bound used:
Note that each failed attempt at a particular MTU results in 3 probes being sent. After the third lost probe, we conclude that the probed MTU is not supported by the network path.
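As a back-of-the-envelope model (my own sketch, not code from the PR), assuming a supported size is acknowledged by its first probe and an unsupported size costs the full three probes:

```rust
/// Estimate how many probes a binary search sends before converging, for a
/// path whose true maximum payload size is `path_limit`.
fn estimated_probe_count(mut lower: u16, mut upper: u16, path_limit: u16) -> u32 {
    let mut probes = 0;
    while upper - lower > 1 {
        let candidate = lower + (upper - lower) / 2;
        if candidate <= path_limit {
            probes += 1; // acknowledged on the first attempt
            lower = candidate;
        } else {
            probes += 3; // three probes are lost before the size is ruled out
            upper = candidate;
        }
    }
    probes
}
```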
I took the time to experiment using exponential search to find the upper bound before doing binary search. Here are the results:
This leads me to think we should keep the current approach, and require users to configure max_udp_payload_size in conjunction with initial_max_udp_payload_size if they are aiming at >1500 MTUs.
Nice analysis! I concur that prioritizing convergence for realistic cases should take precedence.
the peer and the local endpoint have a configured max_udp_payload_size transport parameter, I think we can take that as a hint
Hmm, the RFC says:
It is expected that this is the space an endpoint dedicates to holding incoming packets.
and specifies the default value as 65527, so this may not be a good hint in practice, and our current default of 1480 might be inappropriate. For the sake of fast convergence I'd be happy to have a dedicated transport config field for MTUD upper bound, and default it to the appropriate value for a typical internet path. Future work could perhaps look at occasional exponential probes above this if the upper bound is reached.
Thanks for the feedback. I ended up introducing a PlpmtudConfig struct that contains the necessary config for PLPMTUD.
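To illustrate the kind of knob this enables (field name and default are illustrative, not necessarily what PlpmtudConfig contains): a dedicated upper bound keeps the search independent of EndpointConfig::max_udp_payload_size, with a default suited to a typical internet path.

```rust
/// Illustrative-only sketch of a dedicated MTUD upper bound.
pub struct MtudUpperBound {
    /// Largest UDP payload size the binary search will probe.
    pub upper_bound: u16,
}

impl Default for MtudUpperBound {
    fn default() -> Self {
        // 1500-byte link MTU minus 40 bytes of IPv6 header and 8 bytes of UDP
        // header; paths that support more can be opted into explicitly.
        Self { upper_bound: 1452 }
    }
}
```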
Here's another update and some additional questions:
- Regarding the IP DF bit, I had a look at quinn-udp and I can't fully conclude from the code which platforms have the DF bit set. I'll assume that @Ralith's comment above is correct and make sure PLPMTUD is only allowed on Linux and Windows (it surprises me, though, that we are unable to do it in other mature operating systems).
- Regarding black hole detection, I was unable to ask about it on the QUIC WG Slack because it is invite-only (I sent an email to the mailing list to request an invitation, but haven't heard anything yet).
When did you request an invite?
- Regarding the IP DF bit, I had a look at quinn-udp and I can't fully conclude from the code which platforms have the DF bit set. I'll assume that @Ralith's comment above is correct and make sure PLPMTUD is only allowed on Linux and Windows (it surprises me, though, that we are unable to do it in other mature operating systems).
Looks like it's IP_DONTFRAG/IPV6_DONTFRAG for Windows and IP_PMTUDISC_PROBE for Linux. https://stackoverflow.com/questions/4415725/ip-dont-fragment-bit-on-mac-os has some suggestions for macOS (that I suppose we don't implement yet).
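For context, here is roughly what the Linux side boils down to at the socket level (a sketch assuming the libc crate, not quinn-udp's actual code):

```rust
/// On Linux, IP_MTU_DISCOVER with IP_PMTUDISC_PROBE sets the DF bit and
/// disables kernel fragmentation, which is what PLPMTUD relies on.
#[cfg(target_os = "linux")]
fn set_dont_fragment_v4(socket: &std::net::UdpSocket) -> std::io::Result<()> {
    use std::os::unix::io::AsRawFd;

    let probe: libc::c_int = libc::IP_PMTUDISC_PROBE;
    let rc = unsafe {
        libc::setsockopt(
            socket.as_raw_fd(),
            libc::IPPROTO_IP,
            libc::IP_MTU_DISCOVER,
            &probe as *const libc::c_int as *const libc::c_void,
            std::mem::size_of_val(&probe) as libc::socklen_t,
        )
    };
    if rc == 0 {
        Ok(())
    } else {
        Err(std::io::Error::last_os_error())
    }
}
```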
- Finally, are there any other things I should look at in order to bring this PR into shape, so it can land?
If you think it's reasonably close, mark it as ready and I'll do a full review pass.
When did you request an invite?
Yesterday, 10:00 CET (so quite recently). I am not sure my email went through, though, as I didn't receive a confirmation or anything of the sort (I wrote to quic-chairs@ietf.org, as recommended on https://quicwg.org)
If you think it's reasonably close, mark it as ready and I'll do a full review pass.
Thanks! I'll probably do that next week (I want to look into the DF bit stuff first, and maybe refactor the MtuDiscovery struct a bit)
I'll try to give the code here a closer look this weekend as well.
I'm inclined to follow s2n-quic's approach here
Seems reasonable. We can run some tests to see if this tends to collapse under congestion in practice.
Regarding the interaction between PLPMTUD and congestion control... The RFCs I am following seem to contradict each other
This is odd. I'm not sure it's actually a contradiction, because AFAICT we only ever care about the initial window when we first initialize a path, before MTUD has had any opportunity to run on it. However, that begs the question: what is the purpose/meaning of recalculating the initial window during the connection? Maybe another good question for the slack.
Yesterday, 10:00 CET (so quite recently).
Lars responded to me in <24h (there was no automated email); if nobody's responded by tomorrow we can bug people in the chat directly.
@Ralith thanks for the review. I just finished addressing all your comments, and left a few questions / remarks here and there.
While looking at the pacer and its interaction with the congestion controller, I discovered that each congestion controller has a max_datagram_size field, so I assume that will need to be updated based on the detected MTU. I'll look into that tomorrow.
So today I've been looking into updating the congestion controller's max_datagram_size when we have discovered a new MTU. Unfortunately, the current API design seems to go against it. We currently offer a direct way to set max_datagram_size, independently of the path's MTU, and it would be pretty unexpected if we now linked the two together. IMO, if we want the congestion controllers to use the path's MTU as their max_datagram_size, we would need to rethink the API. Basically, we would need to remove the max_datagram_size function from the controller's config struct (which is a breaking change) and start using the current MTU instead. This might also mean that we need to automatically derive the values of other config fields, like initial_window and minimum_window, because they depend on the value of max_datagram_size. Overall it feels like a pretty invasive change, and since we already allow max_datagram_size and the current MTU to diverge, I'd rather postpone it.
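For reference, the dependency mentioned above comes from RFC 9002 §7.2, which defines both windows in terms of max_datagram_size (sketch of the RFC's recommended values, not Quinn's code):

```rust
/// RFC 9002 §7.2: recommended initial congestion window.
fn initial_window(max_datagram_size: u64) -> u64 {
    (10 * max_datagram_size).min((2 * max_datagram_size).max(14_720))
}

/// RFC 9002 §7.2: recommended minimum congestion window.
fn minimum_window(max_datagram_size: u64) -> u64 {
    2 * max_datagram_size
}
```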
I also looked into enabling PLPMTUD only on platforms that set the IP DF bit. The technical side is straightforward, but the API design side less so. Assuming we want to keep quinn-proto free of platform-specific code, we can either:

1. Have TransportConfig::default enable PLPMTUD by default. Direct users of quinn-proto would need to make sure they are properly setting the DF bit on their IO implementation, or otherwise disable PLPMTUD through TransportConfig::plpmtud_config(None). On our side, we would make quinn::Endpoint ignore the PLPMTUD configuration on unsupported platforms. Unfortunately, this requires a backwards-incompatible change (as far as I can see), because quinn::Endpoint has read-only access to the TransportConfig (through Arc<TransportConfig>) and we would need to mutate it.
2. Have TransportConfig::default disable PLPMTUD by default. The function to enable it should document the fact that PLPMTUD is platform-specific and that you should enable it at your own risk. In a later PR we could investigate the possibility of making quinn::Endpoint raise an error if PLPMTUD is enabled on unsupported platforms (it would require additions to quinn-proto's public API, so I'd rather postpone this part).
3. Introduce a PlatformConfig type in quinn-proto that indicates platform support for particular features, such as setting the DF bit on outgoing UDP packets. We could have a PlatformConfig::default_transport_config(&self) -> TransportConfig method that generates a suitable default TransportConfig, with PLPMTUD enabled if setting the DF bit is supported. The PlatformConfig instance would be created by users of the quinn-proto library (a rough sketch follows below).

All in all, I think option 2 is the most viable, so the current implementation disables PLPMTUD by default.
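For concreteness, a self-contained sketch of what option 3 could look like (the types below are stand-ins for illustration, not quinn-proto's actual API):

```rust
/// Stand-ins for the real quinn-proto types, just to make the sketch compile.
#[derive(Default)]
pub struct PlpmtudConfig;

#[derive(Default)]
pub struct TransportConfig {
    plpmtud: Option<PlpmtudConfig>,
}

/// Describes what the platform / IO backend supports.
pub struct PlatformConfig {
    /// Whether outgoing UDP packets are sent with the DF bit set.
    pub sets_df_bit: bool,
}

impl PlatformConfig {
    /// Generates a default TransportConfig, enabling PLPMTUD only when the
    /// platform is known to set the DF bit.
    pub fn default_transport_config(&self) -> TransportConfig {
        let mut config = TransportConfig::default(); // PLPMTUD off by default
        if self.sets_df_bit {
            config.plpmtud = Some(PlpmtudConfig::default());
        }
        config
    }
}
```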
With all this said, @djc I think the PR is ready for a thorough review, so I have marked it as ready.
Basically, we would need to remove the max_datagram_size function from the controller's config struct (which is a breaking change), and start using the current MTU instead.
That sounds correct to me. This breakage is fine; we already have breakage pending/planned, and most users probably aren't using this interface. That said, happy to leave this for future work.
All in all I think option 2 is the most viable, so the current implementation disables PLPMTUD by default.
Since quinn-proto is independent from a specific I/O backend, this makes sense to me. A similar issue exists for segmentation offload configuration; future iterations on this should probably address both. I like the idea of a PlatformConfig struct to make this easier for users to get right, but it's fine to leave that for future work too. In the meantime, we should at least update the main examples and performance test/benchmarking tools to enable MTU discovery on Linux/Windows.
Just rebased on top of latest main and pushed a commit modifying the benchmarks, perf, and examples. I think there isn't anything left to be done on my side. @djc could you have a look when you have the time?
The next things I plan to pick up in follow-up PRs:
- Increase the default max_udp_payload_size used when creating an Endpoint (from 1480 to 65527), as suggested in RFC 9000 and by @Ralith in this comment.
- Platform-aware defaults for TransportConfig, maybe in the form of a PlatformConfig struct as described in this comment. Start using it in our examples and benchmarks.

@djc I just pushed an amended (and rebased) version of the commits that address all your suggestions
@djc just pushed changes addressing your comments
This PR resurrects @djc's old https://github.com/quinn-rs/quinn/pull/804. As of yet, it is just a draft. IMO it makes sense to look at the commits separately (we might want to squash them in the end, though).
Below are some bits of feedback from the original PR that I'm planning to include before considering this PR as ready for review (feel free to suggest more):
- MtuDiscovery::max_udp_payload_size, to better convey that only MtuDiscovery itself is meant to modify it

Looking at the original PR, there is one question that comes up: I am tracking ACKs for MTU probes manually (e.g. without using PackageBuilder::finalize_and_track), because otherwise things like automatic retransmission kick in. That seems different from what @djc did in the original PR (he seems to get ACK and packet loss detection automatically, without additional code). Am I working in the right direction, or did I miss a function I could use to get ACKs + lost packets for free like in the original PR? (Update: I messed up and finally figured out how to get things working :))

Closes #69