quinn-rs / quinn

Async-friendly QUIC implementation in Rust
Apache License 2.0
3.85k stars, 394 forks

Limit UDP packet size to the MTU (fragmentation) #1572

Closed: rom1v closed this issue 1 year ago

rom1v commented 1 year ago

Hi,

First of all, thank you for your library; it works pretty well. However, I have run into a problem with UDP packet length.

I send many QUIC datagrams using Connection.send_datagram(), each no larger than Connection.max_datagram_size() (1414 bytes in practice).

If I randomly drop UDP packets:

iptables -A INPUT -p udp -s 192.168.x.x -m statistic --mode random --probability 0.02 -j DROP

Then I observe that I lose several consecutive datagrams at once, which suggests that they are packed in the same UDP packet.

This is confirmed when I capture packets in Wireshark: some UDP packets are way larger than the MTU (up to 64Kb), so they presumably contain several QUIC datagrams.

Some UDP packets are small (only 1 QUIC datagram) and are correctly decoded by Wireshark:

[Screenshot: QUIC_packet_small_0]

But some are bigger and cannot be decoded correctly (I don't know why): [Screenshot: QUIC_packet_big_0]

I send video packets with forward error correction (FEC), so I absolutely want to avoid packing several QUIC datagrams into the same UDP packet (that defeats the purpose of FEC). How can I guarantee that?

At the very least, I would like to limit the UDP packets produced by quinn to 1200 to 1500 bytes. So I upgraded to quinn 0.10.1, which introduces new MTU-related features (ddc7ee16b5ced2cb1b18de40515d021d8bbd981b and subsequent commits).

I disabled MTU discovery: transport_config.mtu_discovery_config(None).

The documentation says:

Defaults to None, which disables MTU discovery altogether.

But the default is actually Some(MtuDiscoveryConfig::default()), so I forced it to None.

Anyway, that does not fix the problem: quinn still generates UDP packets up to 64k.

Note that on connection initialization, I get this error message on the server side:

[2023-05-24T13:19:35Z ERROR quinn_udp::imp] got EIO, halting segmentation offload
[2023-05-24T13:19:35Z WARN  quinn_udp] sendmsg error: Os { code: 5, kind: Uncategorized, message: "Input/output error" }, Transmit: { destination: 192.168.1.74:54129, src_ip: Some(192.168.1.4), enc: Some(Ect0), len: 2162, segment_size: Some(1200) }
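
The log above shows the GSO transmit being rejected with EIO (errno 5 on Linux), after which quinn-udp disables segmentation offload. Below is a minimal sketch of that kind of fallback. It assumes a per-socket segment limit, where a limit of 1 means "no GSO". The `UdpState` type and its methods are hypothetical and are not quinn-udp's actual code. In this sketch the failed transmit is simply not retried, on the assumption that QUIC loss recovery will retransmit whatever it carried. The transmit in the warning (`len: 2162, segment_size: Some(1200)`) would have become two datagrams, of 1200 and 962 bytes.

```rust
use std::io;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical per-socket state: the maximum number of segments a single
/// send may carry. A value of 1 means segmentation offload is disabled.
pub struct UdpState {
    max_gso_segments: AtomicUsize,
}

impl UdpState {
    pub fn new(max_gso_segments: usize) -> Self {
        Self {
            max_gso_segments: AtomicUsize::new(max_gso_segments),
        }
    }

    pub fn max_gso_segments(&self) -> usize {
        self.max_gso_segments.load(Ordering::Relaxed)
    }

    /// Inspects the result of a send. On EIO (errno 5 on Linux), future
    /// transmits are no longer batched with GSO; the failed transmit itself
    /// is not retried here.
    pub fn on_send_result(&self, result: &io::Result<usize>) {
        if let Err(e) = result {
            if e.raw_os_error() == Some(5) && self.max_gso_segments() > 1 {
                eprintln!("got EIO, halting segmentation offload");
                self.max_gso_segments.store(1, Ordering::Relaxed);
            }
        }
    }
}

/// Number of datagrams a GSO buffer of `len` bytes turns into.
pub fn segment_count(len: usize, segment_size: usize) -> usize {
    assert!(segment_size > 0, "segment size must be non-zero");
    (len + segment_size - 1) / segment_size
}
```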

How can I avoid sending "big" UDP packets (which are probably fragmented on the network)?

Thank you for your help.

djc commented 1 year ago

Looks like we forgot to update the documentation -- sorry about that. If you could send a PR for that, that would be great. To confirm, you're working on Linux? What version of the kernel are you using?

It definitely sounds like a bug that UDP packets end up being larger than the MTU.

Just to be clear, I'm not sure QUIC guarantees that one UDP packet will be sent per QUIC datagram -- the QUIC stack will definitely try to pack multiple QUIC datagrams into a single UDP packet where it fits.
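
To illustrate why small datagrams can end up sharing a packet, here is a hypothetical greedy packer. It rests on two assumptions:

- Each application datagram becomes a DATAGRAM frame with an explicit length. This is frame type 0x31 from RFC 9221: a type byte, a variable-length integer length, then the payload.
- `budget` is the frame space left in one packet after its header and AEAD tag.

This is an illustration only, not quinn's packet builder.

```rust
/// Bytes needed to encode `v` as a QUIC variable-length integer (RFC 9000, Section 16).
pub fn varint_len(v: u64) -> usize {
    match v {
        0..=63 => 1,
        64..=16_383 => 2,
        16_384..=1_073_741_823 => 4,
        _ => 8,
    }
}

/// Size of a DATAGRAM frame with a length field: type byte + length varint + payload.
pub fn datagram_frame_len(payload_len: usize) -> usize {
    1 + varint_len(payload_len as u64) + payload_len
}

/// Greedily packs datagrams, in order, into packets whose frame space is at
/// most `budget` bytes, returning the datagram indices placed in each packet.
/// A frame larger than `budget` still gets a packet of its own.
pub fn pack_greedy(payload_lens: &[usize], budget: usize) -> Vec<Vec<usize>> {
    let mut packets: Vec<Vec<usize>> = Vec::new();
    let mut used = 0;
    for (i, &len) in payload_lens.iter().enumerate() {
        let frame = datagram_frame_len(len);
        // Start a new packet if there is none yet or this frame does not fit.
        if packets.is_empty() || used + frame > budget {
            packets.push(Vec::new());
            used = 0;
        }
        packets.last_mut().unwrap().push(i);
        used += frame;
    }
    packets
}
```

With datagrams close to `max_datagram_size()`, each frame nearly fills the budget on its own, so the packer naturally degenerates to one datagram per packet.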

rom1v commented 1 year ago

Thank you for your response.

Looks like we forgot to update the documentation -- sorry about that.

No problem :)

If you could send a PR for that, that would be great.

Sure, but what is the expected fix?

-Defaults to `None`, which disables MTU discovery altogether.
+Defaults to `MtuDiscoveryConfig::default()`.

?

Or actually set the default value to None?

To confirm, you're working on Linux?

Yes, Debian sid.

What version of the kernel are you using?

$ uname -srv
Linux 6.1.0-9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.27-1 (2023-05-08)

Just to be clear, I'm not sure QUIC guarantees that one UDP packet will be sent per QUIC datagram -- the QUIC stack will definitely try to pack multiple QUIC datagrams into a single UDP packet where it fits.

As far as I understand, the QUIC protocol allows packing several frames (like DATAGRAM frames) into a single UDP packet, but whether frames are packed is up to the sender implementation (a peer could decide not to always pack them, depending on some configuration). Indeed, though, quinn does not seem to expose such an option (and I don't know how it could be exposed anyway).

For my use case, since my QUIC datagrams often "fill" a UDP packet on their own, limiting each UDP packet to the MTU should probably be sufficient.

Ralith commented 1 year ago

some UDP packets are way larger than the MTU (up to 64Kb), so they presumably contain several QUIC datagrams.

The default configuration should not permit UDP payloads larger than 1452 bytes. In 0.9, the limit was even smaller.
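
One plausible derivation of the 1452-byte figure: a standard 1500-byte Ethernet MTU, minus a 40-byte IPv6 header and an 8-byte UDP header. Below is a minimal sketch of that arithmetic, assuming no IP options or IPv6 extension headers. Because the IPv4 header is smaller, the same limit also fits over IPv4.

```rust
/// Fixed header sizes in bytes (no IPv4 options, no IPv6 extension headers).
pub const IPV4_HEADER: usize = 20;
pub const IPV6_HEADER: usize = 40;
pub const UDP_HEADER: usize = 8;

/// Largest UDP payload that fits in one IP packet of `link_mtu` bytes
/// without fragmentation.
pub fn max_udp_payload(link_mtu: usize, ipv6: bool) -> usize {
    let ip_header = if ipv6 { IPV6_HEADER } else { IPV4_HEADER };
    link_mtu.saturating_sub(ip_header + UDP_HEADER)
}
```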

I suspect what you're seeing is missing support in both Wireshark and the iptables statistic module for generic segmentation offload, a recent kernel feature that Quinn (like most QUIC impls) uses to significantly improve UDP transmit performance. This is consistent with Wireshark's failure to decrypt the packet: it's attempting to treat it as a single UDP payload, but actually it is many separate payloads, which will be transmitted as separate packets on the wire.
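
With GSO, the application hands the kernel one large buffer plus a segment size, and the kernel (or NIC) cuts it into separate UDP datagrams. A capture taken before that split sees a single oversized payload. The sketch below models only the split itself; it is not kernel code. Every segment is `segment_size` bytes except possibly the last. For example, a 10000-byte buffer with a 1200-byte segment size yields 8 × 1200 + 400.

```rust
/// Splits one GSO send buffer into the individual UDP payloads that end up on
/// the wire: each chunk is `segment_size` bytes except possibly the last.
pub fn gso_segments(buf: &[u8], segment_size: usize) -> Vec<&[u8]> {
    assert!(segment_size > 0, "segment size must be non-zero");
    buf.chunks(segment_size).collect()
}
```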

You should be able to work around this missing support by running wireshark, and any randomized iptables rules, on the receiving host, which will see individual packets. Receive offload seems less likely to cause confusion.

Sure, but what is the expected fix?

The documentation should reflect the current, intended, behavior, which is that MTU discovery is enabled by default.

rom1v commented 1 year ago

@Ralith Thank you for your response.

You should be able to work around this missing support by running wireshark, and any randomized iptables rules, on the receiving host, which will see individual packets. Receive offload seems less likely to cause confusion.

In this case, both iptables and wireshark were run on the receiving host.

djc commented 1 year ago

Also, there was an error message saying "halting segmentation offload", so I don't think that's it?

Ralith commented 1 year ago

Ah, interesting. I suppose it's possible GRO could be the issue? That does get set persistently on the socket, and similar to GSO, serves to bypass a lot of kernel machinery that would normally handle packets individually.

What do you see if you observe the traffic from a third host?

djc commented 1 year ago

I feel like it's time to start peeking at the source/looking at some logging. @rom1v do you have a tracing-subscriber set up for your use case? quinn-proto has pretty extensive traces. If you want to do some println!() debugging or sift through the logic to figure out what's what, you'll want to look at Connection::poll_transmit(). Relevant logic about filling up a UDP frame seems to be at https://github.com/quinn-rs/quinn/blob/main/quinn-proto/src/connection/mod.rs#L552.

rom1v commented 1 year ago

Thank you for the pointers, I will try to get more traces to understand what's happening :+1:

Ralith commented 1 year ago

The existing trace-level logs from https://github.com/quinn-rs/quinn/blob/0c6b743f188b8f1d2c38689ecf6f748d393fbb52/quinn-proto/src/connection/mod.rs#L1983-L1989 and https://github.com/quinn-rs/quinn/blob/0c6b743f188b8f1d2c38689ecf6f748d393fbb52/quinn-proto/src/connection/mod.rs#L830 may be of particular interest, and might be easier to capture than a third-party view of the wire traffic.

rom1v commented 1 year ago

I configured a tracing_subscriber and ran an experiment with both peers running on localhost. I collected the quinn traces for both peers, as well as a Wireshark capture.

Here is an excerpt for two successive UDP packets (as seen by Wireshark) for all captures:

Sender

2023-05-25T12:34:38.634082Z TRACE send{space=Data pn=10}: quinn_proto::connection::packet_builder: PADDING * 11
2023-05-25T12:34:38.634158Z TRACE send{space=Data pn=11}: quinn_proto::connection::packet_builder: PADDING * 11
2023-05-25T12:34:38.634225Z TRACE send{space=Data pn=12}: quinn_proto::connection::packet_builder: PADDING * 11
2023-05-25T12:34:38.634289Z TRACE send{space=Data pn=13}: quinn_proto::connection::packet_builder: PADDING * 11
2023-05-25T12:34:38.634343Z TRACE send{space=Data pn=14}: quinn_proto::connection::packet_builder: PADDING * 11
2023-05-25T12:34:38.634409Z TRACE send{space=Data pn=15}: quinn_proto::connection::packet_builder: PADDING * 11
2023-05-25T12:34:38.634465Z TRACE send{space=Data pn=16}: quinn_proto::connection::packet_builder: PADDING * 11
2023-05-25T12:34:38.634522Z TRACE send{space=Data pn=17}: quinn_proto::connection::packet_builder: PADDING * 11
2023-05-25T12:34:38.634596Z TRACE quinn_proto::connection: sending 10789 bytes in 9 datagrams
2023-05-25T12:34:38.634689Z TRACE quinn_proto::connection: sending 1189 bytes in 1 datagrams

Receiver

2023-05-25T12:34:38.635551Z TRACE quinn_proto::connection: got Data packet (1200 bytes) from 127.0.0.1:1234 using id e0c68e2d4ca4a7f8
2023-05-25T12:34:38.635648Z TRACE recv{space=Data pn=10}: quinn_proto::connection: got datagram frame len=1160
2023-05-25T12:34:38.635709Z TRACE quinn_proto::connection: got Data packet (1200 bytes) from 127.0.0.1:1234 using id e0c68e2d4ca4a7f8
2023-05-25T12:34:38.635760Z TRACE recv{space=Data pn=11}: quinn_proto::connection: got datagram frame len=1160
2023-05-25T12:34:38.635799Z TRACE quinn_proto::connection: got Data packet (1200 bytes) from 127.0.0.1:1234 using id e0c68e2d4ca4a7f8
2023-05-25T12:34:38.635838Z TRACE recv{space=Data pn=12}: quinn_proto::connection: got datagram frame len=1160
2023-05-25T12:34:38.635876Z TRACE quinn_proto::connection: got Data packet (1200 bytes) from 127.0.0.1:1234 using id e0c68e2d4ca4a7f8
2023-05-25T12:34:38.635936Z TRACE recv{space=Data pn=13}: quinn_proto::connection: got datagram frame len=1160
2023-05-25T12:34:38.635968Z TRACE quinn_proto::connection: got Data packet (1200 bytes) from 127.0.0.1:1234 using id e0c68e2d4ca4a7f8
2023-05-25T12:34:38.635993Z TRACE recv{space=Data pn=14}: quinn_proto::connection: got datagram frame len=1160
2023-05-25T12:34:38.636018Z TRACE quinn_proto::connection: got Data packet (1200 bytes) from 127.0.0.1:1234 using id e0c68e2d4ca4a7f8
2023-05-25T12:34:38.636042Z TRACE recv{space=Data pn=15}: quinn_proto::connection: got datagram frame len=1160
2023-05-25T12:34:38.636065Z TRACE quinn_proto::connection: got Data packet (1200 bytes) from 127.0.0.1:1234 using id e0c68e2d4ca4a7f8
2023-05-25T12:34:38.636088Z TRACE recv{space=Data pn=16}: quinn_proto::connection: got datagram frame len=1160
2023-05-25T12:34:38.636111Z TRACE quinn_proto::connection: got Data packet (1200 bytes) from 127.0.0.1:1234 using id e0c68e2d4ca4a7f8
2023-05-25T12:34:38.636134Z TRACE recv{space=Data pn=17}: quinn_proto::connection: got datagram frame len=1160
2023-05-25T12:34:38.636158Z TRACE quinn_proto::connection: got Data packet (1189 bytes) from 127.0.0.1:1234 using id e0c68e2d4ca4a7f8
2023-05-25T12:34:38.636181Z TRACE recv{space=Data pn=18}: quinn_proto::connection: got datagram frame len=1160
2023-05-25T12:34:38.636204Z TRACE quinn_proto::connection: got Data packet (1189 bytes) from 127.0.0.1:1234 using id e0c68e2d4ca4a7f8
2023-05-25T12:34:38.636229Z TRACE recv{space=Data pn=19}: quinn_proto::connection: got datagram frame len=1160
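
The sizes in these logs add up exactly under a few assumptions:

- an 8-byte destination connection ID (e0c68e2d4ca4a7f8);
- a 1-byte packet number;
- a 16-byte AEAD tag;
- a DATAGRAM frame with a length field (1 type byte, a 2-byte varint length, and 1160 bytes of payload).

The sketch below checks that 1200 = 1 + 8 + 1 + 1163 + 11 + 16, and that dropping the 11 PADDING bytes gives 1189. The 9-datagram transmit is therefore 8 × 1200 + 1189 = 10789 bytes. Presumably the first eight packets are padded because GSO requires every segment except the last to be exactly the segment size.

```rust
/// Bytes needed to encode `v` as a QUIC variable-length integer (RFC 9000, Section 16).
pub fn varint_len(v: u64) -> usize {
    match v {
        0..=63 => 1,
        64..=16_383 => 2,
        16_384..=1_073_741_823 => 4,
        _ => 8,
    }
}

/// Assumed fixed sizes for the short-header packets in these logs.
pub const SHORT_HEADER_FIRST_BYTE: usize = 1; // flags byte of a short header
pub const AEAD_TAG_LEN: usize = 16; // tag size of the standard QUIC AEADs

/// On-the-wire size of a short-header packet holding one DATAGRAM frame
/// (type 0x31, with length) followed by `padding` PADDING bytes.
pub fn short_packet_len(dcid_len: usize, pn_len: usize, datagram_len: usize, padding: usize) -> usize {
    let frame_len = 1 + varint_len(datagram_len as u64) + datagram_len;
    SHORT_HEADER_FIRST_BYTE + dcid_len + pn_len + frame_len + padding + AEAD_TAG_LEN
}
```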

Wireshark

frame 49

Frame 49: 10831 bytes on wire (86648 bits), 10831 bytes captured (86648 bits) on interface lo, id 0
Ethernet II, Src: 00:00:00_00:00:00 (00:00:00:00:00:00), Dst: 00:00:00_00:00:00 (00:00:00:00:00:00)
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x02 (DSCP: CS0, ECN: ECT(0))
    Total Length: 10817
    Identification: 0x0000 (0)
    010. .... = Flags: 0x2, Don't fragment
    ...0 0000 0000 0000 = Fragment Offset: 0
    Time to Live: 64
    Protocol: UDP (17)
    Header Checksum: 0x12a8 [validation disabled]
    [Header checksum status: Unverified]
    Source Address: 127.0.0.1
    Destination Address: 127.0.0.1
User Datagram Protocol, Src Port: 1234, Dst Port: 48521
    Source Port: 1234
    Destination Port: 48521
    Length: 10797
    Checksum: 0x2841 [unverified]
    [Checksum Status: Unverified]
    [Stream index: 0]
    [Timestamps]
    UDP payload (10789 bytes)
QUIC IETF
    QUIC Connection information
    [Packet Length: 10789]
    QUIC Short Header DCID=e0c68e2d4ca4a7f8
    [Expert Info (Warning/Decryption): Failed to create decryption context: Decryption (checktag) failed: Checksum error]
    Remaining Payload: a4d701337db625dcd7309defa6196041494bd1060727d0217bf386ab3ca5ab409c59fa6e…

frame 50

Frame 50: 1231 bytes on wire (9848 bits), 1231 bytes captured (9848 bits) on interface lo, id 0
Ethernet II, Src: 00:00:00_00:00:00 (00:00:00:00:00:00), Dst: 00:00:00_00:00:00 (00:00:00:00:00:00)
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x02 (DSCP: CS0, ECN: ECT(0))
    Total Length: 1217
    Identification: 0x0000 (0)
    010. .... = Flags: 0x2, Don't fragment
    ...0 0000 0000 0000 = Fragment Offset: 0
    Time to Live: 64
    Protocol: UDP (17)
    Header Checksum: 0x3828 [validation disabled]
    [Header checksum status: Unverified]
    Source Address: 127.0.0.1
    Destination Address: 127.0.0.1
User Datagram Protocol, Src Port: 1234, Dst Port: 48521
    Source Port: 1234
    Destination Port: 48521
    Length: 1197
    Checksum: 0x02c1 [unverified]
    [Checksum Status: Unverified]
    [Stream index: 0]
    [Timestamps]
    UDP payload (1189 bytes)
QUIC IETF
    QUIC Connection information
    [Packet Length: 1189]
    QUIC Short Header DCID=e0c68e2d4ca4a7f8 PKN=19
    DATAGRAM
        Frame Type: DATAGRAM (0x0000000000000031)
        Datagram Length: 1160
        Datagram: eb0455f0a06e60e000000001000000000000001eac000468010001080000000c5667947e…
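
The lengths in both frames follow directly from the header stack:

- UDP Length = payload + 8;
- IPv4 Total Length = UDP Length + 20;
- captured frame = Total Length + 14, for the (dummy) Ethernet II header on `lo`.

A small sketch of that arithmetic follows, assuming no IP options. It shows that frame 49 is exactly one 10789-byte payload (the whole GSO buffer) under a single set of headers, with Don't Fragment set and Fragment Offset 0. In other words, it is not an IP-fragmented packet.

```rust
pub const ETHERNET_HEADER: usize = 14; // Ethernet II header as captured on lo
pub const IPV4_HEADER: usize = 20; // no IPv4 options
pub const UDP_HEADER: usize = 8;

/// Length fields Wireshark reports for one IPv4/UDP payload.
#[derive(Debug, PartialEq)]
pub struct CapturedLengths {
    pub udp_length: usize, // UDP "Length" (header + payload)
    pub ip_total: usize,   // IPv4 "Total Length"
    pub frame: usize,      // bytes on wire / captured
}

pub fn captured_lengths(udp_payload: usize) -> CapturedLengths {
    let udp_length = UDP_HEADER + udp_payload;
    let ip_total = IPV4_HEADER + udp_length;
    CapturedLengths {
        udp_length,
        ip_total,
        frame: ETHERNET_HEADER + ip_total,
    }
}
```

The same arithmetic gives a UDP length of 1208 for a single 1200-byte QUIC packet, matching what Wireshark shows once tx-udp-segmentation is turned off.
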

djc commented 1 year ago

Are you still seeing the message about halting segmentation offload? If not, maybe try disabling segmentation offload?

rom1v commented 1 year ago

Are you still seeing the message about halting segmentation offload?

No.

If not, maybe try disabling segmentation offload?

If I turn off tx-udp-segmentation, then I correctly get separate UDP packets (QUIC length=1200, UDP length=1208) in Wireshark.

$ sudo ethtool -k lo | grep seg
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: on
    tx-tcp-mangleid-segmentation: on
    tx-tcp6-segmentation: on
generic-segmentation-offload: on
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: on
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
$ sudo ethtool -K lo tx-udp-segmentation off
$ sudo ethtool -k lo | grep seg
tcp-segmentation-offload: off
    tx-tcp-segmentation: off
    tx-tcp-ecn-segmentation: off
    tx-tcp-mangleid-segmentation: off
    tx-tcp6-segmentation: off
generic-segmentation-offload: on
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: on
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off

djc commented 1 year ago

It seems like a smoking gun that Wireshark reports that the UDP packet in frame 49 has Length: 10797. At least I would assume that the kernel is not allowed to just squash UDP packets together. At the same time, Quinn believes it sent 9 (UDP) datagrams in 10789 bytes. I presume this is using the quinn-udp socket implementations? I'd probably start by injecting some extra tracing there to see what's going on.

In particular, in https://github.com/quinn-rs/quinn/blob/main/quinn-udp/src/unix.rs#L158 I assume &[Transmit] has len() 9, but what is BATCH_SIZE? How many times does sendmmsg_with_fallback() get called?

rom1v commented 1 year ago

In particular, in https://github.com/quinn-rs/quinn/blob/main/quinn-udp/src/unix.rs#L158 I assume &[Transmit] has len() 9, but what is BATCH_SIZE? How many times does sendmmsg_with_fallback() get called?

2023-05-25T13:45:54.103252Z TRACE send{space=Data pn=9}: quinn_proto::connection::packet_builder: PADDING * 11
2023-05-25T13:45:54.103283Z TRACE send{space=Data pn=10}: quinn_proto::connection::packet_builder: PADDING * 11
2023-05-25T13:45:54.103304Z TRACE send{space=Data pn=11}: quinn_proto::connection::packet_builder: PADDING * 11
2023-05-25T13:45:54.103325Z TRACE send{space=Data pn=12}: quinn_proto::connection::packet_builder: PADDING * 11
2023-05-25T13:45:54.103346Z TRACE send{space=Data pn=13}: quinn_proto::connection::packet_builder: PADDING * 11
2023-05-25T13:45:54.103366Z TRACE send{space=Data pn=14}: quinn_proto::connection::packet_builder: PADDING * 11
2023-05-25T13:45:54.103388Z TRACE send{space=Data pn=15}: quinn_proto::connection::packet_builder: PADDING * 11
2023-05-25T13:45:54.103408Z TRACE send{space=Data pn=16}: quinn_proto::connection::packet_builder: PADDING * 11
2023-05-25T13:45:54.103432Z TRACE quinn_proto::connection: sending 10789 bytes in 9 datagrams
2023-05-25T13:45:54.103465Z TRACE quinn_proto::connection: sending 1189 bytes in 1 datagrams
==== transmits.len()=2, BATCH_SIZE=32
==== sendmmsg_with_fallback() num_transmits=2
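
Those numbers are consistent with the 9-datagram batch being handed down as a single GSO transmit. Below is a simplified model. Its field names are borrowed from the `Transmit` debug output in the original warning (`len`, `segment_size`), but it is otherwise hypothetical. A transmit with `segment_size` set turns into ceil(len / segment_size) datagrams on the wire. So two transmits carry ten datagrams and fit in one batch of 32.

```rust
/// Simplified stand-in for a transmit handed to the UDP layer: one buffer that
/// holds several datagrams when `segment_size` is set (GSO).
pub struct Transmit {
    pub len: usize,
    pub segment_size: Option<usize>,
}

impl Transmit {
    /// Number of UDP datagrams this transmit becomes on the wire.
    pub fn datagram_count(&self) -> usize {
        match self.segment_size {
            Some(seg) if seg > 0 => (self.len + seg - 1) / seg,
            _ => 1,
        }
    }
}

/// Number of sendmmsg-style calls needed to submit `n` transmits, `batch` at a time.
pub fn batched_calls(n: usize, batch: usize) -> usize {
    assert!(batch > 0, "batch size must be non-zero");
    (n + batch - 1) / batch
}
```
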

djc commented 1 year ago

Okay, can you check what happens between sending 10789 bytes in 9 datagrams and transmits.len()=2?

Ralith commented 1 year ago

Here is an excerpt for two successive UDP packets (as seen by Wireshark) for all captures:

This is consistent with Wireshark failing to handle GRO correctly. You could verify by adding a log to the Datagram case of Connection::handle_event (which runs at most once per UDP datagram), but I think it will tell you the same thing: Quinn is successfully receiving 9 separate UDP datagrams.

I'd recommend opening an issue upstream with Wireshark. It should be easy to demonstrate the issue with a small C program that exercises UDP GRO.

rom1v commented 1 year ago

I think it will tell you the same thing: Quinn is successfully receiving 9 separate UDP datagrams.

Indeed.

I have written a sample: udp-segmentation.

It shows that if I send 10000 bytes in a single sendmsg() call with segment_size=1200, then the receiver correctly receives 9 packets (8×1200+400), but wireshark only sees a single UDP packet.

And a priori, iptables has the same "problem" as Wireshark (I don't know if it is expected or not), since it drops whole UDP "big packets" (see the initial post).
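
Suppose a filter makes one drop decision per GSO/GRO super-packet, as the initial post suggests. Then a single 2% "drop" takes out a whole run of consecutive application datagrams. Below is a hypothetical helper that maps per-super-packet drop decisions to the indices of the lost datagrams, assuming the size of each batch is known.

```rust
/// `batch_sizes[i]` is how many application datagrams super-packet `i` carried;
/// `dropped[i]` is the filter's single decision for that super-packet.
/// Returns the indices of the application datagrams that were lost.
pub fn lost_datagrams(batch_sizes: &[usize], dropped: &[bool]) -> Vec<usize> {
    assert_eq!(batch_sizes.len(), dropped.len(), "one drop decision per batch");
    let mut lost = Vec::new();
    let mut next = 0;
    for (&size, &drop) in batch_sizes.iter().zip(dropped) {
        if drop {
            // The whole batch disappears together: a burst of `size` losses.
            lost.extend(next..next + size);
        }
        next += size;
    }
    lost
}
```

With per-datagram decisions instead (every batch size equal to 1), the same drop probability produces isolated losses, which is what FEC is designed to repair.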

Ralith commented 1 year ago

I don't know if it is expected or not

I suspect the iptables behavior is intended. The purpose of GSO/GRO is to reduce per-datagram work, and iptables rule evaluation is one such task. The overwhelmingly common case is that iptables will handle packets with the same source/destination addresses uniformly.

Wireshark, however, absolutely should be handling this somehow. I'm closing this since there doesn't seem to be any action for Quinn, but please post if you open an issue with them!

rom1v commented 1 year ago

I just opened an issue for Wireshark: https://gitlab.com/wireshark/wireshark/-/issues/19109

johnthacker commented 1 year ago

I would assume that the kernel is not allowed to just squash UDP packets together

It absolutely is, particularly if the segmentation is done in hardware. What exactly do you think GSO does?

QUIC in particular has very precise rules about coalescing UDP datagrams.

It's true that a capture on the receiving side, or on a network tap, would look different. But the large, squashed-together packet is what the kernel provides to the capture interface on the sending side.
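
For reference, the coalescing rules in RFC 9000 (Section 12.2) concern QUIC packets inside one UDP datagram. A long-header packet carries a Length field. A short-header packet has none, so it runs to the end of the datagram and can only be the last packet. Below is a simplified sketch of that splitting logic; the `Header` enum is hypothetical and skips real header parsing. It also suggests why frame 49 failed to decrypt. A dissector that treats the 10789-byte GSO buffer as one UDP payload starting with a short header would take all of it as a single packet.

```rust
use std::ops::Range;

/// Simplified view of a QUIC packet header, as far as coalescing is concerned.
pub enum Header {
    /// Long header: bounded by its Length field; `total_len` is the full
    /// on-the-wire size (header + protected payload).
    Long { total_len: usize },
    /// Short header: no Length field, so it extends to the end of the datagram.
    Short,
}

/// Returns the byte range of each QUIC packet coalesced into one UDP datagram.
pub fn split_coalesced(datagram_len: usize, headers: &[Header]) -> Result<Vec<Range<usize>>, &'static str> {
    let mut ranges = Vec::new();
    let mut start = 0;
    for (i, header) in headers.iter().enumerate() {
        let end = match header {
            Header::Long { total_len } => start + total_len,
            Header::Short if i + 1 == headers.len() => datagram_len,
            Header::Short => return Err("a short-header packet must be the last one"),
        };
        if end > datagram_len {
            return Err("packet extends past the end of the datagram");
        }
        ranges.push(start..end);
        start = end;
    }
    Ok(ranges)
}
```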