p-quic / pquic

The PQUIC implementation, a framework that enables QUIC clients and servers to dynamically exchange protocol plugins that extend the protocol on a per-connection basis
https://pquic.org
MIT License

Interop with aioquic decryption issue #18

Open The3ternum opened 3 years ago

The3ternum commented 3 years ago

This issue was mentioned previously in the pquic channel of the quicdev Slack.

I also created an issue on the aioquic GitHub page. I seem to have stumbled on a small decryption issue that exists in the regular implementations of aioquic and PQUIC, and therefore also in other implementations based on them. I will explain my observations across multiple tests here.

I have the following cases:

Case 1: A local aioquic client that tries to download 999999 bytes from a local PQUIC server (default aioquic implementation; the PQUIC server has the QLOG plugin enabled).

Case 2: An online interop result from interop.seemann.io between an aioquic client and a PQUIC server (test: transfer, flag: DC).

Case 3: My modified local aioquic client that tries to download 999999 bytes from a local PQUIC server (modified aioquic implementation with multipath enabled; the PQUIC server has the QLOG plugin enabled, as well as the multipath_rr_cond and multipath_qlog plugins).

In all three cases both endpoints seem to have some issues with decrypting certain packets that are being transmitted while the handshake is being performed.

In Case 1 I generated a QLOG file on both endpoints (Ac_large.qlog and Ps_large.qlog), and I also stored the console output of the PQUIC server (Ps_large.txt). All files are stored in logs.zip. By loading both QLOG files in qvis.edm.uhasselt.be and selecting the actual traces for both the client and the server (not the GENERATED ones, see the first screenshot), we get a visualisation of the issue, as seen in the second screenshot.

[Screenshots: traces, trace decrypt error]

After receiving an initial handshake packet from the PQUIC server, the aioquic client replies with an initial packet (pn: 1) and a handshake packet (pn: 2). The PQUIC server receives the first packet, fails to decrypt it, giving a "header or decryption error: 408." message and finally dropping the packet. The second handshake packet isn't mentioned in the console output. In response the PQUIC server replies with another handshake packet (pn: 1), and two 1RTT packets (pn: 0 and 1). The aioquic client receives these packets and can decrypt the handshake packet and the first 1RTT packet, but not the second 1RTT packet, logging a "packet_decrypt_error". The aioquic client thus acknowledges the handshake packet and the first 1RTT packet, but not the second 1RTT packet. Later on, the PQUIC server identifies this second 1RTT packet as lost, as seen in the third screenshot. However, in this case the connection is able to continue, and the 999999 bytes are transferred, resulting in a clean connection termination.

[Screenshot: trace packet lost]

By inspecting the client's QLOG and the server's LOG from Case 2 I assume that the same problem occurs, because the client QLOG contains a "packet_decrypt_error", and the server log contains a "header or decryption error: 408.". The connection is also able to continue and terminate as in Case 1.

In Case 3 this problem also occurs, but when the packet is identified as lost by the PQUIC server, the connection is unable to continue. The PQUIC server starts sending 1RTT packets that each contain a ping and a padding frame, which are all acknowledged by my aioquic client. These last two steps are repeated in a ping-ack loop, as seen in the fourth screenshot, eventually leading to a very long timeout and a connection termination.

[Screenshot: trace ping-ack loop]

My questions:

- Is there an issue with the multipath_rr_cond plugin that prevents the server from sending stream data when a 1RTT packet is identified as lost, resulting in the ping-ack loop?
- What is causing the decryption problem?

(Small edit note: when generating your own traces, PQUIC saves its packets in the QLOG file with {"packet_type": "onertt"}, whereas aioquic saves them with {"packet_type": "1rtt"}. Search and replace the PQUIC entries with "1rtt" so that both QLOG files use the same notation; otherwise the visualisation will show almost all packets as "lost".)
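
In case scripting that replacement is easier than doing it by hand, a minimal stand-alone C filter along these lines should work. This is only a sketch: the file names are placeholders, and whether you target "1rtt" or the spec's "1RTT" is discussed in the follow-up comments below.

```c
/* Sketch: rewrite PQUIC's "onertt" packet_type values as "1rtt" so that qvis
 * can correlate a PQUIC trace with an aioquic trace.
 * Example usage (placeholder file names):
 *   ./qlog_rename < Ps_large.qlog > Ps_large_fixed.qlog
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    size_t cap = 1 << 20, len = 0, n;
    char *buf = malloc(cap);
    if (buf == NULL) return 1;

    /* Read all of stdin into one buffer so a match can never straddle a read boundary. */
    while ((n = fread(buf + len, 1, cap - len, stdin)) > 0) {
        len += n;
        if (len == cap) {
            char *tmp = realloc(buf, cap *= 2);
            if (tmp == NULL) { free(buf); return 1; }
            buf = tmp;
        }
    }
    buf[len] = '\0';

    /* Replace every quoted "onertt" value with "1rtt" while copying to stdout. */
    const char *from = "\"onertt\"", *to = "\"1rtt\"";
    const char *p = buf, *hit;
    while ((hit = strstr(p, from)) != NULL) {
        fwrite(p, 1, (size_t)(hit - p), stdout);  /* text before the match */
        fputs(to, stdout);                        /* the replacement */
        p = hit + strlen(from);
    }
    fputs(p, stdout);                             /* remainder of the file */
    free(buf);
    return 0;
}
```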

The3ternum commented 3 years ago

Additional observation: it seems that the undecryptable 1RTT packet sent by the PQUIC server actually has a length greater than 1500 bytes (1523 bytes to be exact, excluding the header), so the packet is rejected by the decryptor on the aioquic client.

mpiraux commented 3 years ago

I am working on a fix for this issue. Meanwhile, a small note regarding the "onertt" vs "1rtt" packet type logging: it seems to be a bug on aioquic's side; the spec allows "onertt" and "1RTT" as values for identifying these packets, but not "1rtt". https://quiclog.github.io/internet-drafts/draft02/draft-marx-qlog-event-definitions-quic-h3.html#name-packettype

rmarx commented 3 years ago

Hey @mpiraux, small note on that: qlog actually only allows "1RTT". The onertt thing is because enums in most programming languages cannot start with a numerical character and we modeled the qlog schema after TypeScript, which has that same problem. Arguably that kind of implementation detail should not be in the spec and it has given others some confusion as well, so we'll probably remove it down the line, keeping only 1RTT as an entry.

In qvis specifically, we're more lenient. It doesn't really look for the specific enum values; it just matches identical packet_type string values together when trying to correlate two traces. So you can use "onertt" if you're only doing pquic traces; it just breaks when mixing with aioquic's "1rtt" or another implementation's "1RTT". Finally, we do a pass to transform lowercase "1rtt" into uppercase "1RTT", originally to help with the draft-00 to draft-01 transition, so aioquic seems interoperable with other qlogs, even though it is indeed also not 100% correct according to the spec.

Hope that helps :)

mpiraux commented 3 years ago

I just pushed a big update for multipath, which basically converts and reuses the initial paths as uniflows instead of opening new (redundant) uniflows. In our testing this has simplified things quite a bit and fixed some bugs; yours could very well be one of them, as well as the other ones. Could you try the latest commit? I will update the live endpoint in a few minutes.

The3ternum commented 3 years ago

Hi, thank you for the fix. It seems that the three packets from the first screenshot (initial 1 and handshake 2 from the aioquic client, and 1RTT 1 from the PQUIC server) are still not decrypted correctly by either endpoint.

I have also included the QLOG files from both the aioquic server and the PQUIC client, as well as the console log from the PQUIC client. See Logs.zip.

The3ternum commented 3 years ago

One of my questions is what the behaviour of the additional sending uniflows is in the PQUIC endpoint (more specifically, the client), as it seems like it is trying to send data over a new path.

I believe it is also worth mentioning that I have included an address filter for the PQUIC endpoint. I am not sure if it is correct, but the code should currently only allow "127.0.0.1" to be used. (It seems to work, as the PQUIC endpoint no longer advertises additional addresses.) If incorrect, what code should I be using instead? And how do I include the IPv6 variant "::1"? See Only_local.zip for the code that I used.

mpiraux commented 3 years ago

> However, the packet that carries these frames never arrives on the aioquic server (there are no errors, the packet isn't even logged as received via a "datagrams_received"-event, which happens before the packet is processed).

It would be interesting to capture where this packet is going. I am guessing the address handling code is bad.

> Therefore the aioquic server can never establish its additional sending uniflow (or respond to the PATH_CHALLENGE-frame from the pquic server).

Correct, but it could send the PC (PATH_CHALLENGE) again.

> Additionally, the uniflows frames from the pquic client contain 2 entries for the receiving uniflows, which is correct, but also 2 entries for the sending uniflows, which I assume is incorrect, since the pquic client's additional sending uniflow has not yet been established.

IIRC, in our code, sending uniflows are activated as soon as you receive credit for them in the form of a CID for the particular uniflow. It looks like the specs define the active state as path usage over a validated address. I'll file a bug for that.

> It also seems that the pquic client never sends an ack-of-ack trigger (or ack-eliciting frames), therefore the aioquic server never responds with its own ACKs.

We currently request an ACK to prune ACK ranges if more than 10 blocks are used, but otherwise we rely on the server to issue ACKs from time to time. We could change that to bundle a PING every x ACKs. It's unfortunate the server doesn't send one from time to time.

> There are some instances where the pquic client sends another PATH_CHALLENGE-frame along with a UNIFLOWS-frame, to which the aioquic server is able to respond.

My guess is that it considers the uniflow as broken at some point because it doesn't receive any ACKs, so it marks it as not verified and opens a new uniflow. I think it's a new one because UNIFLOWS frames are only sent once per uniflow.

> After the aioquic server has transported all the data and signaled the end with the stream 0 FIN-flag enabled, the pquic client keeps logging a lot of 1RTT packet loss events. Then the pquic client starts sending PATH_CHALLENGE frames and PING+PADDING frames for a while (which also restart at packet number 1; I assume these come from the additional sending uniflow on the pquic client).

The sending uniflow is again considered broken at some point, I guess.

So to conclude from all your interesting findings:

mpiraux commented 3 years ago

> It would be interesting to capture where this packet is going. I am guessing the address handling code is bad.

I actually just fixed a bug regarding that particular code, which caused memory corruption. The commit is running through our test suite and I should ship it here in the afternoon. EDIT: There it is: 9b273319de2aaff96ee55c64c164d694f2bd6b64

mpiraux commented 3 years ago

> And how do I include the IPv6 variant "::1"?

The code is correct for IPv4 localhost; for IPv6 you need to compare the in6_addr_t to one initialised to "::1", using memcmp for instance.
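
For illustration only, here is a sketch of that comparison using the standard struct in6_addr / struct in_addr types; the plugin glue around it is omitted and the helper names are made up, so this is not the actual filter_only_local code. Note that <netinet/in.h> also exposes the in6addr_loopback constant, which can be compared against directly.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>  /* AF_INET, AF_INET6 */
#include <arpa/inet.h>   /* inet_pton */
#include <netinet/in.h>  /* struct in_addr, struct in6_addr, in6addr_loopback */

/* Returns true when addr is the IPv6 loopback address "::1". */
static bool is_ipv6_loopback(const struct in6_addr *addr) {
    struct in6_addr loopback;
    if (inet_pton(AF_INET6, "::1", &loopback) != 1) {
        return false;  /* should not happen for a fixed, valid literal */
    }
    return memcmp(addr, &loopback, sizeof(loopback)) == 0;
    /* Equivalent: memcmp(addr, &in6addr_loopback, sizeof(*addr)) == 0 */
}

/* Same idea for the existing IPv4 "127.0.0.1" check. */
static bool is_ipv4_loopback(const struct in_addr *addr) {
    struct in_addr loopback;
    if (inet_pton(AF_INET, "127.0.0.1", &loopback) != 1) {
        return false;
    }
    return memcmp(addr, &loopback, sizeof(loopback)) == 0;
}

int main(void) {
    struct in6_addr a6;
    struct in_addr a4;
    inet_pton(AF_INET6, "::1", &a6);
    inet_pton(AF_INET, "127.0.0.1", &a4);
    /* Exits 0 when both loopback addresses are recognised. */
    return (is_ipv6_loopback(&a6) && is_ipv4_loopback(&a4)) ? 0 : 1;
}
```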

The3ternum commented 3 years ago

It seems that the QLOG file generated by the PQUIC client does not contain "packet_received" or "packet_sent" events, which prevents QVIS from generating a visualisation (according to the console log seen in the screenshot).

[Screenshot from 2021-01-15 17-15-06]

I have used the following parameters to start the client (./picoquicdemo):

-P plugins/multipath/multipath_rr_cond.plugin -P plugins/qlog/qlog.plugin -P plugins/multipath/multipath_qlog.plugin -P plugins/multipath/addresses_filters/filter_only_local.plugin -q qlog_output_client.qlog -G 999999 ::1 4433

There's still something that goes wrong, although I am unable to discern the problem. On the aioquic server I am currently using only sending uniflow 0 until sending uniflow 1 becomes available; then only sending uniflow 1 is used (for testing purposes).

I have included the QLOGs from the aioquic server and the PQUIC client, along with the console log from the PQUIC client; see Logs.zip.

mpiraux commented 3 years ago

Something broke in the QLOG plugin; I am investigating further.

mpiraux commented 3 years ago

@The3ternum, d2acec6 has a fix for the QLOG bug.