quicwg / multipath

In-progress version of draft-ietf-quic-multipath

Take Hardware Offloads into Account #25

Open nibanks opened 2 years ago

nibanks commented 2 years ago

While discussing single packet space vs. multi packet space designs, @martinduke brought up the topic of hardware offloading. The effect of any design changes on packet encryption/decryption should take HW offloads into account. As I understand it, the single packet space design would not modify the packet encryption/decryption logic, but the multi packet space design would, because of the different nonce length. Additionally, since multi-path would be a negotiated feature/extension, all connections that don't negotiate the feature would keep the "old" model/logic, whereas the connections that do would have the new encryption/decryption logic. IMO, this could significantly complicate HW offloads.

martinduke commented 2 years ago

A few disconnected points on this subject:

on the other hand:

It would be very good to engage with the right people in industry, do some thinking about what these APIs would likely resemble, and if possible come up with a crypto design that is consistent with that.

mirjak commented 2 years ago

I would hope that any hardware offload would already be able to handle a different nonce length, as that seems more future proof anyway. However, Martin probably has a good point that it makes sense to reach out and make this requirement explicit now.

nibanks commented 1 year ago

Based on the recent changes removing the single packet number space, have there been any updates on this topic? We'd love to get feedback on https://github.com/microsoft/quic-offloads with respect to multi-path to make sure our HW offload design can support it.

Yanmei-Liu commented 10 months ago

I think we need to find out whether hardware offloading still works with the multi-path extension before the WG last call. For context: https://lpc.events/event/17/contributions/1592/

Yanmei-Liu commented 8 months ago

We've had some discussion with Eric about supporting hardware offloading with the multi-path extension. Here's the solution suggested by Eric:

(original e-mail content)

From: Eric Davis <eric.davis@broadcom.com>
Sent: 2024 Jan. 5 (Fri.) 20:36
To: LIU Yanmei <miaoji.lym@alibaba-inc.com>
Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com>; mbuhl <mbuhl@moritzbuhl.de>; martenseemann <martenseemann@gmail.com>
Subject: Re: Offloading Encryption to QUIC Enabled NICs with Multi-path Extension

Hi Yanmei,

Thank you for presenting the high level details of the multipath draft. I've also read through the draft itself and I think it's in a good place to support the current QUIC offload design without requiring any hardware changes. This is good.

The first thing to address is the tweak to the AEAD nonce generation required for multipath. As we know, the non-multipath nonce generation has the hardware XOR'ing the 12-byte IV with the 62-bit pkt_num (left padded with zeros to 12 bytes). For multipath, the nonce must be tweaked further to ensure a unique nonce is always used for any multipath packet encrypted with the same key. The solution in the current draft rolls in the destination connID sequence number. This makes sense. In order to not change the hardware and still support multipath, the QUIC kernel module and/or driver must XOR the destination connID sequence number into the IV. The result is the IV that is offloaded for the flow to the hardware. The hardware continues to do the same nonce generation with this IV XOR'ed with the pkt_num.
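To make the two steps concrete, here is a minimal C sketch of the split described above, assuming a 12-byte AEAD IV: the driver folds the Connection ID sequence number into the IV once per path, and the hardware keeps XOR'ing only the packet number. The names (`offload_iv`, `hw_nonce`, `QUIC_IV_LEN`) are illustrative, not from any real driver or offload API.

```c
#include <stdint.h>
#include <string.h>

#define QUIC_IV_LEN 12  /* 12-byte AEAD IV/nonce used by the QUIC AEADs */

/* Driver step (software, done once per path when the flow is offloaded):
 * fold the Connection ID sequence number into the top 32 bits of the IV.
 * This is the "(connid_seq_num << 64)" term, expressed on the big-endian
 * 12-byte IV. The result is what gets offloaded to the hardware. */
static void offload_iv(const uint8_t iv[QUIC_IV_LEN],
                       uint32_t connid_seq_num,
                       uint8_t out[QUIC_IV_LEN])
{
    memcpy(out, iv, QUIC_IV_LEN);
    for (int i = 0; i < 4; i++)
        out[3 - i] ^= (uint8_t)(connid_seq_num >> (8 * i));
}

/* Hardware step (unchanged from single-path offload): XOR the packet
 * number into the low 64 bits of whatever IV was offloaded. */
static void hw_nonce(const uint8_t iv[QUIC_IV_LEN],
                     uint64_t pkt_num,
                     uint8_t nonce[QUIC_IV_LEN])
{
    memcpy(nonce, iv, QUIC_IV_LEN);
    for (int i = 0; i < 8; i++)
        nonce[QUIC_IV_LEN - 1 - i] ^= (uint8_t)(pkt_num >> (8 * i));
}
```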

Example from the multipath draft:

    IV:                            0x6b26114b9cba2b63a9e8dd4f
    Connection ID Sequence Number: 0x3
    Packet Number:                 0xaead

    Nonce = IV XOR ((connid_seq_num << 64) | pkt_num)
    (0x3 << 64) | 0xaead = 0x00000003000000000000aead

        0x6b26114b9cba2b63a9e8dd4f   (IV)
    XOR 0x00000003000000000000aead   ((connid_seq_num << 64) | pkt_num)
      = 0x6b2611489cba2b63a9e873e2

Same method supporting the existing offload mechanism:

    IV:                            0x6b26114b9cba2b63a9e8dd4f
    Connection ID Sequence Number: 0x3
    Packet Number:                 0xaead

    New IV passed in flow offload = IV XOR (connid_seq_num << 64)

        0x6b26114b9cba2b63a9e8dd4f   (IV)
    XOR 0x000000030000000000000000   (connid_seq_num << 64)
      = 0x6b2611489cba2b63a9e8dd4f

    Nonce (hardware) = offloaded IV XOR pkt_num

        0x6b2611489cba2b63a9e8dd4f   (offloaded IV)
    XOR 0x00000000000000000000aead   (pkt_num)
      = 0x6b2611489cba2b63a9e873e2
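For reference, a small check of the sketch above against the numbers in this worked example (values copied from the e-mail; it only verifies that the one-step and two-step constructions produce the same nonce):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const uint8_t iv[QUIC_IV_LEN] = {
        0x6b, 0x26, 0x11, 0x4b, 0x9c, 0xba,
        0x2b, 0x63, 0xa9, 0xe8, 0xdd, 0x4f
    };
    uint8_t iv_off[QUIC_IV_LEN], nonce[QUIC_IV_LEN];

    /* Driver pre-XORs connid_seq_num = 0x3 into the IV once per path... */
    offload_iv(iv, 0x3, iv_off);
    /* ...and the hardware XORs the packet number per packet. */
    hw_nonce(iv_off, 0xaead, nonce);

    /* Expected nonce from the example: 0x6b2611489cba2b63a9e873e2 */
    const uint8_t expected[QUIC_IV_LEN] = {
        0x6b, 0x26, 0x11, 0x48, 0x9c, 0xba,
        0x2b, 0x63, 0xa9, 0xe8, 0x73, 0xe2
    };
    assert(memcmp(nonce, expected, QUIC_IV_LEN) == 0);
    printf("nonce matches the draft example\n");
    return 0;
}
```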

The second thing is whether the destination connID sequence number or a new path identifier is used in the nonce. From a hardware perspective it doesn't matter. The nonce construction would be the same as detailed above, with the driver XOR'ing the connid_seq_num or the path_id into the IV before offloading the flow. There is an implication, though, as you mentioned: if the destination connID sequence number is used, the IV in the offloaded flow will change whenever that sequence number changes. To support this, the offloaded flow would have to be deleted from the hardware and then re-offloaded with the updated IV. Alternatively, if a unique path identifier is used, the flow would never need to be removed and re-offloaded.
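As a rough illustration of that implication, here is a hypothetical driver-side handler for a destination CID rotation, building on the `offload_iv` sketch above. `struct nic_flow` and `nic_flow_replace()` are invented placeholders for whatever flow handle and flow-table update primitive a real offload interface would expose.

```c
struct nic_flow;                                       /* opaque flow handle (placeholder) */
void nic_flow_replace(struct nic_flow *flow,
                      const uint8_t iv[QUIC_IV_LEN]);  /* hypothetical driver hook */

/* If the CID sequence number is folded into the nonce, the pre-XOR'ed IV
 * changes on every CID rotation, so the offloaded flow state must be
 * replaced with the new IV. */
static void on_cid_rotation(struct nic_flow *flow,
                            const uint8_t iv[QUIC_IV_LEN],
                            uint32_t new_connid_seq_num)
{
    uint8_t iv_off[QUIC_IV_LEN];

    offload_iv(iv, new_connid_seq_num, iv_off);
    nic_flow_replace(flow, iv_off);

    /* With a stable per-path identifier in the nonce instead, iv_off would
     * not depend on CID rotation and this re-offload step would disappear. */
}
```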

My vote is to move the multipath design to a unique path identifier as it's cleaner. But as you can see, it's not required from a hardware perspective. Another thing to note: the connID is part of the flow lookup on the Rx side. Simple hardware implementations would require the flow to be removed and re-offloaded on a connID change. As we've discussed with numerous customers, this is acceptable.

Feel free to copy these ideas into this GitHub issue for further discussion: https://github.com/quicwg/multipath/issues/25

If you haven't seen it, we also presented QUIC offload at the last OCP Summit. This presentation complements the one Andy gave at Linux Plumbers. https://www.youtube.com/watch?v=IAvQhJSm6O8

It's important to keep the hardware offload design as simple as possible. I don't see any need to make changes to the current offload design in order to support multipath. Let us know if you need any help with specifics going into the next IETF wg meeting. :-)

Thanks,

- e
mirjak commented 2 months ago

@Yanmei-Liu and @nibanks thanks for the continued discussion! It seems that the current design is fine with respect to hardware offload, and I believe the design is stable now. So I think we could actually close this issue?

Or do you maybe want to add any notes about hardware offload in the implementation considerations section?