Open mtfriesen opened 1 year ago
For question (2), since these connections are being plumbed via direct OIDs, intermediate components like XDP (or any other LWF) simply cannot reliably synchronize with the state in the rest of the stack.
The ConnectionID could be empty and rely on the ephemeral client port. If the server socket is not connected, the connection will need to use source and destination L3/L4 information to match the connection on transmit. It is also expected that the physical NIC will not match flows from L3/L4 information or the connection ID in the packet header, and would instead rely on the crypto key index provided with the data (skbuff in Linux, or NDIS packet in Windows). The method to uniquely identify a connection would therefore differ between transmit and receive: on receive, the NIC will need to match the flow.
Based on discussion in https://github.com/microsoft/quic-offloads/issues/57, I agree that we can generally eliminate any identification logic for the TX path, because we'd pass the opaque handle down with the packet.
So the real question is on the RX path: how does the NIC identify a connection? I added some pseudocode in the past here, but there have been other discussions, such as in https://github.com/microsoft/quic-offloads/issues/27, about simplifying things to reduce the complexity (i.e., it currently requires a two-pass lookup).
There are several questions here.
What assumptions can various layers of the stack make about uniqueness? Can NIC drivers assume there will be no duplicate connections?
The rest of the stack needs a clear specification if it is obligated to avoid duplicate entries, else it should be clear that only the NIC is responsible for arbitration.
> Can NIC drivers assume there will be no duplicate connections?
Yes, it should assume no duplicates. Doing a "set" on an existing connection is meant to update it, say to rev the KeyPhase.
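A rough sketch of the "set updates an existing connection" semantics described above, assuming a simplified key of UDP port plus connection ID. The `conn_key`, `conn_table`, and `conn_set` names are hypothetical and do not reflect the real `NDIS_QUIC_CONNECTION` layout or any driver API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_CID_LEN 20  /* QUIC v1 caps connection IDs at 20 bytes */
#define TABLE_SIZE  64  /* hypothetical table capacity */

/* Hypothetical, simplified connection key (not the real NDIS structure). */
typedef struct {
    uint16_t udp_port;
    uint8_t  cid_len;
    uint8_t  cid[MAX_CID_LEN];
} conn_key;

typedef struct {
    bool     in_use;
    conn_key key;
    uint8_t  key_phase;
} conn_entry;

typedef struct {
    conn_entry slots[TABLE_SIZE];
} conn_table;

static bool key_equal(const conn_key *a, const conn_key *b)
{
    return a->udp_port == b->udp_port && a->cid_len == b->cid_len &&
           memcmp(a->cid, b->cid, a->cid_len) == 0;
}

/* "Set" with upsert semantics.
 * Returns 1 if a new entry was added, 0 if an existing entry was updated,
 * and -1 if the table is full. */
static int conn_set(conn_table *t, const conn_key *key, uint8_t key_phase)
{
    assert(key->cid_len <= MAX_CID_LEN);
    conn_entry *free_slot = NULL;
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        conn_entry *e = &t->slots[i];
        if (e->in_use && key_equal(&e->key, key)) {
            e->key_phase = key_phase;  /* same connection: update in place */
            return 0;
        }
        if (!e->in_use && free_slot == NULL)
            free_slot = e;             /* remember the first free slot */
    }
    if (free_slot == NULL)
        return -1;
    free_slot->in_use = true;
    free_slot->key = *key;
    free_slot->key_phase = key_phase;
    return 1;
}

/* Number of connections currently stored. */
static size_t conn_count(const conn_table *t)
{
    size_t n = 0;
    for (size_t i = 0; i < TABLE_SIZE; i++)
        n += t->slots[i].in_use ? 1 : 0;
    return n;
}
```

With this shape, a second "set" for the same key never creates a duplicate; it only changes the stored state, which is what lets the driver assume uniqueness.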
> The rest of the stack needs a clear specification if it is obligated to avoid duplicate entries, else it should be clear that only the NIC is responsible for arbitration.
Really, the arbitration problem exists at the tuple layer, and really that must already be handled independently of QEO, right? Different apps must use different tuples (or something like CIBIR to differentiate).
That's where things get tricky. We're specifying that we're using direct OIDs, and also that the layers above the NIC will not send duplicate OIDs down the stack. The nature of the NDIS design today precludes synchronization of LWFs with the rest of the stack on the direct OID path; therefore either the LWFs must communicate with an OS-wide arbitration component, or LWFs cannot issue this OID, or the NICs must not assume a coherent upper layer.
> will not send duplicate OIDs down the stack
Just to clarify, by "duplicate OIDs" you mean OIDs for the same connection from different parties? If that's the case, then given that we already have designs to leverage the OS port pool to arbitrate tuple usage, I think we should be fine. Is that not the case?
It is common in modern NICs to generate a hash based on L3/L4 headers. What is unique about QUIC is that the connection ID length is not sent with the packet. This complicates packet processing: it requires an extra lookup.
When the connection ID is empty, there can be only one connection with an empty connection ID for the given quadruple of <srcip, srcport, dstip, dstport> to look up. In the other cases, the connection ID between a pair of ports could be anything, and of any length.
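To make the extra lookup concrete, here is a minimal sketch of one way a two-pass RX match could look. It assumes, purely for illustration, that every connection on a given local UDP port uses the same CID length, so pass 1 maps the destination port to a length and pass 2 matches the CID bytes. The `port_entry`, `flow_entry`, and `rx_lookup` names are hypothetical, linear scans stand in for hardware hash tables, and the 4-tuple match needed for empty CIDs is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CID_LEN 20  /* QUIC v1 caps connection IDs at 20 bytes */

/* Pass-1 entry (hypothetical): the CID length in use on a local UDP port. */
typedef struct {
    uint16_t udp_port;
    uint8_t  cid_len;
} port_entry;

/* Pass-2 entry (hypothetical): full connection key plus its crypto key index. */
typedef struct {
    uint16_t udp_port;
    uint8_t  cid_len;
    uint8_t  cid[MAX_CID_LEN];
    int      key_index;
} flow_entry;

/* Returns the key index of the matching connection, or -1 if none matches. */
static int rx_lookup(const port_entry *ports, size_t nports,
                     const flow_entry *flows, size_t nflows,
                     uint16_t dst_port,
                     const uint8_t *udp_payload, size_t len)
{
    /* Only short-header packets are handled here; long-header packets
     * carry an explicit destination CID length and need no first pass. */
    if (len < 1 || (udp_payload[0] & 0x80) != 0)
        return -1;

    /* Pass 1: destination port -> expected CID length. */
    const port_entry *p = NULL;
    for (size_t i = 0; i < nports; i++) {
        if (ports[i].udp_port == dst_port) {
            p = &ports[i];
            break;
        }
    }
    if (p == NULL)
        return -1;
    assert(p->cid_len <= MAX_CID_LEN);
    if (len < (size_t)1 + p->cid_len)
        return -1;  /* packet too short to hold a CID of that length */

    /* Pass 2: (port, CID bytes right after the first byte) -> connection. */
    const uint8_t *cid = udp_payload + 1;
    for (size_t i = 0; i < nflows; i++) {
        if (flows[i].udp_port == dst_port &&
            flows[i].cid_len == p->cid_len &&
            memcmp(flows[i].cid, cid, p->cid_len) == 0)
            return flows[i].key_index;
    }
    return -1;
}
```

The first pass exists only because the short header does not say where the CID ends; if the length were known up front, the match would collapse to a single lookup.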
Some vendors limit the connection ID to be either empty or fixed in value. While this makes things simpler on the Rx pipeline and eliminates the extra lookup and the memory for extra tables, it limits what can be done on the service side.
Currently, only Broadcom is known to support QUIC in future-generation NICs. The interface between kernel and user space has been a subject of collaboration for some time and is generally aligned with no-lookup Tx and flow-match Rx. The connection ID, however, is a subject for further consultation between the industry and the vendor.
Another option is to burn a few bits in the connection ID to dedicate to CID length encoding. I'm not a big fan of that approach, but we should at least consider all options.
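For reference, a sketch of what "burning a few bits" might look like. The layout is an assumption made only for illustration: the top 5 bits of the first CID byte carry the length (5 bits are enough for 1..20), and an empty CID cannot be encoded this way because it has no first byte. Nothing in this thread specifies which bits would be used, and `cid_stamp_length` / `cid_read_length` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CID_LEN 20              /* QUIC v1 caps connection IDs at 20 bytes */
#define LEN_BITS    5               /* 5 bits cover lengths 1..20 */
#define LEN_SHIFT   (8 - LEN_BITS)  /* length lives in the top bits of cid[0] */

/* Stamp the CID length into the top 5 bits of cid[0], keeping its low 3 bits.
 * Returns false for lengths that cannot be encoded (0 or more than 20). */
static bool cid_stamp_length(uint8_t *cid, uint8_t cid_len)
{
    assert(cid != NULL);
    if (cid_len == 0 || cid_len > MAX_CID_LEN)
        return false;
    uint8_t low_mask = (uint8_t)((1u << LEN_SHIFT) - 1u);
    cid[0] = (uint8_t)((cid_len << LEN_SHIFT) | (cid[0] & low_mask));
    return true;
}

/* Recover the CID length from the first CID byte.
 * Returns 0 if the encoded value is outside 1..20. */
static uint8_t cid_read_length(uint8_t first_byte)
{
    uint8_t n = (uint8_t)(first_byte >> LEN_SHIFT);
    return (n >= 1 && n <= MAX_CID_LEN) ? n : 0;
}
```

On receive, the NIC would read the byte right after the short-header flags byte and know the CID length without a first-pass lookup. The cost is 5 fewer bits available for whatever else the endpoint wants to put in the CID.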
Large datacenter-based use cases of QUIC use the full 20-byte CID to store routing information and other what-nots, which is totally allowed by the standard, as these bytes are neither monotonic nor assigned to anything special. If the approach taken is to map to a fixed number, it should be the maximum, even though limiting it is something we should really try to avoid if possible.
Which fields of NDIS_QUIC_CONNECTION uniquely identify a connection? UdpPort, ConnectionIdLength, and ConnectionId? What about Address?