microsoft / net-offloads

Specs for new networking hardware offloads.
MIT License

Opaque Handle to Eliminate Lookup on TX #57

Open BorisPis opened 1 year ago

BorisPis commented 1 year ago

To simplify hardware offload on transmit, it will be useful to use an opaque NIC driver generated handle (e.g., 4 bytes) for each connection ID. This handle should be provided alongside transmitted packets. Hardware can use this handle to reduce the overhead of looking up the state for this packet, and possibly also to skip parsing the QUIC header if software can guarantee a specific format.

Also, can you explain how the ConnectionIdLength would be useful on the NDIS_QUIC_ENCRYPTION_NET_BUFFER_LIST_INFO associated with every packet? Is it really necessary to pass this data, which doesn't change, with every packet of a flow?

nibanks commented 1 year ago

use an opaque NIC driver generated handle

So, are you suggesting that a uint32_t be added as an output to NDIS_QUIC_CONNECTION, which is then also added to NDIS_QUIC_ENCRYPTION_NET_BUFFER_LIST_INFO in the send path? I think that'd be doable. The OS already has to do a first level lookup to see if a packet needs to be done in SW or HW. If we determine it's done in HW, then we pass the uint32_t down.

can you explain how the ConnectionIdLength would be useful

This is necessary if we don't have the global requirement that all CIDs be the same length, which was proposed on another issue.

P.S. Let's keep separate questions to separate issues. Thanks!

BorisPis commented 1 year ago

So, are you suggesting that a uint32_t be added as an output to NDIS_QUIC_CONNECTION, which is then also added to NDIS_QUIC_ENCRYPTION_NET_BUFFER_LIST_INFO in the send path?

Yes, exactly.

I think that'd be doable. The OS already has to do a first level lookup to see if a packet needs to be done in SW or HW. If we determine it's done in HW, then we pass the uint32_t down.

Makes sense
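
For illustration, here is a minimal sketch of the shape being agreed on here. The struct and function names below are hypothetical, simplified stand-ins (they are not the real NDIS definitions), and the driver table is assumed to be a small fixed array so the handle can simply be a slot index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for NDIS_QUIC_CONNECTION. */
typedef struct {
    uint8_t  ConnectionIdLength;
    uint8_t  ConnectionId[20];
    uint32_t OffloadHandle;      /* output: filled in by the NIC driver */
} SKETCH_QUIC_CONNECTION;

/* Hypothetical stand-in for the per-packet send info. */
typedef struct {
    uint64_t NextPacketNumber;
    uint32_t OffloadHandle;      /* copied from the connection by the OS */
} SKETCH_QUIC_ENCRYPTION_INFO;

#define SKETCH_MAX_CONNECTIONS 4u

/* Assumed driver-side state table, indexed directly by handle. */
static SKETCH_QUIC_CONNECTION *g_table[SKETCH_MAX_CONNECTIONS];

/* Control path: the driver picks a free slot and returns it as the handle. */
int sketch_add_connection(SKETCH_QUIC_CONNECTION *conn)
{
    assert(conn != NULL);
    for (uint32_t i = 0; i < SKETCH_MAX_CONNECTIONS; i++) {
        if (g_table[i] == NULL) {
            g_table[i] = conn;
            conn->OffloadHandle = i;
            return 0;
        }
    }
    return -1; /* table full: the connection stays in software */
}

/* Data path: a bounds check and an array index, no CID match needed. */
const SKETCH_QUIC_CONNECTION *
sketch_tx_lookup(const SKETCH_QUIC_ENCRYPTION_INFO *info)
{
    if (info->OffloadHandle >= SKETCH_MAX_CONNECTIONS) {
        return NULL;
    }
    return g_table[info->OffloadHandle];
}
```

The handle stays opaque to the OS: it only stores the value returned on the control path and passes it back unchanged with each packet it has decided to send through HW.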

nibanks commented 1 year ago

I've updated this issue to generally track the good idea from both @borispis and @rawsocket (at different layers): have the control path that adds a new connection ID offload return some kind of opaque handle that can then be included in the datapath to eliminate the lookup costs. This could be done at both the UM-to-KM boundary and the SW-to-HW boundary.

One issue this might cause is synchronization around the lifetime of the offload info between the control path and the datapath. We will have to be very clear about what restrictions (if any) we'd put on the datapath. Ideally, I'd like to handle the complexity in the control path, and have it gracefully handle a raced send with a handle that is getting deleted.
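
One possible way to make a raced send fail gracefully is to fold a generation counter into the handle, so a handle that outlives its delete no longer validates. This is only a sketch of that swapped-in technique, not something the spec defines: the 16/16 bit split, the slot table, and all names are assumptions, and the code is single-threaded (a real implementation would need atomics or locking around the slot state).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLOT_COUNT 8u

typedef struct {
    uint16_t generation; /* bumped on every delete */
    bool     in_use;
} offload_slot;

static offload_slot slots[SLOT_COUNT];

/* Assumed layout: high 16 bits = generation, low 16 bits = slot index. */
static uint32_t make_handle(uint16_t index, uint16_t generation)
{
    return ((uint32_t)generation << 16) | index;
}

/* Control path: allocate a slot and return a generation-tagged handle. */
bool handle_alloc(uint32_t *out)
{
    assert(out != 0);
    for (uint16_t i = 0; i < SLOT_COUNT; i++) {
        if (!slots[i].in_use) {
            slots[i].in_use = true;
            *out = make_handle(i, slots[i].generation);
            return true;
        }
    }
    return false;
}

/* Data path: a stale handle fails here instead of hitting reused state. */
bool handle_is_live(uint32_t handle)
{
    uint16_t index = (uint16_t)(handle & 0xFFFFu);
    uint16_t generation = (uint16_t)(handle >> 16);
    return index < SLOT_COUNT &&
           slots[index].in_use &&
           slots[index].generation == generation;
}

/* Control path: delete invalidates every outstanding copy of the handle. */
bool handle_free(uint32_t handle)
{
    if (!handle_is_live(handle)) {
        return false;
    }
    uint16_t index = (uint16_t)(handle & 0xFFFFu);
    slots[index].in_use = false;
    slots[index].generation++;
    return true;
}
```

With this shape, a send that races with a delete sees `handle_is_live` return false and can fall back to software encryption or be dropped, even if the slot has already been reused by a new connection.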

mtfriesen commented 1 year ago

We want to minimize the number of lookups required, so we're thinking on the TX path of providing the NIC with two IDs per TX:

The idea is that since the NIC [driver/FW/HW] has to look up some state for each connection anyways, we may be able to perform the connection lookup at the same time as enforcing isolation, minimizing duplicated state and duplicated validation.

It would also be useful for the NIC to provide the protection ID associated with each RX packet, especially if offload connection IDs are constrained to a small range.

BorisPis commented 1 year ago

A connection offload ID sounds to me like a global identifier from the device's perspective rather than an identifier within some protection domain. In general, from the device's perspective, it is best to have as short as possible IDs unique to that device, otherwise there is bound to be some overhead. For example, to match long identifiers (e.g., protection domain ID + connection offload ID) the device may need multiple match operations, but if the device generated these IDs, then it would choose to use IDs that will fit in one match operation.

nibanks commented 1 year ago

We need to provide security boundaries across different applications/processes. One process must not be able to delete the CID offloaded by another. So, something has to do this enforcement.

rawsocket commented 1 year ago

The security boundary would need to be maintained and protected by a trusted root. In Linux with the sockets API, that is the kernel ULP, which could provide a simple virtualization between the NIC's global set of crypto key IDs and individual per-socket pools. This will not add any new lookups, and only one extra indirection.

The Tx path would only require a subset of parameters to register a key: length of CID, algorithm and keys. The result will be ok or nak.

On the Rx path, isolation would also be necessary to prevent random deletes from rogue processes. The Rx connection crypto install request will be different, as it needs L3/L4 information to later match the flow. In theory, only one empty CID could exist for the same source and destination; but this does not prevent multiple connections with an empty CID, from different clients, arriving at the same port on the server. Hence, more than just the destination IP/port might be needed to correctly match the flow.

Again, similarly to Tx, the Rx control plane will translate the kernel-to-device global crypto index into a socket-local equivalent and keep the mapping. A single redirection array plus a linked list of free elements would make all operations O(1) (also known as a hash-linked-list in the C++ world).
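
To make the redirection array plus free list concrete, here is a rough sketch of a per-socket pool. The pool size, type names, and function names are all hypothetical; the point is only that insert, translate, and remove each touch a constant number of array elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 4
#define POOL_NONE (-1)

typedef struct {
    uint32_t global_id[POOL_SIZE]; /* local index -> device-global crypto index */
    int      next_free[POOL_SIZE]; /* free-list link, valid while slot is free */
    int      used[POOL_SIZE];
    int      free_head;
} socket_key_pool;

void pool_init(socket_key_pool *p)
{
    assert(p != NULL);
    for (int i = 0; i < POOL_SIZE; i++) {
        p->next_free[i] = (i + 1 < POOL_SIZE) ? i + 1 : POOL_NONE;
        p->used[i] = 0;
    }
    p->free_head = 0;
}

/* O(1): pop the free-list head and record the global index there. */
int pool_insert(socket_key_pool *p, uint32_t global)
{
    int local = p->free_head;
    if (local == POOL_NONE) {
        return POOL_NONE;
    }
    p->free_head = p->next_free[local];
    p->global_id[local] = global;
    p->used[local] = 1;
    return local;
}

/* O(1): the single extra indirection on the way to the device. */
int pool_translate(const socket_key_pool *p, int local, uint32_t *global_out)
{
    if (local < 0 || local >= POOL_SIZE || !p->used[local]) {
        return -1;
    }
    *global_out = p->global_id[local];
    return 0;
}

/* O(1): push the slot back onto the free list. */
int pool_remove(socket_key_pool *p, int local)
{
    if (local < 0 || local >= POOL_SIZE || !p->used[local]) {
        return -1;
    }
    p->used[local] = 0;
    p->next_free[local] = p->free_head;
    p->free_head = local;
    return 0;
}
```

Because a process can only name local indices within its own socket's pool, it has no way to refer to (or delete) a device-global index owned by another socket.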

rawsocket commented 1 year ago

A connection offload ID sounds to me like a global identifier from the device's perspective rather than an identifier within some protection domain. In general, from the device's perspective, it is best to have as short as possible IDs unique to that device, otherwise there is bound to be some overhead. For example, to match long identifiers (e.g., protection domain ID + connection offload ID) the device may need multiple match operations, but if the device generated these IDs, then it would choose to use IDs that will fit in one match operation.

It might be worth leaving the device unaware of the protection domain, to keep this similar across multiple vendors. The kernel should do the work. For the XDP use case, the kernel must also be involved in some way in the control path to provide separation, using the process ID in conjunction with the absolute start time to maintain integrity over PID reuse.
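
As a small sketch of the PID-reuse point, the kernel could tag each offload entry with an owner identity made of the process ID plus its start time, and refuse deletes from any other identity. The types, field widths, and function names here are hypothetical, chosen only to illustrate the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed owner identity: PID alone is not enough once PIDs are reused. */
typedef struct {
    uint32_t pid;
    uint64_t start_time; /* absolute process start time */
} owner_id;

typedef struct {
    uint32_t global_id;  /* device-global offload index */
    owner_id owner;
    bool     valid;
} offload_entry;

static bool owner_equal(owner_id a, owner_id b)
{
    return a.pid == b.pid && a.start_time == b.start_time;
}

/* Control path: only the process that installed the entry may delete it. */
bool offload_delete(offload_entry *entry, owner_id caller)
{
    assert(entry != NULL);
    if (!entry->valid || !owner_equal(entry->owner, caller)) {
        return false;
    }
    entry->valid = false;
    return true;
}
```

A later process that happens to be assigned the same PID has a different start time, so its delete request is rejected.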

nibanks commented 1 year ago

This will not add any new lookups

Perhaps in the world where you are applying the offload to a socket (that has a particular tuple bound/listening), but for something more generic like XDP you don't have such an object to align/verify the offload to, to prevent independent apps from conflicting/attacking each other. In this case, you do require an additional lookup if you want to provide any protections in this space.

It might be worth leaving the device unaware of the protection domain, to keep this similar across multiple vendors. The kernel should do the work. For the XDP use case, the kernel must also be involved in some way in the control path to provide separation, using the process ID in conjunction with the absolute start time to maintain integrity over PID reuse.

I do agree that doing it in the kernel allows for a single (well, maybe two, one for sockets and one for XDP) place for this logic to live, and doesn't require all vendors to implement the logic. But my line of thinking is "How complicated is this really for the vendor to implement?" and "Can they do this more efficiently than the kernel?"