microsoft / net-offloads

Specs for new networking hardware offloads.
MIT License

What model do existing pacing offloads supported by NICs use? #45

Open maolson-msft opened 1 year ago

maolson-msft commented 1 year ago

Pacing can be done two ways: either each packet is posted to the NIC with a timestamp (which feels like what we ideally want to have), or you can plumb a rate limit to the NIC, post packets without any metadata (or maybe with a flow ID to match them up with a per-flow plumbed rate limit), and the NIC decides when to send each packet.

I did some prelim research on this topic a while back and was surprised to read in some docs (IIRC from Mellanox) that the second model above is used.

We probably want the first model, but if NICs are already doing the second model that will be an ugly ask.
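To make the distinction concrete, here is a minimal sketch of the two models in Python. It is a toy simulation, not a driver API: `Packet`, `model1_host_timestamps`, and `RateLimitedNic` are hypothetical names invented for illustration, and real hardware may behave differently. In the first model the host computes a departure time per packet; in the second the host only plumbs a per-flow rate and the (simulated) NIC picks the departure times.

```python
from dataclasses import dataclass

NS_PER_SEC = 1_000_000_000


@dataclass
class Packet:
    flow_id: int
    size_bytes: int


def model1_host_timestamps(packets, rate_bps, start_ns=0):
    """Model 1 (hypothetical): the host attaches a send time to every packet."""
    posted = []
    t = start_ns
    for p in packets:
        posted.append((p, t))  # the packet is posted together with its timestamp
        # Space the next packet by this packet's serialization time at rate_bps.
        t += p.size_bytes * 8 * NS_PER_SEC // rate_bps
    return posted


class RateLimitedNic:
    """Model 2 (hypothetical): the host plumbs a per-flow rate; the NIC schedules."""

    def __init__(self):
        self.rate_bps = {}      # flow_id -> plumbed rate limit
        self.next_free_ns = {}  # flow_id -> earliest time the next packet may leave

    def set_rate(self, flow_id, rate_bps):
        self.rate_bps[flow_id] = rate_bps

    def post(self, packet, now_ns=0):
        # The packet carries no timing metadata, only a flow ID.
        rate = self.rate_bps[packet.flow_id]
        send_at = max(now_ns, self.next_free_ns.get(packet.flow_id, 0))
        gap = packet.size_bytes * 8 * NS_PER_SEC // rate
        self.next_free_ns[packet.flow_id] = send_at + gap
        return send_at
```

For a constant rate the two produce the same schedule. The difference is who owns the schedule: with model 1 the host can post any sequence of timestamps it likes, while with model 2 it can only express what the plumbed rate allows.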

maolson-msft commented 1 year ago

(I'm really hoping the above experience was just a nightmare I was having. This issue is basically to find out whether that's the case.)

nibanks commented 1 year ago

The first one is the model we're going with.

maolson-msft commented 1 year ago

That doesn't answer my question, though. I'm asking, what model do the NICs currently use?

maolson-msft commented 1 year ago

By the way, I found what I was reading before, here's a link: https://docs.nvidia.com/networking/display/VMAv875/Advanced+Features#AdvancedFeatures-PacketPacing

They describe an API that smells like a global pacing rate for the whole interface, which isn't what we want; then they mention something called the "rivermax library" for more "advanced" pacing, but it's a dead link.

BorisPis commented 1 year ago

> Pacing can be done two ways: either each packet is posted to the NIC with a timestamp (which feels like what we ideally want to have), or you can plumb a rate limit to the NIC, post packets without any metadata (or maybe with a flow ID to match them up with a per-flow plumbed rate limit), and the NIC decides when to send each packet.
>
> I did some prelim research on this topic a while back and was surprised to read in some docs (IIRC from Mellanox) that the second model above is used.

Both models should be possible, we have work in DPDK with the second model, here are some references:
• RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
• https://doc.dpdk.org/api/rte__mbuf__dyn_8h.html#aaef84c5eadedb4c2fad3d49eabd7f0df
• Implemented in the testpmd sample application, see here: https://github.com/DPDK/dpdk/blob/main/app/test-pmd/txonly.c#L259
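As a rough illustration of the idea of a per-packet flag plus timestamp field, here is a toy Python sketch. It does not call DPDK and is not taken from testpmd: `Mbuf`, `TX_TIMESTAMP_FLAG`, `stamp`, and `drain` are hypothetical names, and the FIFO hold-until-timestamp behavior is an assumption made for the example, not a statement about how any particular NIC or driver works.

```python
from dataclasses import dataclass

TX_TIMESTAMP_FLAG = 1 << 0  # hypothetical flag bit, for illustration only


@dataclass
class Mbuf:
    payload: bytes
    ol_flags: int = 0
    tx_timestamp_ns: int = 0  # only meaningful when TX_TIMESTAMP_FLAG is set


def stamp(mbuf, when_ns):
    """Mark a packet as 'do not send before when_ns'."""
    mbuf.tx_timestamp_ns = when_ns
    mbuf.ol_flags |= TX_TIMESTAMP_FLAG
    return mbuf


def drain(queue, now_ns=0):
    """Toy FIFO transmit queue (assumed behavior): packets leave in order,
    and a flagged packet holds the queue until its timestamp is reached."""
    sent = []
    clock = now_ns
    for m in queue:
        if m.ol_flags & TX_TIMESTAMP_FLAG:
            clock = max(clock, m.tx_timestamp_ns)
        sent.append((clock, m.payload))
    return sent
```

In this sketch an unflagged packet goes out as soon as it reaches the head of the queue, and a timestamp that is already in the past has no effect.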