p4lang / pna

Portable NIC Architecture
Apache License 2.0

Address the issue of pipelines that can have packets "pass each other up" in the middle of the pipeline #96

Open jfingerh opened 1 year ago

jfingerh commented 1 year ago

It is expected that some PNA implementations will have on-chip caches between the packet processing pipeline and large tables stored in off-chip DRAM. To achieve higher packet processing throughput, some will probably allow packets that hit in such a cache to "pass up" earlier-arriving packets that missed in the cache and are still waiting for their data to be fetched on-chip from DRAM.
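To make the reordering concrete, here is a minimal Python sketch of the situation described above. It is a toy model, not a description of any real device: the latency values, the `Packet` class, and the `completion_order` function are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Assumed latencies, in cycles. Real devices will differ.
HIT_LATENCY = 1    # table entry found in the on-chip cache
MISS_LATENCY = 20  # table entry must be fetched from off-chip DRAM


@dataclass
class Packet:
    seq: int         # position in arrival order
    arrival: int     # arrival time, in cycles
    cache_hit: bool  # whether the table lookup hits in the cache


def completion_order(packets):
    """Return the seq numbers in the order the packets finish processing."""
    def finish_time(p):
        return p.arrival + (HIT_LATENCY if p.cache_hit else MISS_LATENCY)
    # Ties are broken by arrival order.
    ordered = sorted(packets, key=lambda p: (finish_time(p), p.seq))
    return [p.seq for p in ordered]


packets = [
    Packet(seq=0, arrival=0, cache_hit=False),  # miss: finishes at cycle 20
    Packet(seq=1, arrival=1, cache_hit=True),   # hit: finishes at cycle 2
    Packet(seq=2, arrival=2, cache_hit=True),   # hit: finishes at cycle 3
]
print(completion_order(packets))  # [1, 2, 0]
```

Packet 0 arrives first but completes last, because the two later packets hit in the cache while it waits on DRAM.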

Typically a P4 developer would NOT want this to happen for packets in the same "flow". Given the programmable nature of P4-programmable devices, though, the definition of a "flow" for the purposes of such packet ordering guarantees is probably best left to the P4 developer as well. That definition determines which packets must stay in order and which packets may complete processing out of order.

There should be some mention of this somewhere in the PNA spec, I suspect, or if not in the PNA spec, then in the documentation of any PNA device vendors whose devices have such behavior.

This issue was created in response to a comment on a different PNA issue here: https://github.com/p4lang/pna/pull/92#discussion_r1028601633

If there is discussion of this topic in the PNA spec, it should address the possible interaction with add-on-miss tables explicitly.

thomascalvert-xlnx commented 1 year ago

The spec already defines a 32-bit FlowId_t, which could be added to standard metadata. Currently we only have the extern allocate_flow_id(), but users might also wish to generate a FlowId_t value in the parser based on metadata or header values. The rule would be that packets with different flowid values would not need to maintain relative ordering.
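As a sketch of what that rule would permit, the following Python function checks whether a completion order keeps every flow's packets in arrival order while allowing packets of different flows to interleave freely. The function name and the data layout are hypothetical, and the sketch assumes every packet that arrives also completes.

```python
from collections import defaultdict


def preserves_per_flow_order(arrivals, completions):
    """Check a completion order against the per-flow ordering rule.

    arrivals:    list of (packet_id, flow_id) pairs, in arrival order
    completions: list of packet_id values, in completion order
    """
    flow_of = dict(arrivals)

    arrived = defaultdict(list)
    for pkt, flow in arrivals:
        arrived[flow].append(pkt)

    completed = defaultdict(list)
    for pkt in completions:
        completed[flow_of[pkt]].append(pkt)

    # Within each flow the two sequences must match exactly.
    return all(completed[flow] == arrived[flow] for flow in arrived)


arrivals = [(0, 7), (1, 9), (2, 7)]  # packets 0 and 2 share flowid 7

# Packet 1 (flowid 9) passes packet 0 (flowid 7): allowed.
print(preserves_per_flow_order(arrivals, [1, 0, 2]))  # True

# Packet 2 passes packet 0 within flowid 7: not allowed.
print(preserves_per_flow_order(arrivals, [2, 1, 0]))  # False
```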

However, it is likely that some (most?) hardware targets won't support all 2^32 (about 4 billion) of the "flow queues" described above, and will instead have vendor-specific limits. The spec could say that the hardware considers as many of the most significant bits as it can. That way the user's P4 program could, for example, put the input network port number in the top bits of the flowid, so that relative ordering is maintained only among packets coming from the same port. (This might not be a smart thing to do; it is just an example.)
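A rough Python sketch of that idea follows. The 8-bit port field, the `make_flow_id` and `ordering_key` helpers, and the `supported_bits` parameter are all assumptions made for illustration; nothing here is defined by the PNA spec.

```python
FLOW_ID_BITS = 32  # width of FlowId_t, per the comment above
PORT_BITS = 8      # assumed width of the input port field


def make_flow_id(port, flow_hash):
    """Build a 32-bit flow id with the input port in the top bits."""
    low_bits = FLOW_ID_BITS - PORT_BITS
    port_field = port & ((1 << PORT_BITS) - 1)
    hash_field = flow_hash & ((1 << low_bits) - 1)
    return (port_field << low_bits) | hash_field


def ordering_key(flow_id, supported_bits):
    """Keep only the most significant bits the hardware can track.

    supported_bits is assumed to be between 1 and FLOW_ID_BITS.
    """
    return flow_id >> (FLOW_ID_BITS - supported_bits)


a = make_flow_id(port=3, flow_hash=0x123456)
b = make_flow_id(port=3, flow_hash=0xABCDEF)
c = make_flow_id(port=4, flow_hash=0x123456)

# With only 8 bits tracked, ordering is per input port:
# a and b share a key, while c gets a different one.
print(ordering_key(a, 8), ordering_key(b, 8), ordering_key(c, 8))  # 3 3 4

# With 12 bits tracked, a and b fall into separate ordering domains.
print(ordering_key(a, 12) != ordering_key(b, 12))  # True
```

Packets that map to the same key would have to stay in order relative to each other. A target that tracks fewer bits therefore enforces ordering over coarser groups, which is safe but may give up some throughput.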