Have a section in PNA to describe some properties of a desirable hardware target

The main issue to be addressed in such a section (perhaps an appendix) is: Does the hardware target support applying a single "physical" P4 table for both net-to-host and host-to-net packets?

My current thinking on this topic is that the answer should be "yes". If the answer is "no", then such a target is probably not a good one for the PNA architecture.

Why? Because for a feature like TCP connection tracking, we want the ability to write a table in P4 code that:

(a) is apply'd for packets in both directions, and (b) lets the processing of packets in either or both directions modify the state of the matched entry in the data plane, without the control plane making the modification

Examples of a table's entries being modified as a result of calling apply() on it are:

updating a DirectCounter extern associated with the table
updating a DirectMeter extern associated with the table
updating a DirectRegister extern associated with the table (assuming that we define such an extern in PNA; there are multiple use cases for it, and target NIC devices that can implement it)
having a table with property idle_timeout=true, where apply() on that table causes some state associated with the entry to indicate that it was recently matched, and therefore should not time out until its full configured timeout interval has elapsed
having a table with newly proposed PNA features where the timeout interval of a hit entry is updated as a result of the apply() call
having a table that adds a new entry on a miss
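
To make a few of these cases concrete, here is a minimal Python sketch of a table whose apply() call changes per-entry state with no control plane involvement. It is a toy model only: it is not P4 and not any PNA API. The names `Entry`, `ToyTable`, `hit_count`, `last_hit`, the explicit `now` argument, and the `new_timeout` parameter are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Per-entry state that the data plane itself may modify."""
    hit_count: int = 0      # stands in for a DirectCounter
    last_hit: float = 0.0   # stands in for idle-timeout bookkeeping
    timeout: float = 30.0   # per-entry timeout interval, in seconds


class ToyTable:
    """Toy exact-match table whose apply() updates entry state on its own."""

    def __init__(self, add_on_miss=False, default_timeout=30.0):
        self.entries = {}
        self.add_on_miss = add_on_miss
        self.default_timeout = default_timeout

    def apply(self, key, now, new_timeout=None):
        """Look up key at time `now`; return True on a hit, False on a miss."""
        entry = self.entries.get(key)
        if entry is None:
            if self.add_on_miss:
                # Data-plane insertion: the miss itself creates the entry.
                self.entries[key] = Entry(last_hit=now, timeout=self.default_timeout)
            return False
        entry.hit_count += 1
        entry.last_hit = now             # restarts the idle timer
        if new_timeout is not None:
            entry.timeout = new_timeout  # data plane changes the timeout interval
        return True

    def expire(self, now):
        """Remove and return the keys idle for longer than their timeout."""
        stale = [k for k, e in self.entries.items() if now - e.last_hit > e.timeout]
        for k in stale:
            del self.entries[k]
        return stale


table = ToyTable(add_on_miss=True, default_timeout=30.0)
flow = ("10.0.0.1", "10.0.0.2", 1234, 80)
first = table.apply(flow, now=0.0)                    # miss, entry added
second = table.apply(flow, now=20.0)                  # hit, idle timer restarted
third = table.apply(flow, now=40.0, new_timeout=5.0)  # hit, timeout shortened
expired_early = table.expire(now=44.0)                # idle 4 s, within the 5 s timeout
expired_late = table.expire(now=46.0)                 # idle 6 s, past the 5 s timeout
```

Every one of these state changes happens inside `apply()` or as a consequence of it, which is exactly what makes the table hard to split across two independent copies.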

In any of these cases (or any combination of them for the same bidirectional table that is supported by a target device), trying to implement a bidirectional table with two physical tables in the target device, one for each direction, seems extraordinarily difficult, akin to implementing hardware cache coherency mechanisms in a multi-core general purpose CPU.

It seems far more straightforward to implement such behavior correctly by having a single table that handles the total packet rate of both net-to-host and host-to-net packets, and processes each packet logically "one at a time" (perhaps with pipelining optimizations, of course). The lower packet rate of NICs versus typical switch ASICs makes this far more feasible to do in a NIC than in most switch ASICs.
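
As a rough illustration of why the single-table approach is simpler, the following Python sketch runs the same TCP three-way handshake through (A) one table that sees both directions and (B) two per-direction tables with no shared state. This is a toy model under stated assumptions, not a description of any real target: `ConnTable`, `flow_key`, the state names, and the string-valued flags are all hypothetical.

```python
def flow_key(src, dst):
    """Direction-independent key: both directions of a connection map to one entry."""
    return tuple(sorted((src, dst)))


class ConnTable:
    """Toy connection table: adds an entry on a miss, advances state on a hit."""

    def __init__(self):
        self.state = {}

    def apply(self, src, dst, flags):
        key = flow_key(src, dst)
        if key not in self.state:
            # In this toy model only a bare SYN may create a connection entry.
            if flags == "SYN":
                self.state[key] = "SYN_SENT"
                return "added"
            return "miss"
        if self.state[key] == "SYN_SENT" and flags == "SYN-ACK":
            self.state[key] = "SYN_RCVD"
        elif self.state[key] == "SYN_RCVD" and flags == "ACK":
            self.state[key] = "ESTABLISHED"
        return "hit"


HOST, PEER = ("10.0.0.1", 1234), ("192.0.2.7", 80)
# (direction, src, dst, TCP flags) for a handshake initiated by the host
handshake = [
    ("host_to_net", HOST, PEER, "SYN"),
    ("net_to_host", PEER, HOST, "SYN-ACK"),
    ("host_to_net", HOST, PEER, "ACK"),
]

# Design A: one table processes packets of both directions, one at a time.
shared = ConnTable()
shared_results = [shared.apply(src, dst, flags) for _, src, dst, flags in handshake]

# Design B: one table per direction, with no shared state between them.
per_dir = {"host_to_net": ConnTable(), "net_to_host": ConnTable()}
split_results = [per_dir[d].apply(src, dst, flags) for d, src, dst, flags in handshake]
```

In design A the connection reaches `ESTABLISHED`. In design B the entry added by the host-to-net SYN is invisible to the net-to-host table, so the SYN-ACK misses and the connection never leaves `SYN_SENT`. Fixing design B means propagating every data-plane update between the two tables, which is the coherency problem described above.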

There can still be significant variations in hardware designs that meet this goal.

For example, a target might have a single physical pipeline that processes both host-to-net and net-to-host packets, interleaved and sent into the pipeline on different clock cycles. A bidirectional table is just another table in the pipeline, one that "activates" for packets in both directions. Unidirectional tables simply are not activated when a packet in the opposite direction is being processed.
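
A small Python sketch of this first variation, purely as a toy model: packets of both directions are interleaved into one pipeline, and each table is active only for the directions it is configured for. The class `PipelineTable` and the table names `rx_filter`, `conn_track`, and `tx_rewrite` are hypothetical, and strict alternation of directions is an assumption; a real target could interleave in any order.

```python
from itertools import chain, zip_longest


class PipelineTable:
    """Toy pipeline stage that is active only for the listed packet directions."""

    def __init__(self, name, directions):
        self.name = name
        self.directions = frozenset(directions)
        self.applied = 0

    def process(self, direction):
        """Apply the table if it is active for this direction; report whether it was."""
        if direction in self.directions:
            self.applied += 1
            return True
        return False


BOTH = ("net_to_host", "host_to_net")
pipeline = [
    PipelineTable("rx_filter", ["net_to_host"]),   # unidirectional
    PipelineTable("conn_track", BOTH),             # bidirectional
    PipelineTable("tx_rewrite", ["host_to_net"]),  # unidirectional
]

# Interleave the two directions, one packet per "clock cycle".
n2h = ["net_to_host"] * 3
h2n = ["host_to_net"] * 2
arrivals = [d for d in chain.from_iterable(zip_longest(n2h, h2n)) if d is not None]

trace = []
for direction in arrivals:
    active = [t.name for t in pipeline if t.process(direction)]
    trace.append((direction, active))

counts = {t.name: t.applied for t in pipeline}
```

The bidirectional table is applied once per packet regardless of direction, so it must sustain the combined packet rate, while each unidirectional table only sees its own direction.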

Another variation is a hardware design with two separate pipelines, one for net-to-host packets, the other for host-to-net packets, where both have high-bandwidth access to some common shared hardware that can implement bidirectional tables.

The main kind of hardware design that PNA would preclude is one where there are two separate pipelines, one for host-to-net packets, the other for net-to-host packets, and they CANNOT access any shared state that can implement a bidirectional table, i.e. the design can only implement unidirectional tables. Without something like a hardware cache coherency protocol between the two pipelines, forcing two per-direction tables to remain coherent, I don't see how such a hardware design could implement bidirectional tables with stateful updates from the data plane.
