p4lang / p4runtime

Specification documents for the P4Runtime control-plane API
Apache License 2.0
144 stars 88 forks source link

Consider adding insert_or_update (or apply?) WriteRequest operation #431

Open jafingerhut opened 1 year ago

jafingerhut commented 1 year ago

I believe it is true today that P4Runtime API says something like this (pointers to particular places in the spec or implementation welcome):

There are scenarios where people find it useful to have an INSERT_OR_UPDATE operation, which behaves like this:

If the key does not already exist, do INSERT, else do UPDATE.

antoninbas commented 1 year ago

This was discussed in the early days of P4Runtime so there is some interesting context in https://docs.google.com/document/d/1yMOzmIOJo4_q2yCc1_eGcniag-Q7w2F_f3kKu9JB64c/edit?usp=sharing. The proposal was for an APPLY operation.

The WG was a bit split at the time, with some arguing that the client should be aware of forwarding state and use the correct operation (INSERT or UPDATE), and others advocating for more flexibility.

Different APIs make different choices. As an example, OpenFlow has evolved a bit over time. IIRC, in the later versions:

smolkaj commented 1 year ago

Andy, would you be able to share a few concrete details about cases where this is useful?

INSERT_OR_UPDATE makes sense abstractly, but at least for our SDN controller at Google, we don't need it because we expect the controller to be aware of the precise forwarding state on the switch.

jafingerhut commented 1 year ago

@sayanb @vgurevich Would you be able to provide an answer to Steffen's request for concrete details where an INSERT_OR_UPDATE operation is useful?

vgurevich commented 1 year ago

It is true that in those cases where the controller is fully aware of the state of each switch, many operations (including INSERT_OR_UPDATE) start looking less useful. At the very minimum, it is clear that such a controller can always choose the right operation (INSERT or UPDATE) in each particular case.

Having said that, the operations like INSERT_OR_UPDATE can significantly simplify coding, eliminate the need for extra error handling, etc.

These operations definitely are a lot more helpful in those cases where the controller is not guaranteed to be fully aware of the device state.

From the design perspective, I believe that it is really important for the API to have a full set of idempotent operations and INSERT_OR_UPDATE is one of those (the other one would be DELETE_OR_NOOP_IF_NOT_PRESENT). Their existence eliminates a lot of problems that can be caused by retransmissions, restarts, etc.

chrispsommers commented 1 year ago

To add to Vladimir's response: I too have experimented with backends which essentially wrap BFRT/TDI and I created a MERGE operation (works same as above INSERT_OR_UPDATE) to also avoid errors when I didn't care if the entry already exists. MERGE is used in NETCONF so maybe this term can be adopted?

qobilidop commented 1 year ago

I have come across the name UPSERT previously, but I don't know where it comes from. I did some search this time and found out that it's basically a synonym of MERGE in the SQL world: https://en.wikipedia.org/wiki/Merge_(SQL)

smolkaj commented 1 year ago

Thanks @vgurevich for adding more color.

Having said that, the operations like INSERT_OR_UPDATE can significantly simplify coding, eliminate the need for extra error handling, etc.

Makes sense, certainly when hacking up a quick prototype. For production systems, I suspect that you generally do want to make a distinction between INSERT and UPDATE and that you do want to handle errors -- but again, I am biased based on how our controller works.

From the design perspective, I believe that it is really important for the API to have a full set of idempotent operations and INSERT_OR_UPDATE is one of those (the other one would be DELETE_OR_NOOP_IF_NOT_PRESENT). Their existence eliminates a lot of problems that can be caused by retransmissions, restarts, etc.

I can see why idempotence is a very nice property when you're worried about retransmissions/restarts/etc. At the same time, P4Runtime is based on gRPC, which runs on top of HTTP/2. So retransmissions and restarts are not something you should have to worry about unless the connection between your controller and switch breaks/times out and you don't know whether a previous operation completed or not. At that point, our controller at Google would try to establish a new connection, and if that succeeds, read the entire state from the switch to see where it left things of. In this kind of model, we never have to deal with retransmissions at the P4Runtime-level.

I understand that other might be using P4Runtime differently, I just wanted to provide a perspective to explain why these operations don't make much sense for our use case.

vgurevich commented 1 year ago

@smolkaj -- thanks for the details. So, from what I understand it's not that having such a method would break anything at Google or would otherwise be detrimental to Google's use case -- it was simply not needed.

In general, when designing the API we should definitely consider as many use cases as possible. From little I know about Google's case it is quite special and deliberately designed for simplicity. As a result it requires a lot less functionality than many other cases one can encounter in the industry.

smolkaj commented 1 year ago

it's not that having such a method would break anything at Google or would otherwise be detrimental to Google's use case -- it was simply not needed.

Yes, that's right.

My meta point is that I would like to understand if there are "real" uses cases for this, or if they are merely hypothetical at this point. I would generally push (gently) towards evolving P4Runtime based on real needs so we don't end up with features that no one is actually using, given that there is a cost for every feature we add.

At a technical level, I have no issues with this proposal.