p4lang / p4runtime

Specification documents for the P4Runtime control-plane API
Apache License 2.0
140 stars 86 forks source link

Why do multicast group entry instance numbers have to be unique? #456

Closed smolkaj closed 2 months ago

smolkaj commented 10 months ago

The specification states:

A multicast group may be inserted, modified or deleted as per the following semantics. INSERT: [...] No two replicas may have identical values of both port and instance, or the server must return INVALID_ARGUMENT.

But why not?

We have a use case for sending several identical copies of a packet out through the same port, and this restriction is a bit annoying. I am trying to understand why it was introduced. Maybe @antoninbas would remember.

@verios-google @jonathan-dilorenzo for visibility.

antoninbas commented 10 months ago

I think I may be missing the question. instance is meant to let you send multiple copies of the same packet out of the same port. By definition, instance is here to distinguish between multiple identical copies in egress.

smolkaj commented 10 months ago

Why does P4Runtime not allow me to send multiple copies of the same packet out of the same port without coming up with unique instance numbers?

smolkaj commented 10 months ago

Specifically, why is the following disallowed by the standard?

type: INSERT
entity {
  packet_replication_engine_entry {
    multicast_group_entry {
      multicast_group_id: 1
      replicas { port: "1" }
      replicas { port: "1" }
      replicas { port: "1" }
    }
  }
}
jafingerhut commented 10 months ago

I think because it is almost never what someone wants. For IP multicast, for example, you typically always want two copies going out the same physical port to have the capability of being treated differently in an L3 router, e.g. different egress ACLs, different egress QoS/policing/marking, different tunnel encapsulations, etc.

If two copies have the same contents exactly, including instance id, then egress processing can't do anything different with them (except by using a stateful construct like a P4 register array to store information about each one, which is a waste of effort in my opinion because it is drastically easier for the control plane running on a general purpose CPU to come up with unique instance ids).

jafingerhut commented 10 months ago

I guess stated another way: why do you consider it in any way difficult for control plane software to come up with distinct instance ids for the same output port?

Or, if you really really want two copies to have the same instance ids, what is the scenario where you want to copies of a packet sent to the same output port to be treated identically in every way?

smolkaj commented 10 months ago

Thanks, Andy. So it's disallowed because it was deemed that it would be more likely to be a bug than intentional. This makes sense and is what I suspected, but I wanted to double check.

antoninbas commented 10 months ago

My best guess is that the origin of this is a combination of: 1) Some switch SDK requiring different ids for copies being sent to the same port for a given multicast group (see the rid in https://github.com/p4lang/PI/blob/6d0f3d6c08d595f65c7d96fd852d9e0c308a6f30/include/PI/pi_mc.h#L53, which was probably inspired by a vendor SDK) 2) The need to be able to distinguish between the multiple copies in the P4 egress pipeline, in a user-controlled fashion. The user needs a way to specify actual values as these values will be used as is in the P4 program.

You could auto-generate sequential values in the controller?

I can also see us modifying P4Runtime to auto-generate sequential (or even random) unique replication ids in the server when the user doesn't provide them (or not, if the server is specific to a target that doesn't require them). In the case of the PI implementation, I imagine that it won't be too complicated. We could make the instance field optional, but I never know if this is 100% backward-compatible. I believe this is the same as moving the field into a oneof, which we have definitely done in the past.

Edit on backward-compatibility: I think it's non trivial in this case, given that 0 is a valid value for instance. So it may be a different situation from what we had in the past.

smolkaj commented 10 months ago

why do you consider it in any way difficult for control plane software to come up with distinct instance ids for the same output port?

It is not difficult, but having "garbage" (i.e., data with no impact on the semantics) going across the wire is a bit dissatisfying. Certainly not a huge deal.

smolkaj commented 10 months ago

Or, if you really really want two copies to have the same instance ids, what is the scenario where you want to copies of a packet sent to the same output port to be treated identically in every way?

You might want to do such a thing, for example, in a traffic generator.

jafingerhut commented 10 months ago

So, I do believe that having the restriction that (egress_port, instance_id) pairs must be distinct is at least partly motivated by it seeming to be more like a bug if two or more such pairs were identical. Antonin's suggestion that some vendor SDK required this may also be one of the original motivations -- I am not familiar with any details there.

I could imagine creating some kind of configurable mode in the P4Runtime API server, perhaps configurable via a WriteRequest message from a client, that could be changed at runtime to either enforce the current restriction, or to eliminate that restriction (i.e. explicitly allow as many duplicate (egress_port, instance_id) pairs in the same multiset as you like). If that is of interest, it does seem nice to keep the current behavior as the default for backwards compatibility.

smolkaj commented 10 months ago

Thanks for elaborating, @jafingerhut and @antoninbas.

If we think that sending duplicate packets is a bug in most cases and intentional only in rare circumstances. then it seems not unreasonable to have people "opt-in" explicitly in cases where this is desired behavior.

So at this time, I am not planning to propose a change of the spec. Should we close this?