p4lang / p4c

P4_16 reference compiler
https://p4.org/
Apache License 2.0
665 stars 440 forks source link

Compiler silently allows duplicate name annotations #1949

Open jafingerhut opened 5 years ago

jafingerhut commented 5 years ago

I will add another comment with example program soon.

mihaibudiu commented 5 years ago

I don't think this is a problem if either they do not impact the control plane or if the duplicate objects are disambiguated in some way, e.g., actions used in different tables.

jafingerhut commented 5 years ago

Tested with this version of p4c source code:

commit b806cecb70c19620acbb86c963d4f84b48c0a51f
Author: fruffy <5960321+fruffy@users.noreply.github.com>
Date:   Thu May 23 23:26:23 2019 +0200

When I compile with this command:

$ p4test --p4runtime-files issue1949.p4info.txt issue1949.p4

I get no errors, and the P4Info file generated is wrong. It contains two action_refs with the same id, both for the same action named "foo". I believe a correct result would be an error from the compiler that there is a duplicate name annotation "foo" for two different things, which should not be allowed, and no P4Info file generated at all.

There is an error when I run with this command:

$ p4c --target bmv2 --arch v1model --p4runtime-files issue1949.p4info.txt issue1949.p4
terminate called after throwing an instance of 'std::logic_error'
  what():  Duplicate label in json object foo

but it still writes an incorrect P4Info file, the same incorrect one written by the command above. Again, I believe a better result would be no P4Info file written, and an error message indicating why the program is erroneous.

issue1949.p4.txt issue1949.p4info.txt

mihaibudiu commented 5 years ago

I have assigned @antoninbas , since he wrote the code that checks for duplicates. I guess the rule to enforce is that two actions with the same name cannot be used within the same table - unless the control-plane can use other tools, like argument names, to disambiguate.

jafingerhut commented 5 years ago

It seems like it would be correct to be more strict, and say that no two actions can have the same "full path name", where by that I mean that it could be an error if two actions had fully qualified names that both ended up being ".foo.bar.baz", regardless of whether those actions were in the same control, or different controls.

If it were allowed for two actions in different controls (and thus cannot be actions of the same table) to have the same full name ".foo.bar.baz", how would that work in tools like simple_switch_CLI and/or P4Runtime API that does name<->id translation?

antoninbas commented 5 years ago

There is no good way to handle this because 1) P4Runtime considers that different actions with the same name annotation is illegal, while 2) the compiler considers it is legal and actually emit such P4 code when the same action is used in 2 different tables (so we have to handle this case as gracefully as possible). The following code can never be handled "correctly" with the current P4Runtime version:

@name(".a") action a_1(...) { ... }
@name(".a") action a_2(...) { ... }
table t1 { actions = { a_1; } ... }
table t2 { actions = { a_2; } ... }

P4Runtime will include a single action "a", and both tables will refer to "a". This works for code generated by the compiler (assuming a_1 and a_2 are not optimized differently). But there is no way to emit a "correct" P4Info for this program. Yet if the P4 programmer writes such P4 code (with completely different action definitions), we will happily generate a P4Info for it and discard one of the actions.

One hacky workaround would be to check if the actions are "semantically" the same (e.g. by dumping them to P4 and comparing the emitted strings) and emit an error if they have the same name but are not the same. That's probably better than the current behavior, but it will fail if the p4c frontend ever starts optimizing multiple copies of the same action differently (that's probably not the frontend's job anyway).

jafingerhut commented 5 years ago

Why does the compiler consider it legal, and emit such code? I am guessing there is some history here that I am probably unaware of, and/or extant code that relies on this in P4_16 programs? Or in P4_14 programs auto-translated to P4_16 programs?

It seems like a very awkward position to be in if the compiler silently generates incorrect code (I am including P4Info files as "code" here).

jafingerhut commented 5 years ago

Oh, and one of the reasons I created this issue is that there is a proposed PR for p4c, where it would create two "different" actions from the same original action definition, one with a default parameter value, one with a run-time variable parameter value, and they have the same "@name" annotations on them, for the same table.

You can see my comment on one of the intermediate p4c files here: https://github.com/p4lang/p4c/pull/1948/files#r289263177

antoninbas commented 5 years ago

Yes it is very common for the compiler to duplicate action definitions and give them the same @name annotations. I believe defining an action at the top-level and then using it in multiple controls will create one copy per control. I haven't checked in a while, but it used to be that the compiler would actually generate many instances of "NoAction".

jafingerhut commented 5 years ago

But if they are in different controls, then unless the user give a "rooted" @name annotation, all of them will get unique names prefixed with the control name, so those should never collide in the full name, only prehaps in the part of the name after the last ".". True?

antoninbas commented 5 years ago

The frontend transforms this:

action act1() { }

control ingressImpl(inout headers_t hdr,
                    inout metadata_t meta,
                    inout standard_metadata_t stdmeta)
{
    table t1 {
        key = { hdr.h1.f1 : exact; }
        actions = { act1; NoAction; }
        const default_action = NoAction;
    }
    apply {
        t1.apply();
    }
}

control egressImpl(inout headers_t hdr, inout metadata_t meta, inout standard_metadata_t stdmeta) {
    table t2 {
        key = { hdr.h1.f1 : exact; }
        actions = { act1; NoAction; }
        const default_action = NoAction;
    }
    apply {
        t2.apply();
    }
}

into this:

control ingressImpl(inout headers_t hdr, inout metadata_t meta, inout standard_metadata_t stdmeta) {
    @name(".act1") action act1() {
    }
    @name(".NoAction") action NoAction_0() {
    }
    @name("ingressImpl.t1") table t1_0 {
        key = {
            hdr.h1.f1: exact @name("hdr.h1.f1") ;
        }
        actions = {
            act1();
            NoAction_0();
        }
        const default_action = NoAction_0();
    }
    apply {
        t1_0.apply();
    }
}

control egressImpl(inout headers_t hdr, inout metadata_t meta, inout standard_metadata_t stdmeta) {
    @name(".act1") action act1_0() {
    }
    @name(".NoAction") action NoAction_1() {
    }
    @name("egressImpl.t2") table t2_0 {
        key = {
            hdr.h1.f1: exact @name("hdr.h1.f1") ;
        }
        actions = {
            act1_0();
            NoAction_1();
        }
        const default_action = NoAction_1();
    }
    apply {
        t2_0.apply();
    }
}
jafingerhut commented 5 years ago

Again, is there a good reason for it to be this way? It seems like a ticking time bomb of potentially subtly wrong compiler output to me.

jafingerhut commented 5 years ago

There is also this sentence in the P4_16 language spec that seems to conflict with this behavior. Not sure which one should be changed yet:

Section 17.3.2. Annotations controlling naming

"Programs that yield the same fully-qualified name for two different controllable entities are invalid."

Section 18.2.3.1. Restrictions

"Each element may be annotated with at most one @name or @hidden annotation, and each control plane name must refer to at most one controllable entity."

and a little bit later:

"However, the @name(".foo.bar") annotation renames table t in both instances to foo.bar, resulting in one name that refers to two controllable entities, which is illegal."

antoninbas commented 5 years ago

I'll @mbudiu-vmw answer this question more thoroughly. My understanding is that this can be useful for optimization. It makes sense to me when you have a parameter with a direction:

action act(in bit<1> b, ...) {
    if (b == 0) {
        ...
    } else {
        ...
    }
}

If t1 lists act(0) in its actions list and t2 lists act(1), then I can see why it is potentially interesting to create different versions of act in order to eliminate the if statement.

mihaibudiu commented 5 years ago

That is correct, we always wanted to leave the compiler the ability to change the body of actions, e.g. moving code between actions or tables. So if you use an action twice it does not imply that in the end the two implementations have to be identical.

We could perhaps add a compiler flag or annotation saying: do not duplicate this action (at least, not within a control). Some passes may break, though.

antoninbas commented 5 years ago

@mbudiu-vmw is there any way (that doesn't involve the @name annotation) for me to tell when generating the P4Info that multiple actions refer to the same original user-defined action? That would be helpful and I think it would enable me to workaround that issue nicely.

mihaibudiu commented 5 years ago

Currently the @name annotation is the only way to guess that something is derived from the same source. This assumes that the user didn't put the annotations these by themselves. Perhaps we can differentiate between the two by checking whether the annotation has any source code position. I guess that checking the source position of the annotation is the most reliable way.

If you want a truly reliable way we can add another annotation (some sort of unique identifier) which would be carried by the compiler through all transformations.

But I think that these are really just band-aids, which do not really address the fundamental problem. Whether action names can be duplicated can only be answered if we know all the operations that the control-plane can do on actions. If action names are always relative to a table, and table names are unique, then there is a clear way to disambiguate actions with the same name.

hesingh commented 5 years ago

I do use actions in p4-16 code where an action is not tied to any table. I just call an action from the control's apply block. I know the compiler generates internal default table for such actions. If any action is duplicated in another control, the compiler can use "action_1()" similar to NoAction_0() and NoAction_1().

jafingerhut commented 5 years ago

I am just brainstorming here, as I am probably still missing something fundamental here in how p4c auto-creates these @name annotations, but would it help if there was an annotation like @compiler_generated_name, one that the user was not allowed to include in the P4_16 programs provided to the compiler, but that the compiler used when creating its own things that work similarly to @name annotations?

Re-reading Mihai's last comment above, that might be his "some sort of unique identifier" suggestion.

mihaibudiu commented 5 years ago

The action name is actually the compiler-generated name. There is a very simple pass that generates @name annotations very early, naming each object with the name it has in the source code. The confusion occurs because both the compiler and the user can write @name annotations. So when we see one we don't know where ti comes from.

mihaibudiu commented 5 years ago

We could add an optional boolean second argument to @name to indicate that it is generated internally.

jafingerhut commented 5 years ago

Another line of thinking here:

Is there an early pass of the compiler, where it currently adds guaranteed-unique @name annotations to all things that do not already have a user-provided one?

Is that pass before any inlining or action/control/whatever "duplication" is done by the compiler?

If so, would it be correct to add a pass at that point, that gives an error if there are any duplicate name annotations, when resolved to the full hierarchical name (i.e. one that begins with ".")?

Then, assuming that check finds no duplicates, the compiler is then free to duplicate things and their @name annotations, as long as it does so in a way that preserves correctness (whatever that precisely means, which I am not 100% clear on)?

mihaibudiu commented 5 years ago

That is the uniquenames pass, I believe. But inlining and instantiations create more objects; to know everything that has a name you will have to perform these operations as well.

mihaibudiu commented 5 years ago

We have to start with the contract that the compiler provides to the user. If we don't know exactly what this contract is we cannot implement it. My assumption was that action names are disambiguated by table names.

jnfoster commented 5 years ago

To beat the drum again... yet more evidence we should have a better and more explicit way of specifying the P4 / P4 Runtime interface.

jafingerhut commented 5 years ago

I don't mind hearing the drum beat, but in order for that drum to be beaten correctly, I would guess that any better or more explicit way of specifying the runtime interface needs a good way to name things in the P4 program. That is exactly what is at issue here, I think -- when the compiler's internal passes start duplicating objects like actions, what name should the copies have? One answer is "that shouldn't matter", but I think part of the question here is how to make the compiler so that such duplications do not affect the result.

jafingerhut commented 5 years ago

Mihai wrote in an earlier comment: "That is correct, we always wanted to leave the compiler the ability to change the body of actions, e.g. moving code between actions or tables. So if you use an action twice it does not imply that in the end the two implementations have to be identical."

Why do we always want to leave the compiler the ability to change the body of actions?

Even if we do that, why do we want to allow an action used twice to be modified in the compiler internals, differently from each other?

Is it purely for context-dependent optimization opportunities that might exist in one use of the action, that for some reason do not exist in the other use of the action?

jafingerhut commented 5 years ago

@antoninbas If this assumption in p4c that different tables can have actions that do different things, but have the same @name annotation name string, runs fairly deep, should P4Runtime perhaps adapt itself to explicitly allow that somehow? Such same-named-but-different actions could still have unique id's in P4Runtime.

mihaibudiu commented 5 years ago

In general, what a compiler does is to promise to implement the program you give it - the semantics of the output program has to be the same as the semantics of the input program. In which way it does it should not be important. The more freedom you give to the compiler, the better a job it may be able to do.

In our case the semantics has several components: the dataplane semantics, and the control-plane API. The latter wasn't clearly specified, and that's where we have some problems.

For an ASIC in fact, even if you have the exact same code for an action twice used in two pipeline stages the compiler may generate different code. For example, the pipeline stage where the action is placed is diferent, but perhaps even the live registers and interconnect usage will be different. So in general you should not assume any notion of equivalence between two otherwise identical pieces of source code.

Today in the compiler front-end and mid-end we do not really optimize code across actions, but I don't see why we would promise not to do that. This is like interprocedural optimizations in other compilers - could be very useful if you know something about the context in which an action is invoked. We envisioned for example breaking down actions that are too complex into multiple actions that can be executed independently - either in parallel or in sequence. The policy on how to break actions is, of course, target dependent.

hesingh commented 5 years ago

The foundation was shaky. P4 allows user to specify annotations and the user picks any name which has potential to clash with internal compiler-generated names. We knew, from day one, the compiler does generate internal names for many objects in the P4 program.

I am also for code changes in compiler which are simple - let's not make the compiler complicated. Then, we should try and not use new knobs if the issues can be solved without a new knob.

The scope of the problem lies within p4c frontend processing, at the end of which, p4runtime API is generated. At the least, the compiler should catch duplicates in a P4 program or duplicates arising due to a pass in frontend processing. For a quarter century, one has had unique interfaces with Microsoft COM and OLE objects and their compiler. The DHCPv6 protocol uses a 128-bit DUID to identify a DHCPv6 client. We know means how to make our P4 names unique.

I think the jury is still out regarding compiler changing action body. I think Mihai does have a point and maybe we can poke around p4c/testdata/p4_16_samples/ for action bodies that are screaming to be modified by the compiler.

mihaibudiu commented 5 years ago

The names internally generated by the compiler should never affect the program behavior. If they do, it is a bug. The compiler maintains a symbol table with all symbols used in the program and always allocates fresh symbols, that never occur in the program.

The problem is not about internally allocated names, it is about names that are externally visible, and which the spec mandates should be externally-visible.

hesingh commented 5 years ago

I am using a P4 action not tied to any table. The compiler generates a default table for this action. Few more actions like the first one are defined. With compiler invoking its action body changes or creating new duplicate actions for optimization may lead to a duplicate not liked by p4runtime. I agree with you, it would be a compiler bug. Hence, such tests should be added to p4c.

mihaibudiu commented 5 years ago

Since your actions are not tied to any table I don't think you can do anything with them in P4Runtime, so no matter which way you use them it should not matter.

hesingh commented 5 years ago

Ah, I thought, because the compiler generates a default table for my action-with-no-table, the default table could be exportable to p4runtime. I suppose, that's not the case. Thanks for clarifying.

jafingerhut commented 5 years ago

There are some steps the compiler does before generating P4Info files, and many after. I do not know exactly where in the compiler steps P4Info generation is today, but very likely the introduction of "dummy tables" for unconditionally executed actions/assignments in controls is probably after P4Info generation.

jafingerhut commented 5 years ago

Such a "dummy table" has no keys, and a const default_action, I think, so there wouldn't be much point in making it accessible from the control plane API -- there is nothing that can be changed about it.

jafingerhut commented 5 years ago

OK, last question on this topic for today:

I know that several people spent significant time and effort figuring out when P4Info file generation is performed in p4c, and I was not involved in that, so I am certain to be treading old ground here. If there are any kind of design notes on that decision written up anywhere, please point me to them. If there are not, and you wish there were, I might be enticed to try writing up something.

If I am reading p4c code correctly, P4Runtime generation is done in a function serializeP4RuntimeIfRequired that is called after all front end passes are complete, before any mid end passes are run. For example here is where the p4test front end passes are run: https://github.com/p4lang/p4c/blob/master/backends/p4test/p4test.cpp#L114-L116

and here is where serializeP4RuntimeIfRequired is run for p4test, a few lines of code after the above: https://github.com/p4lang/p4c/blob/master/backends/p4test/p4test.cpp#L127

A similar order occurs for the BMv2 v1model and PSA variants of p4c.

Is there a good reason that various kinds of action duplication are performed before the P4Info file is written? That is, would it be possible to do P4Info file generation even sooner than it is now? I am guessing something would go wrong if so, but not sure what.

EDIT: Another variation of this idea -- perform detection of duplicate @name annotatations in a separate pass, before any objects with @name annotations are duplicated (e.g. actions), but after all things without a user-provided @name annotation have been given one by the compiler. This pass need not wait for the P4Info file generation to be performed. After this distinct-name-annotation checking pass is complete, it would be find for the compiler to create duplicate @name annotations on different things, as long as it did not do so incorrectly (e.g. it could do so for two actions created from the same original action in the source code, but not for two different original actions in the source code).

hesingh commented 5 years ago

For p4runtime API generation to be target-agnostic, the generation has to happen before any mid-end processing starts. Mid-end may invoke a target-specific policy. This is why current behavior is to generation p4runtime API after front-end processing has completed.

p4runtime API includes P4 Action info. If p4runtime is generated earlier, during front-end processing, before any Action changes are invoked by the compiler, p4runtime API generation may generate some wrong Action information. Let's

(a) keep using duplicate actions inside a control, (b) add new boolean flag to @name to indicate name is generated internally. The generated name can append a ".table_name" for duplicate actions. In any case, an action's table can always be found. (c) p4info generator checks if boolean flag is set on @name, finds duplicate tables, and generates appropriate actions.

hesingh commented 5 years ago

If the compiler generates duplicate actions, the compiler could store the actions in a linked list. The linked list could hang off of the @name.