p4lang / p4c

P4_16 reference compiler
https://p4.org/
Apache License 2.0
681 stars 445 forks source link

Add duplicate action control plane name check pass #4951

Closed jafingerhut closed 1 month ago

jafingerhut commented 1 month ago

I have updated this initial comment with the status of the PR after 32 commits.

I recommend that we abandon this PR, and instead focus on https://github.com/p4lang/p4c/pull/4975 instead. #4975 is intended to combine all of the fixes for #4951 and #4970. I thought #4970 might be merged quickly, and then #4951 would be after that, but there are some small test case dependencies between the two that it is nicer to do all at once. I will copy this initial big comment over to the first comment on #4975, as a summary of its status.

These changes are intended to address the problem where p4c currently allows a user to have identical @name annotations on multiple objects, e.g. on multiple actions. This can cause incorrect P4Info files to be generated, among other likely problems resulting from that root cause. See this slide deck for examples and more details:

The code changes here are based upon this PR, which stalled a while ago: https://github.com/p4lang/p4c/pull/3228

These changes are noticeably different from that PR, however, in that this one adds a new pass that occurs before the LocalizeAllActions pass, which is the pass where p4c by design can synthesize actions that are duplicates of user-written actions, all of which have identical @name annotations. Unless we change that behavior, any checks for duplicate @name annotations must be done before that pass. Such a change is described verbally in this comment on the PR above: https://github.com/p4lang/p4c/pull/3228#issuecomment-1111635352 but to my knowledge has not been implemented before. This PR is trying to implement that approach.

It passes all of the CI tests, after modifying a handful of P4 programs that had the error being checked for, so they no longer had the error. One of those was preserved to verify that p4c catches the error with the expected error messages.

The new pass is only enabled if one or more of the command line options are given to generate P4Info output files. The duplicate @name annotations checked for in the new pass only cause problems (that I am aware of) if P4Info output files are being generated. This is similar to some existing checks for duplicate names of P4 objects in the control plane serialization code for P4Info.

I think it would be best to avoid merging in this PR (even if the community finds it a good idea) until after the Tofino team has finished their work of publishing their code in open source. I would hate for an extra check like this to cause them an extra day or two of work in tweaking their automated tests.

There have been multiple related issues filed on p4c over several years that this PR might address, at least partially if not completely.

These issues I believe are fully fixed by this PR:

The issue below should remain open even if this PR is merged, as the issue is really asking questions about what names should be used for controllable P4 objects in a P4Info file. This PR does not change any names in P4Info files -- it only gives errors and prevents P4Info files from being generated when the current names would conflict:

I still need to re-read and think about whether the following issues are fixed by this PR, or whether there are other questions in them that are not fixed by this PR:

jafingerhut commented 1 month ago

@ChrisDodd Does it sound normal to you that local variable declarations within a parser or control have @name annotations generated for them within the IR? I do not know which pass they are created in, but it is somewhere before the LocalizeActions frontend pass. They are not printed when you use the --dump and --toP4 command line options, but the method DuplicateHierarchicalNameCheck::postorder added by this PR is called for such local variable declarations, and does not take the return statement in the first line of that method.

I do want this new check to be performed for extern instantiations and their names, since many of those have control plane APIs and thus should have unique global names. But if it is a local variable with a non-extern type, I cannot think of any reason it should be visible in a control plane API.

ChrisDodd commented 1 month ago

A higher-level question here -- does it never make sense for a user to use the same @name on different things? One situation that comes to mind is having slightly different actions in different tables that do the same logical thing -- it makes sense to give them the same name so that code that is adding the "same" action to multiple tables looks more consistent.

This is in line with what is going on with the compiler-generated names -- when the compiler is splitting a logical table into multiple tables, the user does not want to need to be aware of that and need to use different names for things as a result.

If it would make sense in some situations, maybe the duplicate names should be a warning rather than an error? The original concept for @name was to allow the user to specify a "cleaner" interface for the control plane to hide some complexity of the P4 program. Getting things wrong can lead to confusion, so it needs to be used with care.

ChrisDodd commented 1 month ago

@ChrisDodd Does it sound normal to you that local variable declarations within a parser or control have @name annotations generated for them within the IR? I do not know which pass they are created in, but it is somewhere before the LocalizeActions frontend pass. They are not printed when you use the --dump and --toP4 command line options, but the method DuplicateHierarchicalNameCheck::postorder added by this PR is called for such local variable declarations, and does not take the return statement in the first line of that method.

I do want this new check to be performed for extern instantiations and their names, since many of those have control plane APIs and thus should have unique global names. But if it is a local variable with a non-extern type, I cannot think of any reason it should be visible in a control plane API.

I'm not sure why non-extern local variables would get/need @name annotations as they are not generally visible to the control plane API. However, it might make sense to do that for debugging info purposes -- one could imagine a debugger being able to inspect whatever resource these locals are allocated to and want to be able to map it back to the original name in the source code -- particularly in the presence of control/parser inlining and other transformations.

jafingerhut commented 1 month ago

A higher-level question here -- does it never make sense for a user to use the same @name on different things? One situation that comes to mind is having slightly different actions in different tables that do the same logical thing -- it makes sense to give them the same name so that code that is adding the "same" action to multiple tables looks more consistent.

This is in line with what is going on with the compiler-generated names -- when the compiler is splitting a logical table into multiple tables, the user does not want to need to be aware of that and need to use different names for things as a result.

If it would make sense in some situations, maybe the duplicate names should be a warning rather than an error? The original concept for @name was to allow the user to specify a "cleaner" interface for the control plane to hide some complexity of the P4 program. Getting things wrong can lead to confusion, so it needs to be used with care.

The P4Runtime API is certainly not the only control plane API in existence, but taking it as one prominent example, it has built into its data model that every action name must be unique. If, beneath the covers of a network device's implementation of the P4Runtime API, it combines that action name with a table to decide what exact implementation to use for that particular table, that is an implementation detail for that target, and perfectly acceptable. It does not affect the P4Runtime API specification, nor the messages passed between client and server.

The P4Runtime API specification could be modified to support multiple actions with the same name, but I'm not aware of any desire by its users to change it in that way.

The P4Runtime API spec also requires that every table name must be unique. It actually permits a table and an action to have the same name as each other, and the p4c implementation currently allows this, and there is code in this PR to preserve that capability. I was reminded of this because there is one particular test program that exercises this possibility, named same_name_for_table_and_action.p4.

It might definitely be the case that some other control plane API does not have such restrictions. As of the current state of this PR, you can skip its new checks by skipping the new pass, via command line options --excludeFrontendPasses DuplicateHierarchicalNameCheck. I am certainly open to the idea of adding a new shorter command line option that achieves the same effect.

jafingerhut commented 1 month ago

Another idea: This new pass could be enabled only if P4Info files were being generated.

jafingerhut commented 1 month ago

One situation that comes to mind is having slightly different actions in different tables that do the same logical thing -- it makes sense to give them the same name so that code that is adding the "same" action to multiple tables looks more consistent.

So if the compiler takes one action written by the P4 developer, and creates two or more specialized versions of it, then modulo bugs in the compiler I can believe that these two specializations will have the same logical behavior according to the P4 language spec, and it makes perfect sense to use the same name for them from the control plane API. Note: This is exactly what happens today with the LocalizeAllActions pass - single actions are duplicated, and all have a copy of the @name annotation of the original. I have no problem preserving this behavior, and this PR does preserve that behavior. The new pass added in this PR is executed before the LocalizeAllActions pass, specifically because I was trying to preserve that behavior. This new error-checking pass should always be done before any pass that duplicates actions, tables, etc. and their @name annotations whenever it is used. See the block comment in file duplicateHierarchicalNameCheck.h of this PR to see if it says this clearly and explicitly enough. I am happy to change it if it is not clear enough.

However, what if someone writes a P4 program with two very different actions, and puts the same @name annotation on both of them? In that situation, it seems to me to make 0 sense to call those by the same name.

What if a developer creates two actions with the same @name annotation, but believes they both have the same logical behavior, and are interchangeable? It doesn't make sense to me to implement a check in p4c that verifies whether those two actions actually are logically equivalent. If the compiler does not implement such an equivalent-behavior check, and the actions are different, and p4c treats them as logically the same, I suppose you can call that a place where the P4 developer shot themselves in the foot. But is this a use case we believe is useful to support? If so, then I am pretty sure they cannot use the P4Runtime API, and must use some other control plane API (which is fine, but out of scope of this PR, I think).

As a concrete example, see actions with P4 program names a1 and a2 in this program, that both have the same @name annotation: https://github.com/jafingerhut/p4-guide/blob/master/name-annotations/actions-5-same-name-annot.p4

In my opinion, this program should be rejected. I cannot think of any reason why we should accept it if you want to generate P4Info from it.

Even if some future control plane API always uses (table name, action name) pairs to disambiguate actions, I would bet $100 that there is some bug in p4c in this area that would require changes to make that work. I say: if such a control plane API arises, let the team developing it enhance p4c to support it.

jafingerhut commented 1 month ago

FYI, since this change might cause issues for automated tests on the Tofino code base, I plan to hold off attempting to merge this code until after that team has published their work. The existing bugs this is trying to avoid have existed for 5 years -- we can wait a month or two longer. That and I would like to get careful review from anyone who might be affected by these changes, because the current behavior has been the way it is for so long. I am happy to try to write up some article explaining the reasons for the change, and why things are the way they are, if that is of interest for future reference.

jafingerhut commented 1 month ago

@fruffy Is there existing documentation on how to cause additional automated tests to run, that are not run on every pre-commit? I would like to try those out, too, since I'd like to be very cautious to catch any problems these changes might introduce. I'm happy to write such documentation and stick it into an appropriate file in this repo, if there is not yet one written.

fruffy commented 1 month ago

Are you referring to the skipped tests? These are triggered by adding labels to the pull request then pushing again. I am not sure whether we have documentation, it is possible we do.

asl commented 1 month ago

However, what if someone writes a P4 program with two very different actions, and puts the same @name annotation on both of them? In that situation, it seems to me to make 0 sense to call those by the same name.

Time for One Definition Rule in P4? :)

jafingerhut commented 1 month ago

Are you referring to the skipped tests? These are triggered by adding labels to the pull request then pushing again. I am not sure whether we have documentation, it is possible we do.

Yes. Is there a checked-in list with the label to CI-test correspondence?

fruffy commented 1 month ago

Yes. Is there a checked-in list with the label to CI-test correspondence?

I do not think so, but the labels are somewhat self-documenting: https://github.com/p4lang/p4c/issues/labels You can see them under the run-*. We could add some more documentation but not sure where yet.

jafingerhut commented 1 month ago

@ChrisDodd As of commit 24, I have modified this PR so that the new DuplicateHierarchicalNameCheck pass is only enabled if you give one or more of the command line options that cause P4Runtime API output files to be written. This makes these checks enabled at exactly the times that other duplicate @name annotation checks on tables, parser value sets, extern instances, etc. are enabled, too. With these changes, there are no errors if you simply run p4test myprog.p4 on a program with duplicate @name annotations, whether they are on actions or anywhere else.

kfcripps commented 1 month ago

https://github.com/p4lang/p4c/issues/4651 I have re-read this issue after commit 24 on this PR, and this PR should fully address it. I have invited kfcripps to review this PR. If this PR is merged, I believe issue 4651 should be closed, since the issue will be fixed.

@jafingerhut I have not looked at any of the PR's changes so sorry if the answer to my below questions would be obvious by looking at them.

Just to clarify, does this PR address both the "error" case mentioned in https://github.com/p4lang/p4c/issues/4651#issue-2278519073, as well as the "bug" case mentioned in https://github.com/p4lang/p4c/issues/4651#issuecomment-2093844358?

Also, does the PR add both of these cases (or equivalent cases) as tests to testdata/p4_16_errors/, as well as testdata/p4_16_samples/, respectively?

jafingerhut commented 1 month ago

4651

I have re-read this issue after commit 24 on this PR, and this PR should fully address it. I have invited kfcripps to review this PR. If this PR is merged, I believe issue 4651 should be closed, since the issue will be fixed.

@jafingerhut I have not looked at any of the PR's changes so sorry if the answer to my below questions would be obvious by looking at them.

Just to clarify, does this PR address both the "error" case mentioned in #4651 (comment),

[jafingerhut] Yes, this is a case where the user's P4 program is trying to specify something incorrect.

For a program like that, this PR:

There are several test cases added that cover variations on this, e.g. some with actions at top level, with and without name annotations, and some with actions within controls, with and without name annotations, because the implementation actually had some unique sub-bugs in some of those cases.

as well as the "bug" case mentioned in #4651 (comment)?

For that case, that is the behavior of pass LocalizeAllActions. The new pass this PR adds always runs before LocalizeAllActions, so that it only detects @name conflicts that the user specfies, not one that LocalizeAllActions creates. If I added such a check after LocalizeAllActions, then many, many useful and legal P4 programs with correct @name annotations would stop compiling and start giving errors, ones that the developer did not cause.

Aside: Without this PR's changes, or something like them, two actions with the same @name annotation might have been a mistake written by the P4 developer, or it might have been introduced by the LocalizeAllActions pass, or some mix of each. With something like this PR's changes, I think I can promise you that after LocalizeAllActions runs, if two actions have the same @name annotation, it is only because they were originally the same action in the original input P4 source code, but LocalizeAllActions duplicated it, because it was used in multiple tables, or a top-level action was used in multiple controls. That seems to me like a useful invariant to know and document.

If you do not want the behavior of pass LocalizeAllActions, then I would recommend you try omitting pass LocalizeAllActions in your downstream version of p4c. I have not tested what happens if you omit this pass, but there might be later passes that assume LocalizeAllActions has already run.

My understanding of the intent of LocalizeAllActions pass is that it intentionally makes multiple copies of actions in the IR, if those actions are used in multiple tables. This allows later compiler passes to optimize those copies independently of each other. If you omitted pass LocalizeAllActions, then even if that does not introduce any bugs due to assumptions made by later passes, you would be giving up on that opportunity for optimizing the same action used in multiple places of the original P4 program.

Also, does the PR add both of these cases (or equivalent cases) as tests to testdata/p4_16_errors/, as well as testdata/p4_16_samples/, respectively?

jafingerhut commented 1 month ago

@kfcripps Last night I did a couple of experiments to see what happens if I comment out LocalizeAllActions pass in p4c/frontend/p4/frontend.cpp, and in some of the tests I comment out more than just that pass. I ran a full p4c test suite, and counted % of passed tests by "category", e.g. tests whose names begin with "p4/" are tallied separately with those beginning with "bmv2/". Results can be seen here: https://docs.google.com/spreadsheets/d/18l6s997BhNfV375LDBq4SNTdM7dW4XvuaZi3abX6CwE/edit?usp=sharing

Note: I do not have a good understanding of the dependencies between the passes. I was just trying a quick experiment to see what happened.

jafingerhut commented 1 month ago

Closing this PR. Please review https://github.com/p4lang/p4c/pull/4975 instead, which replaces this one and fixes one more thing.