p4lang / p4-spec

Apache License 2.0
175 stars 80 forks source link

Modular match fields for tables. #1284

Open jonathan-dilorenzo opened 3 months ago

jonathan-dilorenzo commented 3 months ago

For P4 LDWG, I promised I'd produce a toy example to illustrate this need, but I found a real example of ours from a year or so ago was open-sourced (though it looks much worse now).

Some possible thoughts on how I might like it to look:

control acl_ingress(in my_icmp_key, ...) { table acl_ingress_table { key = { All the shared keys my_icmp_type : ternary @name("icmp_type") @id(19) @sai_field(SAI_ACL_TABLE_ATTR_FIELD_ICMP_TYPE); ... } } }

control middleblock_ingress(...) { bit<0> dummy_match_key; apply { ... acl_ingress.apply(dummy_match_key, ...); } } control fbr_ingress(...) { apply { ... acl_ingress.apply(headers.icmp.type, ...); } }

jafingerhut commented 2 months ago

@jonathan-dilorenzo I wrote some notes about my perspective on maintaining similar-but-not-identical collections of P4 programs here: https://github.com/jafingerhut/p4-guide/blob/master/program-variants/README.md

As a bonus, I figured out how to configure VScode to do "inactive region highlighting" for C/C++ programs based upon arbitrary settings of C preprocessor symbols that you specify in a config file. If you rename your P4 programs to have a file name suffix of .c, you can use this feature to highlight unused or "dead" code due to #if, #ifdef, and #ifndef as a different color. Detailed steps on how to do that are linked from that article.

vlstill commented 2 months ago

My take on these options:

  1. I don't like it, because the modification looks like runtime code, but clearly needs to be evaluated at compile time. Also, what if the modification is written after the call to apply?

  2. I don't like it, because it looks like runtime decision that would require essentially two copies of the table to exist. It also gives impression that the same table can be called multiple times with different arguments to apply, but in reality these would need to be separate tables (in any reasonable implementation I can see), which would really mess either the control-plane or reasoning about it.

  3. That seems like the most consistent extension of the current P4 syntax and abilities. It is a little bit more cumbersome though, but I guess one has to have the table in a control anyway to be able to de-duplicate it without use of the preprocessor. Some explicit void type might be better then bit<0> for omitted key. Type checking might be a bit tricky (note that the type of the key part that is different must be a type parameter, but it can't be unconstrained, e.g. int won't work there).


Since table are not really self-contained (their keys read control's scope and the action they call read and write variables from the control's scope), it probably does not make sense to allow table declarations to stand separate from a control. That is a little unfortunate, as otherwise some way of inheriting or composing tables might make sense for this.

As is, if we wanted to go in this direction [kind of brainstorming], we would either need to define some way of inheriting controls and then extending tables in them, or to make key sets separate entities that can be inherited/extended and that can be control's parameters (used in the table definition).

jafingerhut commented 2 months ago

Adding another possible syntax into the discussion, this one using a compile-time conditional "statement" which I've proposed naming static_if, so that it is named similarly to the current static_assert that is already in the language spec, and has been designed so it is strictly for use for compile-time known things only:

// Somewhere earlier, probably in some top level `.p4` source file that is included in the one below
enum Inst_t { MIDDLEBLOCK, TOR };

const Inst_t instantiation = Inst_t.MIDDLEBLOCK;

 table acl_ingress_table {
    key = {
       *All the shared keys*
        static_if (instantiation == Inst_t.MIDDLEBLOCK) {
            headers.arp.target_proto_addr : ternary @name("arp_tpa") @id(16);
            // other keys only present when instantiation is MIDDLEBLOCK go here
        }
        static_if (instantiation == Inst_t.MIDDLEBLOCK || instantiation == Inst_t.TOR) {
            // keys go here if they are only present when instantation is MIDDLEBLOCK or TOR
        }
    }
    actions = {
       *All the shared actions*
    }
  }

Yes, syntactically this is not much different than #ifdef and #endif, but it would be in the P4 language spec.

We would probably let static_if go around declarations of variables, constants, entire tables, individual actions in actions table property, individual table entries within the entries table property, entire parsers, entire controls, parser states, etc. It would be pretty pervasive in the language grammar.

There could also be a static_for compile-time loop that could appear in any of those places, too, but that still raises the question of how to create unique table, action, parser state, variable, and constant names, e.g. perhaps by somehow concatenating the decimal representation of an integer loop variable with some other partial identifier name.

vgurevich commented 2 months ago

Sorry for being late, but where can I find a clear problem statement? So far, all the discussions about "dynamic keys" that I've seen were easily resolvable by explicitly declaring two or more standard tables and letting the backend to place them into some shared resources. How is this specific case different?

jafingerhut commented 2 months ago

Jonathan can clarify in case I am misinterpreting anything here, but I think the briefest clear statement is that there are sometimes scenarios when you want to write multiple different P4 programs, but they have about 90% of their code the same, and differ in about 10%, including sometimes exactly what set of keys a particular named table has, or which actions a table has, or sometimes part of the control flow. The stuff that most developers would use the C preprocessor and #ifdef for today. Again, if I have not misinterpreted Jonathan, this article might give a little more background, but I suspect it is familiar to you from what I have said already: https://github.com/jafingerhut/p4-guide/blob/master/program-variants/README.md

vlstill commented 2 months ago

@jafingerhut, I think the static_if syntax you propose is quite peculiar in what it actually does (or maybe I don't understand it fully). Most importantly, while the static_if as you have written it is in the declaration of a table, the argument for evaluating it should be part of "instantiation" of the table (which is something we don't have now -- or rather have it implicitly as part of instantiation of a control). The if is only static in the instance not static in the declaration (where it is written). I guess this makes it very similar to the C++ template (!) use of if constexpr that is evaluated at template instantiation time. However, P4 object instantiations are much less capable than C++ template instantiations (mainly to make sure we can typecheck the declaration I guess, which C++ can't do for templates).

So I believe that what you propose would go in a direction of adding templates to P4 (especially if the content of the static_if (and possible static_for) is quite unconstrained). We would also need to define what these conditions can depend on quite carefully (note that constructor arguments are not local-compile-time-known values, so types currently can't depend on them).

If we want to have such an expressive/powerful capability, it is worth investigating if sanitary macros could be actually a better solution -- with a huge caveat I have no experience with them, so I could be missing important downsides (I have a lot of experience with C++ templates, and they are very powerful, but I thing there was quite an agreement we don't want to go in that direction for P4).


(feeling rebellious: Maybe if we go in this direction, we should start thinking about P426 that we design ground up with all these things in mind...)

vlstill commented 2 months ago

So far, all the discussions about "dynamic keys" that I've seen were easily resolvable by explicitly declaring two or more standard tables and letting the backend to place them into some shared resources. How is this specific case different?

My understanding was that this goes in the different direction -- completely independent tables that have similar structure.

jafingerhut commented 2 months ago

@vlstill In my example, the conditional expression in the static_if does not have anything (necessarily) to do with the declaration of the table. It is just some expression consisting only of compile-time known values (or perhaps it should be restricted to local compile-time known values -- I am not sure if that is important for implementation/specification of such a thing, or nto).

vgurevich commented 2 months ago

@jafingerhut -- thank you for the clarification. I am definitely familiar with the need and I am familiar with the doc you wrote. In fact, this is precisely what 99% of P4 developers do -- they use C preprocessor. Besides being a very familiar tool, it also has a very important additional benefit -- it allows the critical configuration files (all those #defines for the table sizes, names, widths, etc.) to be shared between the P4 and C/C++ code, thereby ensuring consistency.

I do certainly understand the limitations of the C preprocessor and I thought we had several discussions with respect to improving the preprocessing abilities of P4. Is that where we are going?

If so, I think it would still be nice to better understand the additional requirements/limitations before adding all these features directly into the language, such as:

  1. What is wrong with the other existing preprocessors (e.g. M4)?
  2. Should we keep metaprogramming facilities syntactically separate from the rest of the language?
    • For example, will an editor be required to parse the whole program just so it can highlight the pieces that are being compiled out
  3. Should we keep the metaprogramming facilities semantically separate from the rest of the language?
    • For example, can the same variable/const be used in both contexts?
  4. Given that P4 programs are almost never needed alone, but require the corresponding control plane code that must match the data plane code, do we want to figure out how to solve that problem? It is pretty bad today, but with more powerful facilities it will become even more intractable.
vlstill commented 2 months ago

@vlstill In my example, the conditional expression in the static_if does not have anything (necessarily) to do with the declaration of the table. It is just some expression consisting only of compile-time known values (or perhaps it should be restricted to local compile-time known values -- I am not sure if that is important for implementation/specification of such a thing, or nto).

@jafingerhut In that case, I don't really see how that would be used to be significantly better then use of preprocessor. You would need to duplicate this declaration multiple times, right? Possibly by preprocessor, but then it still uses preprocessor and is not much better then just having #ifdef in my opinion.

vlstill commented 2 months ago

Besides being a very familiar tool, it also has a very important additional benefit -- it allows the critical configuration files (all those #defines for the table sizes, names, widths, etc.) to be shared between the P4 and C/C++ code, thereby ensuring consistency.

That is a very good point in my opinion. If we design a solution that would be totally disconnected from CPP, we would need another way to share configuration between P4 and C/C++.

Although if the configuration can be expressed just in simple defines, then we are probably fine keeping that in P4.

In my opinion though, if we want to add modularity/parametrization to P4 it should be significantly better (cleaner, easier to work with) then using CPP. Otherwise the advantages @vgurevich had pointed out for CPP may be more important.

As for the points @vgurevich raised, my opinion is:

  1. I think that loses on the familiarity and tends to look even worse then CPP.
  2. I would slightly lean to a unified syntax, but I can see a point for separate one.
  3. It would be good to use constant in both in my opinion.
  4. I think this should be kept in mind if we design new metaprogramming facilities.
jafingerhut commented 2 months ago

@vlstill I am not claiming it is significantly better than using the preprocessor. I am trying to provide alternatives for thought here. Me, I don't mind the C preprocessor. Others seem to dislike it for various reasons.

Some differences between my static_if and static_for idea vs. the preprocessor:

table t1 {
      key = { .... }
     actions = { ... }
#ifdef ENABLE_T2
    default_action = ...
}
table t2 {
    key = { ... }
    actions = { ... }
#endif
    default_action = ...
}
vgurevich commented 2 months ago

@vlstill

It would be good to use constant in both in my opinion.

I would agree with you, but it is definitely not quite the case today and CPP provides several advantages over const. Even if we improve that, the question will remain: how complicated will be the tool that just preprocesses the program. I think it would be nice if it is a lot less complicated than the full compiler.

Let's assume we have a table definition today, e.g:

table t {
   key = { . . . }
   actions = { . . . }
   . . . 
   size = TABLE_T_SIZE;
}

Now, let's assume that the programmer's job is to fit the program (which is so far one of the most common reasons why people do not hardcode the size in the first place). Which definition would you prefer for TABLE_T_SIZE:

  1. Pure P4 const?
    const TABLE_T_SIZE = 1024;
  2. Pure CPP #define?
    #define TABLE_T_SIZE 1024
  3. CPP define with default
    #ifndef TABLE_T_SIZE
    #define TABLE_T_SIZE 1024
    #endif
  4. Combination of (3) and (1):
    
    #ifndef TABLE_T_SZ
    #define TABLE_T_SZ 1024
    #endif

const TABLE_T_SIZE = TABLE_T_SZ;


If you ask me, most of the time I use (3) or (4), because in this case I can control the value of the `TABLE_T_SIZE` (`TABLE_T_SZ`) from the compiler's command line (by adding `-DTABLE_T_SIZE=2048` or whatever). 

Also, for example, if you have the code such as:

control C(...)(index_size=INDEX_SIZE) { bit<(index_size)> index; . . . }


the **only way** to specify the value of the `index_size` parameter is via CPP-defined constant, since `const` values are currently not considered to be compile-time known. 

The point is not that I am advocating against `const`, for `#define` or whatever. I am not. The point is that it is important to consider various scenarios (many of them being already very typical) and see how convenient/inconvenient things will be.
vgurevich commented 2 months ago

@jafingerhut

Me, I don't mind the C preprocessor. Others seem to dislike it for various reasons.

Let's compile the list of those reasons, so that we can see whether a particular solution is better or worse.

It is in the language, vs. a "separate language". At least, some people claim that as a good thing.

Let's compile the list of these reasons too.

I am not against the idea, but I want to make sure we understand exactly what we are trying to achieve in the first place.

rcgoodfellow commented 2 months ago

Since the possibility of approaching this problem with a first-class macro system came up in a recent LDWG discussion, I thought I would provide some of my thoughts in that direction.

First-class macro systems often execute after the AST has been generated. This means macros are a part of the language syntax; they operate directly on the AST of a program and can only produce valid AST fragments. A second property of some first-class macro systems is that macros have an explicit and bounded scope with respect to the AST (sometimes referred to as hygienic macros), and it's clear when program source changes, what macros need to be executed and what part of the AST will potentially be transformed. Contrast this to preprocessor systems that execute before any sort of compilation takes place, have a scope that is defined by textual inclusion, and can change symbol definitions independent of any syntactic scoping rules via directives such as #define.

Some of the practical impacts of these differences are:

On the topic of having macro-like features of the language being syntactically distinct versus fully integrated, my (slight) preference would be for the former, as I think it makes reading and understanding code a bit easier. With the distinct syntax approach, it's obvious what's happening at compile time versus runtime.

vgurevich commented 1 month ago

@rcgoodfellow, I think these are great ideas, and I especially want to talk about language bindings (and API definitions in general).

I totally agree that common CPP definitions are a very small portion of what need to be happening. Having said that, your proposal calls for the first-class macros that somehow can process P4 AST, but then output something non-P4, since (at least today) P4 does not include any facilities for API definition and thus there is no AST to generate.

Thus, realistically, we can explore a different approach, to which you alluded in the third bullet and that is a formal definition of a tool, that can explore the AST and output anything in response to it. So, for example an API generator will consume the definition of the table, plus all the underlying stuff (actions, types, etc.) and, while using some templates create a definition in some language (e.g. P4info, although that would not be my preference), while substituting the relevant portions of the template with the name of the table, the names of the key fields, etc. This is, by the way, how the API generation was performed by the original compiler (AST was basically a set of well-defined Python objects).

But this takes us pretty far away from the original problem, which is having some common source out of which multiple different (but somehow related) data plane programs can be generated. I know, you might be speaking from Rust experience, but P4 is a much less powerful language with a lot of special cases. The main problem is probably that there are declarations and the code, and declarations are often using very specialized syntax. Most of the things there cannot be expressions (at least today) and things like if() statements are not expressions in P4. Also, declarations and there use are separated, meaning that you often need to generate code in 2 or more places.

I can see how attribute-based conditional compilation approach that is available in Rust can be introduced in P4, by defining something annotation-like that will work the same way as the conditional compilation cfg() attribute (even though it might be easier, I would prefer it not to be an annotation per se, because the good practice is not to have semantic-changing annotations at all).

rcgoodfellow commented 1 month ago

I was not suggesting that the macros generate non-P4 code. On the contrary, I think it's important that macros only take P4 AST elements as inputs and produce P4 AST elements as outputs. The point I was making about code generation was that if the compile-time programming facilities are an intrinsic part of the language rather than a preprocessor phase, then code generation frameworks would benefit from that as they could treat the AST as the interface to P4 program representation rather than source code directly.

Would love to have a conversation about language binding generation and API more broadly, as I think having standardized API definitions for P4 at a lower level than the P4 Runtime (thinking hardware/software interface) could have a lot of potential. But as you said, @vgurevich, that's starting to get far afield from the focus of this issue.

I'll work toward putting together a concrete example of what a macro-based approach might look like for the code @jonathan-dilorenzo linked above.

jafingerhut commented 1 month ago

I asked the question below live during the 2024-Aug language design work group meeting to Jonathan, and his reply was (a), i.e. that he was intending the source code to behave as K similar, but different, P4 programs at compile time, with K different "binaries" produced, and each network switch would load exactly one of those K binaries.

@jonathan-dilorenzo One short question on the code snippets in your original issue: Do you think of the conditions "[some_metadata_saying_that_this_is_a_middleblock] == true" as involving:

(a) compile-time configuration: only values that are compile-time known, and restricted to be compile-time known? (b) run-time configuratino: they can include run-time variables?

If (a), then it seems you are fundamentally describing K similar, but different, P4 programs, which you would execute a P4 compiler on K times to get K different "binaries". Each network switch is expected to load exactly one of those binaries. This way requires some way to invoke the compiler in K different ways, otherwise you do not get K different binaries. This approach is exactly what the C preprocessor and #ifdef's can be used to implement, although of course one can invent other ways (perhaps exactly the topic of this issue).

If (b), then perhaps you are thinking that you want to execute the P4 compiler exactly once on the source code, producing exactly one "binary", and all switches load this binary. In this scenario, each switch must be configured at run time to indicate it which of the K types of behavior it should implement while processing packets.

I think both approaches (a) and (b) are on the table for consideration, and perhaps there are other possibilities besides (a) and (b) that you or others have in mind. A program could mix both approaches (a) and (b), for example, but if they do, then they need to produce multiple binaries, and some or all of those binaries require configuration to choose among the run-time configuration options that a single binary implements.

I think in this discussion it is worth making it clear whether one is thinking of approaches to solve problem (a), or problem (b), or to define the restrictions and results of some alternate idea they are considering. For myself, I jumped to the assumption that the scenario we are talking about is (a), but I could be wrong in that is what you were looking for.

vlstill commented 3 weeks ago

I am thinking about one relatively lightweight extension to the current state of P4. It could allow both parametrization of single table use case (i.e. there is a P4 program that can be compile-time configured for different use cases that require different keys in table) as well as parametrization of reuse (i.e. multiple places in the same P4 code use similar table definitions).

The key point is that we could introduce a new match kind none that would indicate a key that should be skipped. This, in my opinion, is cleaner that pulling the key to bit<0>. Then we would need to allow expressions (compile-time evaluated of course) in the place of match kind value in tables (currently we only allow a single name there). Suddenly, we have ability to compile-time parametrize the keys.

Example:

  1 #include <core.p4>
  2 
  3 #ifdef FOO
  4 const bool foo_enabled = FOO;
  5 #else
  6 const bool foo_enabled = false;
  7 #endif
  8 
  9 match_kind { none } // TODO: name? TODO: add to spec
 10 
 11 header hdr_t {
 12     bit<8> a;
 13     bit<8> b;
 14     bit<8> none;
 15 }
 16 
 17 control SubCtrl(inout hdr_t hdr)(match_kind key1_kind) {
 18     action a0() {}
 19     action a1() { hdr.none = 4; }
 20 
 21     table t0 {
 22       key = {
 23         hdr.a : exact;
 24         hdr.b : (foo_enabled ? ternary : none); // TODO: no expressions allowed
 25       }
 26       actions = { a0; a1; }
 27     }
 28 
 29     table t1 {
 30       key = {
 31         hdr.a : lpm;
 32         hdr.b : key1_kind; // TODO: decl not found
 33       }
 34       actions = { a0; a1; }
 35     }
 36 
 37     apply {
 38         t0.apply();
 39         t1.apply();
 40     }
 41 }
 42 
 43 control cntr(inout hdr_t hdr) {
 44   SubCtrl(lpm) sub;
 45 
 46   apply {
 47     sub.apply(hdr);
 48   }
 49 }

This does not compile currently. The grammar does not permit expressions in place of match kind and P4C does not do lookups for the names there (or probably it looks in up only within the match_kind defintions. But all of these seem to be like rather straight-forward extensions.

The nice part is that inlining, constant folding and constant propagation are not needed to type check SubCtrl -- the type checker does not care what match kind a key is, it only needs to be able to figure out type of the expression and check that the match kind is of type match_kind. Both of these can be done without inlining, constant propagation and constant folding the match kind values and expressions. The key1_bind variable and foo_enabled flag don't even need to be local compile-time known, it is sufficient if they are compile time known.

A midend pass would later remove the match_kind none keys. That one would of course need to run after inlining, constant propagation and constant folding.

One important thing to figure out would be the name of the new match kind. I like none as it is relatively clear what it means in my opinion and it may be less likely to occur like a variable name than e.g. skip. Luckily this is not a keyword, one can still define variables of the same name. But they will/should shadow the match kind if at the control scope -- this would probably be one breaking change of this proposal (think about a local variable named exact in a control).


Of course, there are many other problems that would be solved my macro system and not solved by this (e.g. removing some actions from action list). But in my opinion this would be rather clean and would go significant way. It would also not hurt the macro-based extension in future, it would just give us more freedom in how tables can be defined.

jafingerhut commented 3 weeks ago

Given the question and answer in this comment: https://github.com/p4lang/p4-spec/issues/1284#issuecomment-2269633180

it leads to this minor (but important) follow-up question:

Since you want to produce K different binaries from the compiler, presumably you want to invoke the compiler K times, each producing a different binary. Thus the command line options for each of those K compiler invocations should be different, because ideally two invocations of the compiler with the same command line options and the same input files would produce the same binary.

(option 1) If you use the C preprocessor and #ifdef directives in the source code, this is typically done by using -D<symbol_name> or -D<symbol_name>=<value> options on the command line. Then, even if the P4 source file(s) given on the P4 compiler command line are the same, the C preprocessor can produce different output from the C preprocessor, and thus different inputs to the compiler (after preprocessing is done).

I could imagine introducing NEW command line options that let one assign values to P4 named constants, and then using the values of those constants in compile-time macro invocations in the P4 source code.

(option 2) Another way to do this is to have multiple "top level" P4 source files, e.g. variant1.p4, variant2.p4, variant3.p4, where each assigns values to some constants (or #define's some preprocessor symbols), and then does #include on a common P4 source file that uses those constants or preprocessor symbols to behave differently, depending upon their values.

If one of the goals is to completely avoid the use of the C preprocessor (that might not be your goal), then (option 2) does not seem to me to be an option available to you, becuase you cannot use #include.

(If it is NOT one of the goals to completely avoid the use of the C preprocessor, then I wonder idly aloud: You are OK with using #include, but you really, really want to avoid #ifdef?)

(option 3) Perhaps someone has some other approach in mind that I have not listed above?