What kinds of parser/control declarations with generic types are supported?

jafingerhut commented 4 years ago

The comments in the attached program give references to places in the latest P4_16 language specification that seem to indicate that the program is trying to do something declared illegal in the specification, e.g. it appears this should be illegal:

control ingressImpl<H2>(inout H2 hdr,
                    inout metadata_t meta,
                    inout standard_metadata_t stdmeta)
{
    // body of control here ...
}

but the version of p4c indicated below compiles this program with no error messages:

$ p4c --version
p4c 1.2.0 (SHA: 62fd40a0)

illegal-type-kinds.p4.txt

jafingerhut commented 4 years ago

FYI, before making changes to p4c to give errors for programs like this, it might be worth considering whether the spec could/should be changed to make some programs like this legal, in cases where we can figure out that it makes sense to do so, and might be useful for writing generic parsers and/or controls that can be written once, but work for many different values of type parameters, e.g. any bit<W> type.

jnfoster commented 4 years ago

This is a known deviation from the specification. P4c has always supported generic parsers and controls since P4_16 was released, as an experimental feature.

So I think this can be safely closed.

jafingerhut commented 4 years ago

Is there any documentation or description of what the limits of the support for generic parsers and controls are? @vgurevich said he has been trying some out, and getting errors for some, but not for others, but the rules aren't clear yet from those experiments.

jnfoster commented 4 years ago

I believe (@mbudiu-vmw can correct me if I'm wrong), the type checker is inspired by Hindley-Milner type inference. And there's nothing special about how generics are treated in parsers and controls.

Maybe @vgurevich could post some of the examples he's toying with to shed light?

jafingerhut commented 4 years ago

Attached are two similar example programs using the v1model architecture, the p4c version I tested with, and the error messages they give (only one of them gives an error).

I have not yet checked the resulting BMv2 JSON or output of the compiler mid-end pass to see whether the one that gives no error appears to be doing what it ought to, but suspect it probably is.

$ p4c --version
p4c 1.2.0 (SHA: 62fd40a0)

$ p4c --target bmv2 --arch v1model generic-used-as-table-key.p4 
generic-used-as-table-key.p4(82): [--Werror=type-error] error: Key field field type must be a scalar type; it cannot be T
        key = {field : exact;}
               ^^^^^

$ p4c --target bmv2 --arch v1model generic-used-as-hash-input.p4

generic-used-as-table-key.p4.txt

generic-used-as-hash-input.p4.txt

jafingerhut commented 4 years ago

In the example program generic-used-as-table-key.p4, the error message clearly indicates that a table key field must be a scalar type. However, what is confusing is that both instantiations of that control use bit<W> for that type, with different values of W, so at least those instantiations have a scalar type that is allowed as a table key field.

jnfoster commented 4 years ago

P4_16 implements generics in the style of Java (also known as parametric polymorphism) and not in the style of C++ templates. In particular, to type check a generic control, we need to know that it will be well-typed for all possible instantiations of the type parameters, not just the ones that are actually used in the program. See here for an old discussion of this point.

As a concrete example, consider the following code:

bit<32> f<T>(T t) {
  return t;
}
bit<32> x = f<bit<32>>(32w0);

With parametric polymorphism, the code should be rejected because the body of f is not well typed. However, with C++ templates, the code would be accepted because everything just happens to work out for the ways we invoke f.

Some people (myself included) feel that C++ templates are anti-modular because it's inherently a global analysis. Concretely, one can add code and cause previously well-typed code to fail. In the example above, if thousands of lines later I write the following:

bit<32> y = f<bool>(true);

Clearly something will break now. The bad thing about this design is that we don't discover the problem until the second invocation of f.

Now, why does supplying a value whose type is generic as a key to a table fail but supplying it to a hash function succeed? Presumably both only work on scalars and data structures formed out of scalars. In particular, we can't hash an extern object like a counter or a register. Well, the V1Model hash function is polymorphic:

extern void hash<O, T, D, M>(out O result, in HashAlgorithm algo, in T base, in D data, in M max);

and it does not check that the data D is scalars or data structures formed out of scalars.

For instance, I confirmed that this control, which hashes a register, also type checks,

control SimpleFieldHasher<FieldType>(
    out bit<32> hash_result,
    in FieldType field)
{
    register<bit<8>>(1) r;
    action do_hash() {
        hash(hash_result, HashAlgorithm.crc32, (bit<32>) 0, r, (bit<32>) 0xffff_ffff);
    }
    table calc_hash {
        actions = {do_hash;}
        const default_action = do_hash();
    }
    apply {
        calc_hash.apply();
    }
}

although I presume any real V1Model backend would reject it.

So probably the solution here is to tighten up the type checking rule for the V1Model hash primitive in a similar way as for table keys.

jafingerhut commented 4 years ago

Thanks for the details. I have not yet read the older discussion, but I will. It may already answer some of my questions below, in which case apologies for the repeats, and you are welcome to point out that any/all of them are answered there.

Is it fair to say that the word "modular" and "module" have multiple definitions, and perhaps even are measured along multiple 'axes', by different programming languages, language designers, and language users? If that isn't the case, and there is one widely accepted definition of what it means, I would be curious to know it. At least among P4 language designer attendees, it does not seem to be used by everyone to mean the same one thing.

Most/all P4 compilers today require global optimization/analysis. Is the that "Some people (myself included) feel that C++ templates are anti-modular because it's inherently a global analysis." perhaps a view of "modular" that stresses "separate independent compilation, followed by linking" should be something that is possible? Or is there some other meaning of "modular" that this is opposed to?

In your example call thousands of lines later:

bit<32> y = f<bool>(true);

"Clearly something will break now. The bad thing about this design is that we don't discover the problem until the second invocation of f."

Why is it a bad design if the compiler doesn't catch a problem until the quoted line of code is encountered?

The following sentences are a kind of reasoning by analogy, and I would not be surprised if the analogy is a poor one for reasons I am not seeing right now, so I will blurt it out anyway:

The compiler doesn't catch an occurrence of adding (using the + operator) a bit<32> to a bit<31> with no use of casts until you actually write such an expression. Does the + operation have a design problem because of this property?

jafingerhut commented 4 years ago

A shorter question about the current p4c implementation. You (Nate Foster) wrote this:

"In particular, to type check a generic control, we need to know that it will be well-typed for all possible instantiations of the type parameters, not just the ones that are actually used in the program."

What operations does P4_16 allow on all possible types? I don't even think assignment == and != are defined for all P4_16 types, are they? It seems no, if you include things like control type declarations as types.

Even if this check is restricted to types for which assignment, ==, and != are allowed, if those are the only operations allowed in such a generic parser/control on values with types that are type variables, that seems very limiting in what you can do with it.

jnfoster commented 4 years ago

Perhaps it would be clearer and more precise if I said "local reasoning" instead of "modular." With parametric polymorphism, you can type check a generic declaration without having to know how it will be used later.
I would say there actually are some fairly deep issues around operations like +! The way that I like to think about these operations is not that they are generic (because they aren't -- e.g., we can't + together two bools). But rather that they are overloaded at (or equivalently, an intersection of) a large number of types:+_{bit<1>}, +_{bit<2>}, ..., +_{int<1>}, +_{bit<2>},... During type checking, the compiler figures out which specific + operation to use depending on the arguments provided to a particular invocation.
Yes, generic controls and parsers are basically useless! That's exactly why we held them out of the P4_16 specification. We didn't delete the code in p4c because that would have been more work. The right way is to add some mechanism to constrain type variables so you can say things like "instantiate this type variable X with any numeric type" and so on. Then they become useful again.

jnfoster commented 4 years ago

Why is it a bad design if the compiler doesn't catch a problem until the quoted line of code is encountered?

Arguably the definition of f was silly

bit<32> f<T>(T t) {
  return t;
}

This code only makes sense if T is bit<32>. Anything else and it breaks. So allowing it to be accepted by the type checker is a time bomb waiting to happen. Suppose it makes it into a widely-used P4 library (if you'll suspend disbelief for a moment and imagine that P4 grows to the point where such things exist :-). It seems bad for the type checker to not catch the flaw in f...

jafingerhut commented 4 years ago

"Anything else and it breaks" means "Anything else and it is a compile-time error, hopefully with a reasonably understandable message from the compiler that you used a wrong type".

I will think a bit more about this, but in my first 30 minutes or so, the advantages of a C++-style type generics seem to outweigh the disadvantages. The reasoning a compiler must do with the current p4c implementation, with an imagined new add-on that is some sub-language to constrain what types can be used for a particular control/parser with generic types, is in two places:

(a) when the control/parser declaration is compiled and type checked (b) whenever an instantiation or call is made, the type parameters and run-time parameters must be type checked. The compiler doesn't give an error for those until it encounters them, either, I think.

The same two places appear to be where checking is done in a C++-style generic system, both (a) and (b). Maybe (b) for a C++-style system needs to "go down the instantiation DAG" on each instantiation, to re-check types, or something like that, is a difference?

vgurevich commented 4 years ago

First of all, many thanks to @jafingerhut who constructed v1model examples on my behalf. Also, many thanks to @jnfoster for the detailed explanations behind the current situation.

To provide some context, these specific cases grow out of the desire to use CPP less and P4 more and I think they do show that the ability to define a generic control is useful. Ultimately, it all comes to the general desire for some kind of a useful module system, where someone might write something useful that can be used by someone else without too much work. From that perspective, I think that the provided examples are useful.

We can certainly tighten the compiler checks in order to prevent the second program from compiling, since on the surface it seems to contradict what the spec says, but I think we can do better than that.

jafingerhut commented 4 years ago

Nate Foster wrote in this earlier comment https://github.com/p4lang/p4c/issues/2247#issuecomment-600835240

"This code only makes sense if T is bit<32>. Anything else and it breaks. So allowing it to be accepted by the type checker is a time bomb waiting to happen. Suppose it makes it into a widely-used P4 library (if you'll suspend disbelief for a moment and imagine that P4 grows to the point where such things exist :-). It seems bad for the type checker to not catch the flaw in f..."

It seems like a very benign time bomb to me. The compiler accepted it, sure. Someone tries to instantiate it months later with a type of bit<31> instead of bit<32>, and it fails to compile, right? It doesn't produce wrong object code that you can run. The compiler could presumably tell you "it didn't compile because I can't unify type bit<32> and bit<31>", with line numbers to where you invoked the function, and to where the function is defined. The developer now knows why their program doesn't compile, and can either change their code to use type bit<32>, or file an issue with the other developer (or themselves, if they wrote it), that the function should permit other types besides bit<32> to be used.

Sounds like normal "find a limitation, file an issue" kind of thing to me that happens all the time.

mihaibudiu commented 4 years ago

There is another very nice reason that I prefer Java-style generics to C++ templates: if you read the type declaration of a Java function (class, etc) then it tells you everything you need to know about how you can call it. If you call the function with correct arguments it will work.

In C++ you have to also read the function implementation to know whether you will be able to call the function.

The compiler will put these together indeed, but it's handier for humans to only read declarations.

In general, type checking in a strongly-typed language guarantees that if all pieces are well-typed they will fit together no matter how they are built and how they are implemented.

For P4 going forward we need some notion of type constraints as in C#/Java or type traits as in Rust. You can then say parser p<T> where T satisfies constraints. One constraint can be T has a + trait, which is an operator + : T x T -> T.

jafingerhut commented 4 years ago

"if you read the type declaration of a Java function (class, etc) then it tells you everything you need to know about how you can call it. If you call the function with correct arguments it will work."

Do you actually believe those two statements, without any further qualification?

I have a hard time believing that you actually do.

mihaibudiu commented 4 years ago

Yes, if the function does not use unsafe casts then this statement is true. It does not tell you what the function does, but it tells you all the legal ways to invoke it.

jafingerhut commented 4 years ago

To expand a little bit on my previous comment, suppose you have a Java function that does binary search on a sorted array, but is only guaranteed to work correctly if given a sorted array. The Java type system will never catch it if you pass it an unsorted array.

Examples of this kind of thing positively abound in all kinds of programming. I personally believe that type systems are fine for helping you catch the most basic kinds of mistakes, but none of them are anywhere near sophisticated enough to say "if you pass the type checker, the code is functionally correct".

mihaibudiu commented 4 years ago

Type checking in general does not guarantee full correctness. It just guarantees that the program never gets "stuck": no evaluation step will crash due to a type error (e.g., adding a number to a string). While this may seem a weak property, it does not hold in languages like Python or JavaScript.

jafingerhut commented 4 years ago

Right, passing the type checker means that there will be no type errors at run time (modulo weird corner cases in Java and Scala, at least, that aren't actually caught by Java's type system at compile time, which is apparently unsound, and are only caught at run time by the JVM, according to a paper that Nate Foster linked to in the discussion thread he linked to before, this one: http://io.livecode.ch/learn/namin/unsound)

The C++ generics also have this property, I think: if it passes the type checker, then type errors cannot occur at run time (or at least most cannot -- maybe C++'s type system has similar issues to Java's and Scala's that people haven't discovered).

I agree that C++'s type system means you either have to read the code, or read documentation provided by the developer, about what types it support and which it does not, for its type parameters. Such things are useful for what ranges of values it supports for run-time parameters, if it does not support all possible values with correct behavior (e.g. parameter a is expected to be in the range [1, 10]). Addendum: Or, you can just try using some type parameters and see whether the compiler gives an error or not. If not, it is type safe. If it does, you've got an error message explaining why that combination of types is not supported, with lines in the source code).

It seems to have the distinct advantage of being able to be implemented without having to design a new language for describing type constraints, and an algorithm for checking them.

vgurevich commented 4 years ago

I want to emphasize that P4 is very different from C++ or Java in the sense that there are no variables whose type would not be known until the runtime. Instead, everything is known at the compile-time.

As you can see, the functionality we discuss here can be easily implemented by coding the "generic" control accordingly and then running a preprocessor (be it CPP or something else). Everything is static.

I do not believe we will have data plane libraries that can be used by not looking at the code for quite a while. But we should at least try to create libraries that can be used by "a person skilled in the art". Even that is difficult, but here is a clear example of how a language feature which is almost there can help.

Let's try to avoid complicating things by doing things that are not needed in the language in the first place.

jnfoster commented 4 years ago

It seems to have the distinct advantage of being able to be implemented without having to design a new language for describing type constraints, and an algorithm for checking them.

Two questions:

Similar things can be done with the C preprocessor. Would you argue for implementing generics that way?
What happens when generics interact with other abstractions P4_16 already supports? For example, if I have declarations like this:
```
control C<T>(...) { ... }
control D<U>(...)(C<U> c) { ... }
```
which things can I validly pass as the c constructor parameter? Note that with C++ generics I don't get the assurance that the type variable U has been instantiated the same way. If the expansions of the body of D and C happen to use any values of type U in a compatible way in the monomorphic type system after generics are expanded, then the compiler will accept it.

This is an extreme version of Mihai's point that in Java "if you read the type declaration of a Java function (class, etc) then it tells you everything you need to know about how you can call it." As one starts stacking up abstractions like this, in this case the kind of functional abstraction that constructor parameters give us, the ability to encapsulate the reasoning you need to do to ensure that code will type check becomes more and more important.

Andy, you turned this comment into a strawman by implying that the "everything" meant full correctness. I don't think that's what he meant. The "everything" means, what's needed to invoke the function (class, etc.) in a way that the type system will accept.

jnfoster commented 4 years ago

Here is another strawman argument...

Would you be happy if we modified the P4_16 type checker so it still checked expressions and statements but did not check that the arguments supplied to controls, parsers, functions, etc. matched the declared types of the corresponding parameters?

For example:

header some_proto {
  bit<32> src;
  bit<32> dst;
}
control C(inout bool b, inout some_proto p) {
  apply {
     if(b) then { p.src = 0; p.dst = 0; }
  } 
}
...
C() myC;
// The following would all type check
myC.apply(false, -1);
myC.apply(false, 0xFFFF);
myC.apply(false, ternary);
myC.apply(false, PacketTooShort);
myC.apply(false, register<bit<8>>(1024))
// In fact, the following would type check for any e that 
// may be passed as a data-plane argument to a control
myC.apply(false, e);

Notably, the type system still has the property advocated for above: "if it passes the type checker, then type errors cannot occur at run time." But this design would be pretty surprising for programmers right? In particular, they can no longer stop worrying about type checking issues at function (control, parser, etc.) boundaries. Instead, they have to look inside the bodies of those components to know if things will work out or if there is a latent type error.

This is exactly what happens with C++-style templates. It's just that the behavior is at the type rather than the value level. It's possible that typical uses of generics are more limited, so breaking local reasoning is less problematic in practice. But it still seems like a pretty bad design to me.

vgurevich commented 4 years ago

@jnfoster -- Given that most things we do with higher-level constructs in P4 are all done at compile-time (applying of controls or parsers is inlined, structs/headers are flattened, etc.) most of this stuff can certainly be done by using any powerful enough text preprocessor. However, I thought that the goal was to eventually get rid of it, right?

For that same reason, I think that your proposal to do the type checking only after that "inlining" is quite reasonable.

I can also see that it would've been better if we could somehow communicate that the argument "p" to the control C should be either a struct or a header with the fields src and dst. But that can be added later -- even now I think the compiler can produce very reasonable error messages for all the cases you mentioned. Moreover, unlike some other languages, none of these errors will pop up at run-time.

ChrisDodd commented 4 years ago

I'll put in another vote for more C++ style rather than java style generics -- or at least more ml style (as we're (trying to) use H-M type inferencing).

Java generics only work because they specify the interface involved -- you use C<T implements I> where I is a concrete interface. C<T> is implicitly C<T extends Object>. Without an interface system, they become unusable. That's why the current p4c compiler only flags errors after instantiation and does not attempt to flag errors just on the definition -- doing so would make it impossible to define a generic that actually uses the generic argument.

Implementation wise, implementing generics with true type erasure (like java does) pretty much requires indirection and garbage collection -- things we're avoiding in P4. So we really want to keep implementing them statically (as type-safe macros, essentially). Since we do use H-M style inferencing, we could go more in the direction of ml (which, since we're trying to avoid garbage collection, leads us in the direction of rust).

To address Nate's straw-man:

Similar things can be done with the C preprocessor. Would you argue for implementing generics that way?

Yes, that would be fine, except the C preprocessor is a bit too weak (need something that can have a macro emit stuff into a diversion that can later be pulled in, like m4 does), and has a syntax that looks like function calls with ()-parens, which is already overloaded in P4 so would be even more confusing than the problems we already have control arguments.

mihaibudiu commented 4 years ago

I think that in the end we want some facility for meta-programming - e.g., to unroll loops, generate repeated code, etc. But we have to look carefully at other languages that do this successfully. A preprocessor-based implementation would be fine for that too. GC is not a problem really because we don't really have a "new/malloc".

jnfoster commented 4 years ago

That's why the current p4c compiler only flags errors after instantiation and does not attempt to flag errors just on the definition -- doing so would make it impossible to define a generic that actually uses the generic argument.

@ChrisDodd is this true?

If I try to define a generic function:

bit<8> f<T>(T t) {
  return t;
}

I get an error

error: return t: Cannot unify T to bit<8>

And if I try to do something similar in a control,

control C<T>(T t) {
  apply {
    bit<8> x = t;
  }
}

I get the same error:

error: x: Cannot unify T to bit<8>

Perhaps you mean some of the built-in externs like packet_ins extract method or V1Model's hash function? Those are declared with polymorphic types but actually only work when instantiated at certain types. There is special code in the type checker for extract. There is no special code for hash (as that's architecture specific) so that checking must be done by the backend.

vgurevich commented 4 years ago

@jnfoster -- I think what we see is that the generic definitions do work sometimes, but not always as evidenced by the program generic-used-as-table-key.p4.txt that @jafingerhut submitted.

Note that in that program the parameter with the variable type is used for the table matching, which is a very polymorphic operation to begin with (so there is very little that can go wrong type-unification-wise). That's why, even though the spec says otherwise, it really felt more like an omission in the compiler (plus the need to relax the spec) to me.

vgurevich commented 4 years ago

@mbudiu-vmw -- we absolutely want good meta-programming constructs in the language. I just want to emphasize that due to the fully static nature of P4 resource allocation, most of the problems that "other" languages need to solve should not even be present in P4.

That's why I believe we should be able to do many of these things a lot easier compared to general purpose programming languages.

jnfoster commented 4 years ago

@vgurevich responding to "generic definitions do work sometimes, but not always" (https://github.com/p4lang/p4c/issues/2247#issuecomment-601381468) I don't think the situation is inconsistent or mysterious. I provided a complete (I think?) explanation in (https://github.com/p4lang/p4c/issues/2247#issuecomment-600815095).

Let me try to recap here:

The front-end type checker implements parametric polymorphism -- i.e., generics à la Java not templates à la C++. There may be some bugs in the C++ code, but I believe that is the intention and at least for the examples I've tried, it's consistent with the behavior I see.
The front-end type checker yields an error when an expression with a generic type say T is used as the key of a table.
```
control C<T>() {
  T t;
  table tbl { 
    key = { t : exact; }
    actions = { ... }
  } 
  ...
} 
```
That is because, according to the type system, T could really be instantiated with any P4 type whatsoever! In particular, it could be a register, a non-serializable enum, string, etc. So tables are indeed quite "very polymorphic" but they are not completely polymorphic. Hence, the type checker requires each key to be a scalar. Because it does not know that T will necessarily be instantiated with a scalar, it correctly rejects the program.
Conversely, the front-end type checker does not yield an error when an expression with a generic type, say T is supplied as data for the hash primitive in V1Model. That is because hash is declared with a polymorphic type:
```
extern void hash<O, T, D, M>(out O result, in HashAlgorithm algo, in T base, in D data, in M max);
```
So when it checks if D can be instantiated with T (formally, if they can be unified), the answer is yes.

Now, do backends actually support invoking hash on arbitrary P4 types? Of course not! On Bmv2 and I assume most other backends, if you try to hash one of these unsupported types, you'll get an error. (Conversely, if you hash a scalar or other supported type, as in @jafingerhut's program above, things magically work and you don't see an error.) The error not an indication that something was broken in the P4_16 type system, but rather that the designers of V1Model and various backends told a little white lie about the type of hash, and then covered up for that lie in the backend.

mihaibudiu commented 4 years ago

@jnfoster is right; the correct declaration of hash should have some constraints on the type arguments, but since we do not have a way to write these constraints now the check is not done by the type-checker. Of course, the user does not care much who rejects the program, but it is a bit unpleasant that they will now know this by just reading v1model.p4 (without the comments); they actually have to run the compiler to get the error.

vgurevich commented 4 years ago

@jnfoster -- thanks for the explanation. It makes sense to me from what you are trying to achieve, but from the practical standpoint, as @mbudiu-vmw says: "since we do not have a way to write these constraints now the check is not done by the type-checker". So, perhaps we have two practical options when it comes down to table keys (at least):

Either do not do checking and let the backend worry about those, or
If we think we know the constraints on the table keys, then we can check for those, but only for specific instantiations of the control

The current system seems to reject all the code even if there is just a potential for it being used incorrectly, whereas it might be used correctly in all specific instances. That sounds like an overkill to me.

hesingh commented 4 years ago

If a table key is complex, the key is simplified by a midend pass. Thus, it's not cut and dry that only a backend should worry about a table key,

jafingerhut commented 4 years ago

Caveat: I am likely near the least knowledgeable of type systems in this group, so may very well advocate things that have well known problems. I may be educable here, but do not plan to gain a researcher's level of knowledge in type theory.

I do think it is easy to overstate the advantages of type systems, and avoid mentioning disadvantages, which is what I was reacting to when Mihai made a statement that at least could be interpreted as overselling the Java style type system. I was trying not to create a straw man, but to make sure what Mihai meant (but did poorly in that attempt -- my apologies for any offense created). As I expected, he meant "passing the type checker means no type errors at run time" (which as Nate Foster linked, is not true in all cases for Java or Scala).

It seems fair to add to a list of disadvantages of a C++-style type system: it does not explicitly represent, in a short expression, what types can be used for its type parameters. It is represented in the code explicitly, but is more "spread out" throughout the code, wherever that type or values of that type are used. For small controls/parsers that fit on a page of code, and do not instantiate other controls/parsers with generics, it can't be spread out very much.

It also seems fair to add to a list of advantages of a C++-style type system over a Java-style type system the following (corrections welcome): Any restriction on the subset of types that can be represented by using that type throughout the code, e.g. values of the type are used as a table key, or an action parameter, or in arithmetic expressions, etc. are all represented explicitly in code that most P4 developers already understand, e.g. "if a value X of type T that is a type parameter of generic control C is used in an expression 'X+3', then type T must be a type that supports addition with an infinite precision integer, so at least bit<W> or int<W> are allowed, and perhaps others".

If we use a Java-style generic system with new syntactic expressions to specify sets of types, should we provide ways to specify some or all of the following in that syntax?

any type that can be used as a table key field
any type that can be used as an action parameter
any type that can be added to an infinite-precision integer
any struct type that has a field named 'f1'
the intersection or union of any other pair of expressions that represent a set of types

With C++-style generics, it seems to me that any of the above, except perhaps "union", can be pretty naturally represented by using values of that type in the appropriate ways in P4 code. There is no need to write an algorithm that determines what that set of types is -- just let someone try to instantiate a control/parser with a particular type, and the compiler can determine whether it is legal everywhere values of that type appear.

We should recognize the role of good old documentation, too -- yes, even though it isn't parsable by the compiler, even though it can be out of date -- for saying things like "for this generic control, it only supports type parameters T that are bit<W> or int<W>. Any other type is not supported, even if it happens to compile without type errors". Just as we do for v1model externs with type parameters today (I think all of them are explicitly documented now): https://github.com/p4lang/p4c/blob/master/p4include/v1model.p4

vgurevich commented 4 years ago

@jafingerhut -- I am with you, except the last paragraph. I think you might've meant something different, e.g. "... any other type is not supported even if the frontend accepts the program" or something of that nature, meaning that it is OK to have a program that frontend accepts, but the backend rejects (including for the reason of a non-supported type being used in a wrong place).

I definitely think that if someone passed something in a generic control that can't be implemented, the program should be rejected by the compiler.

jafingerhut commented 4 years ago

@vgurevich What I tried to convey in my last paragraph is that it might be possible using C++ style generic controls, where someone writes a control, then later someone tries to instantiate it with type parameter T equal to an enum type named Bar_t, and everything compiles just fine, front end and back end.

However, the developer of the control perhaps never considered that possibility, and never tested it using that type for type parameter T. Maybe it just happens to work, but maybe it has some bug when you try to use it that way. If the documentation for the control said "I have tested this with T equal to bit<W> or int<W> for any W >= 8, and that is the only intended use of this control", the compiler would not enforce that.

vgurevich commented 4 years ago

@jafingerhut -- I see. This is a somewhat different aspect of modularity that we should probably not worry about here. It can always happen that certain code, once stretched to its limits, will not behave as the designer wants it too. That's the problem of the designer, not the language or the compiler.

We only need to ensure that if the compiler accepts the code, it behaves according to the spec (and if the compiler can't assure that for some other code, that code is supposed to be rejected) (and that the language empowers the compiler to do just that).

jafingerhut commented 4 years ago

@jnfoster asked a couple of questions above that I have not responded to yet.

"Similar things can be done with the C preprocessor. Would you argue for implementing generics that way?"

I do not know all the tradeoffs of using the GNU C preprocessor, but it seems that a custom P4-specific implementation of something that treated controls/parsers with generic types as "templates" that are expanded when instantiated, replacing all type variables with the type used in the instantiation, similar to a macro like #define FOO(a, b) ((a)-(b)), is something that could be made to work.

"What happens when generics interact with other abstractions P4_16 already supports? For example, if I have declarations like this:

control C<T>(...) { ... }
control D<U>(...)(C<U> c) { ... }

which things can I validly pass as the c constructor parameter? Note that with C++ generics I don't get the assurance that the type variable U has been instantiated the same way. If the expansions of the body of D and C happen to use any values of type U in a compatible way in the monomorphic type system after generics are expanded, then the compiler will accept it."

I think you mean a situation where first you create an instance of control C, say with a type like bit<16>, and then afterwards you create an instance of control D with a type like bit<32>, yes?

It seems that the declaration of control D above wishes to express the constraint that the type U supplied to any instantiation to D is supposed to be supplies with an instance of control C that was instantiated with the same type.

Documentation for control D could say that, too, e.g. "Control D should only be passed instances of C with the same type parameter as each other. Anything else has not been tested, is not supported, and you get what you ask for, whether it does anything useful or not is on your head."

Note that the documentation could also express far more general constraints, e.g. "Control D should only be passed instances of C where the type passed to C is int<W> for some width W >= 8 that is a multiple of 8, and then D must be passed the type bit<W> for that same width W." Or any other more general relationship anyone might wish to use between the two types. Using two different type variables U and V tells you nothing except that they do not have to be the same.

jafingerhut commented 4 years ago

Another note on the previous comment: Even if C++ style generic types do not enforce two types to be the same by itself (in the example code snippet with a control D taking a type parameter U, and a control that also has a type parameter U), all it takes is a single assignment between variables of two "different" types to enforce the constraint that they must be the same (or the language has implicit casts between them, which is a pretty short list right now).

liujed-intel commented 4 years ago

Wading into the debate, I'd like to throw in a vote for parametric polymorphism.

A consideration that hasn't been brought up yet is one of code evolution. There can quite easily be a gap between what an implementation does with a type variable and what the implementer has in mind for how the type variable can be instantiated. A user can read the implementation or use the compiler to guess at what the implementer intended, but what happens to work now may not continue to work as the library evolves.

As Andy has pointed out, documentation is one way of addressing this. I would view parametric polymorphism as a language's way of instilling good design habits in its programmers. When you write a generic function/control/etc., you should be thinking about what can inhabit your type variables. Type constraints are a form of mechanized documentation: by writing one down, you are committing to supporting that set of types. When you evolve your code, having to change a type constraint should make you think carefully about what you are doing.

Sure, types can't capture all concepts that a well-written comment can capture, but I don't think that means we should just throw up our hands. Very few developers are disciplined enough to document their code; in those cases, having a type constraint is better than nothing.

jafingerhut commented 4 years ago

@liujed-intel I think it would possibly help to advance the conversation, to be explicit in proposing the kinds of constraints you think ought to exist in such a type system. For example, this list from my earlier comment, that can easily be expressed in a C++ style generic system, in existing P4 code:

any type that can be used as a table key field
any type that can be used as an action parameter
any type that can be added to an infinite-precision integer
any struct type that has a field named 'f1'
the intersection or union of any other pair of expressions that represent a set of types

If there is some way to say "I want a parametric type system" without giving a language for constraining the types, then please educate me. If we need one, then saying we should have one, but not saying what it looks like and is capable of, is where we have been for years now, it seems.

Note: If there is such a language proposed somewhere for specifying type constraints, then I am unaware of it, and linking to it from here seems like an excellent idea.

liujed-intel commented 4 years ago

Sorry, I was unaware that we were looking for concrete designs. My understanding from reading this discussion was that it was simply debating the merits of parametric polymorphism vs. templates. Support for parametric polymorphism seemed to be in the minority, so it appeared premature to do any kind of design work to that end.

This said, if there is enough support behind parametric polymorphism, I'd be happy to help contribute to design work, but I don't have enough bandwidth to do it entirely on my own. As a starting point, I'd look to C#, Java, Rust, and Scala for inspiration.

jafingerhut commented 4 years ago

I am personally trying to move the debate forward to some kind of resolution, that leads to some approach being implemented, if possible. If we have X% of people wanting some kind of parametric polymorphism, and (100-X)% in favor of something like C++ generics, and the discussion ends there, then what? I suspect the answer is likely to be "N more years of neither".

I for one would chalk that up as a potential advantage in the C++ generics approach, personally -- less time from a group of time-starved people required to design how it works, and know what that design will do, and not do.

jnfoster commented 4 years ago

Maybe we should have a quick call. I feel like the conversation is going in circles.

Here are a few less technical points to consider:

C++-style templates are more permissive than parametric polymorphism. So if we adopt a semantics based on templates, it will be harder to revisit that design in the future without breaking existing code. In contrast, moving from polymorphism to templates is fully backward-compatible.
Many months ago, there was quite a bit of discussion around a design for constraints on types similar to the traits that can be found in languages like Scala and Rust. The effort was de-prioritized not because the technical challenges seemed insurmountable but because there wasn't anyone generating practical examples or agitating for it (in particular, the main person pushing for it at Barefoot, Milad Sharif, joined another hardware startup). If this is a priority (and perhaps Vladimir's examples indicate it is) we can certainly pick it back up. There is no need to declare defeat and abandon that work.
It's no obvious to me that a template-style implementation would be easier to implement. We currently have a type checker based on parametric polymorphism and Hindley-Milner-style inference. Once we have a design for traits, there are two changes: assume that the type variable has the capabilities of the trait in the body of the generic control (or parser), and check that the type supplied at an instantiation has those capabilities. For templates, we'd need to arrange things so that all type variables are substitute before the body generic control (or parser) is type checked, which is not rocket science, but a non-trivial change from the current scheme.
It'd be nice to fix the type of functions like hash, extract, and emit so they accurately reflect the constraints on the types the are used with and are not "lies." In particular, a statement like pkt.extract(x) where x is not a header would fail in the typechecker rather than the backend. To do this, we need a trait HeaderType so we can declare packet_in as follows:
```
extern packet_in { 
   ...
   void extract<T is HeaderType>(out T hdr);
    ...
}
```
Note that by themselves, generics à la C++ templates won't help us give an accurate type to such functions.
How would type inference work with templates?

jafingerhut commented 4 years ago

Nate, I greatly appreciate the detailed comment. It is entirely possible I have heard of the work by Milad Sharif, but if so I had forgotten about it.

Is any of that published anywhere, or is it early enough that it isn't felt to be in publishable form? I mean publish as in make public, whether or not it is submitted for a peer-reviewed publication, and whether or not it is considered complete.

jnfoster commented 4 years ago

Those ideas were certainly preliminary. Some notes are on the old P4 Modularity Wiki.

jafingerhut commented 4 years ago

I am guessing that this is the old P4 Modularity Wiki you are referring to? If so, haven't read through it before, but may try to do so: https://github.com/jnfoster/p4-modularity/wiki

jnfoster commented 4 years ago

Yep. Also happy to talk by phone...

vgurevich commented 4 years ago

I'll be also more than happy to talk. We can do Zoom -- that's probably easier.

I just wanted to provide a very simple example... As it was mentioned countless times on the mailing list when someone has trouble with their program, we recommend people to create a table that matches on the certain field(s) and does nothing, then insert it into the right place in the control so that BMv2 would display the values.

I would guess, that a fairly natural way to do that would be:

control data_display<T>(T data) 
{
    table t { 
        key = { data : exact; }
        actions = { NoAction; }
    }
    apply {
        t.apply();
   }
}

and then, of course, everyone should be able to instantiate as many different controls of the type data_display<XXX> as they need (the ability to use structs, headers and other derived types as keys that I previously mentioned in https://github.com/p4lang/p4-spec/issues/829 wouldn't hurt either ;) )

vgurevich commented 4 years ago

Anyone wants to talk, e.g. on March 31st at 1pm PDT? I can setup a meeting

p4lang / p4c

What kinds of parser/control declarations with generic types are supported? #2247