p4lang / p4-spec

Apache License 2.0
175 stars 78 forks source link

Should run-time variables that a sophisticated compiler can prove must be equal to a compile-time known value, be treated as a compile-time known value? #1291

Open jafingerhut opened 6 days ago

jafingerhut commented 6 days ago

During the 2024-Jul-01 language design work group meeting, Jonathan DiLorenzo was asking some questions that I believe might be effectively the same as those described below (@jonathan-dilorenzo feel free to reply if I have misrepresented your questions).

There are P4 programs where it is possible to prove that at a particular point in the execution flow, the value of a run-time variable must be equal to a particular compile-time known value. For example:

control c1 ( /* parameters omitted */ ) {
    bit<8> i;
    bit<8> n;
    apply {
        i = 0;
        // at this point during the execution, the value of i must be 0, a compile-time known value.
        // Thus we could imagine that a sufficiently sophisticated P4 compiler could allow the following assignment,
        // which is _not_ defined by the current language spec:
        n[i:i] = 1;
        i = i + 1;
        // this is another point that we can prove that i must be equal to a compile-time known value, 1
        n[i:i] = 0;    // not supported by current spec, but we are imagining in this example that it is supported
        i = hdr.ethernet.srcAddr[47:40];
        // In this example, we are supposing that hdr.ethernet.srcAddr[47:40] is a field value from a packet
        // being processed, and thus there is no way that even an arbitrarily advanced compiler could know the value
        // of i here, and thus i cannot be treated as a compile-time known value at this point in the execution.
        if (hdr.ethernet.dstAddr == 0) {
            i = 17;
            // an advanced compiler could prove that i must equal 17 here, a compile-time known value
        } else {
            i = 18;
            // an advanced compiler could prove that i must equal 18 here, a compile-time known value
        }
        // Assuming hdr.ethernet.dstAddr is from the packet being processed, no compiler can possibly
        // prove that i is a single compile-time known value here.  It could prove that it must be 17 or 18, but
        // not a single compile-time known value, thus the following line should give a compile-time error,
        // assuming that bit slice indexes must be compile-time known values:
        n[i:i] = 1;
    }
}

I am not attempting here to enumerate all of the conditions for a run-time variable to be provably equal to a single compile-time known value at an arbitrary execution point within a P4 program, only to give some simple examples where it looks easy to do so, and some where it seems impossible to do so.

The discussion during the 2024-Jul-01 LDWG meeting seemed to be related to the questions raised in this issue. Even though it was during a discussion on loops in P4, I believe the discussion was mostly around what a sufficiently advanced compiler might be able to infer about the P4 program after loops had been unrolled.

jnfoster commented 5 days ago

I think compile-time-known-value-ness should be a property of P4's type system. Although it's true that we could, in principle, determine the value of many expressions at compile time, using various complex static analyses, it would be unusual to see this done in a language design. So I fear it would be confusing for programmers to go down this route.

In particular, if compile-time-known-value-ness is computed from type information alone, then it will be robust to various type-preserving refactorings the programmer might perform on their code. For example, in this program,

bit<8> i = 0;
i = i + 1;

the value of i is not compile-time known because local variables are not considered compile-time known.

However, the same might not be true under proposals hinted at here, using complex static analyses. For example, using a static analysis we can certainly determine the value of i at the end of the code block.

But now suppose we add a statement between the declaration of i and the increment:

bit<8> i = 0;
i = i | hdrs.ethernet.srcMac[0:7];
i = i + 1;

Now the value certainly cannot be determined at compile-time, as it depends on packet data. Or, as another example, suppose we invoke an extern with an out or inout parameter, so that it overwrites the value of i.

bit<8> i = 0;
some_hash(i, ...);
i = i + 1;

Even if the backend understood the semantics of the extern, the front-end would not. So again the value of i could not be determined at compile time, which is what we need to do to type check the program.

The distinction between a static analysis and a type system might be subtle. In general, type systems are compositional whereas a static analysis need not be. In this case, the only reasonable static analyses would need to be non-compositional to determine the values of expressions at compile time. This illustrates another argument for not going down this path is that the static analysis itself would take time to implement and -- assuming it were not perfectly precise -- would be confusing for programmers to understand and reason about.

To put it another way: compile-time known value != value determined at compile-time.

vlstill commented 2 days ago

I think compile-time-known-value-ness should be a property of P4's type system. Although it's true that we could, in principle, determine the value of many expressions at compile time, using various complex static analyses, it would be unusual to see this done in a language design. So I fear it would be confusing for programmers to go down this route.

I agree. Also, with sufficiently advanced static analysis it would quickly become quite hard to define in the specification where it should be able to derive that something is compile-time known. Especially since any such static analysis would almost surely have to be heuristic, just imagine a program like this:

bit<1> i = (bit<1>)((hdr.x.a || !hdr.x.b || hdr.x.c) && (hdr.y.a || !hdr.x.a || ...) ...)
x = hdr.v.v1[i:i];

Here, i is compile time know if and only if the expression that defines it is a tautology (i = 1) or is not satisfiable (i = 0), otherwise it depends on the packet. Sure, a sufficiently complex static analysis (capable of sat-solving) would be able to detect that, but one would not want to implement and to wait for SAT solver when compiling :-).