Closed MollyDream closed 1 year ago
@MollyDream I am all for examples clarifying some of these subtleties, so thanks for your write-up.
I agree that when I wrote the original text of this, I was not explicitly addressing what constitutes a "read" of a variable, and copy-in and copy-out were not explicitly addressed, and it is worth considering whether either or both of those are a "read" or not.
I was curious if you found anything in the current spec that would cause different behavior relative to uninitialized values for copy-in vs. copy-out? Or perhaps your choice above is based upon consideration of the kinds of transformations that a correct P4 compiler can make to the original program?
In general, I don't strongly support this, but just for information. The reason in favor of treating copy-out (and maybe also copy-in sometimes) as non-read is that it makes function inlining more efficient. For example,
void f(out bit<8> x, out bit<8> y) {
y = 0;
}
{
bit<8> x, y; // x is uninitilized
f(x, y);
}
If copy-out is not read, we can safely transform it into
{
bit<8> x, y; // x is uninitilized
y = 0;
}
If copy-out is read, it's not safe to do it because
{
bit<8> x, y; // x is uninitilized
y = 0; // x is still uninitilized
}
During 2022-Jan-03 LDWG meeting, Mihai & Andy came to the realization that perhaps p4c's LocalCopyPropagation pass can result in output programs that are more nondeterministic than the input program, unless the compiler can somehow restrict the use of this transformation to cases where the variables/fields being accessed in the expression are "at least as deterministic as" the original program, e.g. all of them have initialized values.
AI Andy: Try to find a small P4_16 program where the current p4c implementation's LocalCopyPropagation pass produces a program that is strictly more non-deterministic than the input program, and add it here. My guess is that finding such a program will be straightforward, e.g. one with an expression like (a-a)
or (a^a)
where a
is initialized in the user's P4 program and thus these expressions are defined to deterministically evaluate to 0 in the input program, but replaced with what look like equal expressions but containing uninitialized variables only, so they do not deterministically evaluate to 0 in the compiler-transformed program. The harder question is whether there are practical programs that have desired behavior in their user-written form, but buggy behavior in their compiler-transformed form.
In general, I am fine with any definition of the P4 language spec here for which we know which compiler transformations are correct (i.e. result is at least as determinstic as the input program), and which are incorrect, and the incorrect ones are optimization passes we are willing to live without.
For anyone curious, there is a very short P4 program for which today's p4c can increase the number of possible behaviors of an input P4 program, because of the LocalCopyPropagation pass, when at least some of the variables in the resulting expressions are uninitialized: https://github.com/jafingerhut/p4-guide/tree/master/nondeterminism#demonstration-of-localcopypropagation-increasing-nondeterminism-of-input-program
Filed this issue for p4c with a potential idea for preventing this: https://github.com/p4lang/p4c/issues/3022
It seems to me that the core of the questions here are not when reads occur, but when a variable is considered to be initialized, vs. uninitialized.
It seems to make intuitive sense that assigning a value to a variable X should make X become initialized.
So open questions raise by this issue are:
(a) should a direction 'in' parameter always become initialized when its action/function/control/parser is invoked? Or should there be some cases where it can remain uninitialized?
The text of Example 3 in the original issue suggests that the answer should be that a direction in
parameter always becomes initialized. I do not know if that is the best/right answer, but wanted to explicitly state the assumption being made in that example. To my knowledge, the current spec does not answer this question explicitly.
(b) when an action/function/control/parser is invoked as foo(a), and the parameter corresponding to a
is declared with direction out
, should a
always become initialized by the call foo(a)
? Or should there be some cases where it can remain uninitialized?
The text of Example 4 in the original issue suggests that the answer is that a direction out
parameter a
at the call site does NOT necessarily become initialized after the call foo(a)
. Again, I do not know if that is the best/right answer, and again, I do not think that the current spec answers this question explicitly.
(c) Both of the questions above arise for direction inout
parameters.
As I mentioned in the LDWG meeting on 2021-Jan-04, I suspect that different choices to the answers to question (a), (b), and (c) may affect which compiler optimizations are correct (i.e. never make the result after the transformation more nondeterministic), and which are incorrect (i.e. can make the result more nondeterminstic). I don't know which transformations are affected by the answers to (a), (b), and (c) yet, let alone how these answers affect them.
Here is an article about uninitialized values from someone who wrote a Ph.D. thesis on semantics: https://www.ralfj.de/blog/2019/07/14/uninit.html
Thanks for that link. It looks highly relevant to P4's current definition of the behavior of uninitialized variables, from what I can tell. I haven't read all the linked discussions yet, as I only spent a couple of minutes on the directly linked article. From that alone, I get the impression that the author has thought about these issues, but I do not yet know if they have recommendations that help us answer the question posed by this issue.
The main point of his text is that we (the LDWG) should define what "undefined" means, and not specific implementations, like Tofino or BMv2. The abstract machine for the language definition, aka the spec, has choices to make, and we make them. We can choose whether a read of an undefined variable "freezes" the value or does not. Both are legitimate choices.
Ralf was almost a P4 h4x0r! (I was this close to hiring him as a postdoc.) Sadly, he decided to go to some place called MIT.
"The main point of his text is that we (the LDWG) should define what "undefined" means, and not specific implementations, like Tofino or BMv2." I agree with this statement, and believed it before I read the linked article. I still do not know the best way to answer the questions raised by this issue, though :-(
There may be no "best" answer. This is why language design is hard. As you see, other communities struggle with the same problems.
Just recall the reason that we need partially initialized value by bit. Consider we assigning x
by parts:
void f(in bit<4> y, in bit<4> z) {
bit<8> x;
x[3:0] = y;
x[7:4] = z;
...
}
The only intuitive interpretation of x
after two partial assignments is x = y ++ z
. We must have bitwise value to support it.
In the interest of tidying up the set of active issues on the P4 specification repository, I'm marking this as "stalled" and closing it. Of course, we can always re-open it in the future if there is interest in resurrecting it.
The spec talks about uninitialized values in section 8.22, but given the amount of information, this section is very hard to follow and easy to misinterpret. I think adding more examples will clarify "a consistent value [does not have to be] returned across multiple such [reads]" of any uninitialized value, and will clarify what a "read" actually means. Example 1:
The “read” of the uninitialized variable
x
occurs inside the expression evaluationx+1
, taking an unspecified 8-bit value for the subexpressionx
. The value ofx+1
initializesy
deterministically, so the following two reads ofy
gives the same value, leading tor = true
. Example 2:The “read” of the uninitialized variable
x
occurs twice inside the expression evaluationx == x
, each possibly resulting in a different value, sor
can be eithertrue
orfalse
. Example 3:Copy-in of an uninitialized variable to an
in
parameter of a function counts as a "read", resulting in a deterministic unspecified value ofx
in the functionf
. Therefore,y = true
. Example 4:Since
x
is an uninitializedout
parameter inside the functionf
,r1
can be eithertrue
orfalse
. After executingf
, copy-out of uninitialized variables **does not*** count as a read, soa
becomes uninitialized, andr2
can be eithertrue
orfalse
.* This is based on our understanding of the spec. We think this design choice of "copy-in is a read but copy-out not" makes sense since it makes the function inlining smooth. Specifically, when the compiler inlines a function call, all the
in
parameters just have to be replaced by the arguments, and all theout
arguments be re-declared; this pertains the behavior of uninitialized parameters in function inlining.