Closed vgurevich closed 10 months ago
The best way to express writing/reading from a field of such type is to use _, e.g:
Why not ... (aka the default value) instead of don't care? See section 8.24.
Also why does the user need to name the padding fields at all? In C/C++ anonymous bitfields can be used. So:
header my_hdr_h {
pad<3>;
bit<5> f1;
pad<2>;
bit<14> f2;
}
Instead would be better and then don't need to match up the fields at all with the initializers. Just like in C/C++ you can do:
struct header {
int :3;
unsigned f1:5;
unsigned :2;
unsigned f2:14;
};
And then there is no way to access the fields at all or even casting wise.
We should discuss this in the next design meeting. But I do favor just naming these fields with _. We have to explain what happens in initializers.
Neither @apinski-cavium nor @mbudiu-vmw addresses the key element of Vladimir's proposal -- it changes the semantics of extract
and emit
.
I need to mull it over, but I actually kind of like @vgurevich's proposal, which boils down to introducing a new base type for always-uninitialized bit strings. Then we don't need to change the semantics of header types at all.
@apinski-cavium -- I am all for that. I would not make this mandatory (e.g. someone wants to see those fields in the tools or whatever), but if we can extend language syntax to avoid naming these fields, it will definitely be nice.
@jnfoster -- I think that my proposal extends the semantics of extract()
and emit()
by defining how they behave for this particular type. All current behaviors should remain unchanged, unless I missed something.
What's an example of a padding field with these semantics in real life? I usually think of "padding" as being forced to a particular fixed value (normally all-zeros). If there's no defined value, then a padding field can't appear in any context where the header is accompanied by a checksum.
@blp -- one example is a header that one would prepend to recirculated packet. In many cases these headers need to carry all kinds of fields that are not necessarily byte-aligned. At the same time these headers never "leave" the device -- they are internal to a specific data plane program. TNA uses internal headers pretty extensively: other examples include metadata bridging, mirroring and resubmit.
While it might look strange from the SW standpoint, initializing these fields to 0 and keeping them that way is actually expensive, so expressing the intent (of not needing it) leads to more efficient programs.
Thanks @vgurevich.
OK. It's hard for me to see uninitialized data without immediately thinking INFORMATION DISCLOSURE VULNERABILITY but I guess if it never crosses security domains (one machine to another, kernel->user, etc.) it's OK.
@blp -- I agree. For example in the Intel Connectivity Academy course that I teach, I specifically point out the difference in how one should construct internal-only vs. externally visible headers.
@jnfoster -- I think that my proposal extends the semantics of extract() and emit() by defining how they behave for this particular type. All current behaviors should remain unchanged, unless I missed something.
Right. The point I wanted to highlight was that if we add a new type, and we do it in the right way, I don't believe we have to modify (i.e., special-case) the semantics of extract()
and emit()
. That would make the language simpler, more uniform, elegant, etc.
@apinski-cavium proposed to have these bits hardwired to 0. Would this implementation satisfy the required use cases?
@apinski-cavium Could you provide a link or other reference to your suggestion in the 2022-Aug-08 LDWG meeting that was something along the lines of "when read, these fields behave as if they contained 0"? I am not familiar with this feature in other languages, and was curious if there was something more subtle to its definition than "the field always behaves as if it contains 0".
@apinski-cavium Could you provide a link or other reference to your suggestion in the 2022-Aug-08 LDWG meeting that was something along the lines of "when read, these fields behave as if they contained 0"? I am not familiar with this feature in other languages, and was curious if there was something more subtle to its definition than "the field always behaves as if it contains 0".
It is not really a computer language feature as much as it is a feature of hardware specificications. https://developer.arm.com/documentation/aeg0014/g/Glossary See RAO/WI, RAZ/WI, Read-As-Zero (RAZ), etc. as described in that document. Basically there are some bits of some hardware registers which get ignored even if they are changed and will always be read as zero (or ones depending if they are RAO or RAZ) too.
A default value of dontcare for pad<n>
would be more flexible than 0. It does not prevent a compiler to treat dontcare as 0. But it would not be the case the other way around.
On some target, the following two examples have different resource implication.
header h {
pad<7> _;
bit<9> ingress_port;
}
h = { ingress_port = 1, ... };
pkt.emit(h);
or
header h {
pad<7> pad;
bit<9> ingress_port;
}
h = { pad = 0; ingress_port = 1; }
pkt.emit(h);
@apinski-cavium -- @hanw beat me to it and he is totally correct: requesting that reading from padding will return 0 defeats the purpose of the field, especially in such cases as emit()
where we pass an entire header.
It is also true that this specific extern does not prevent an implementation from returning 0 if it wishes to do so, but it does not require that.
@vgurevich and @hanw: I might myself have missed something, but my understanding of the @apinski-cavium proposal is compatible with the kinds of implementations you are imaging, where no extra storage is needed for the padding fields.
Let's suppose the semantics of pad<_>
is that reading it always returns 0
(of appropriate width) and writing it is a no-op -- the value is discarded.
It follows that when we extract
into a header with pad<_>
fields, the data that would have been extracted into those fields is discarded.
And the compiler doesn't need to allocate any storage for those fields because we know their value -- reading always returns 0
and writing is essentially a no-op.
Finally, when we emit
a header with pad<_>
fields, we write a 0
for those fields. But there should be no problem doing that -- 0
is one of the arbitrary values that could have been chosen by the compiler in the scheme @hanw proposes.
Perhaps someone can explain to me how this semantics "defeats the purpose" of the feature? It signals to the compiler that we don't care about that data. We never need to allocate internal storage for those fields. And when it gets extracted
and emit
ted, it gets canonicalized to 0
. Again, I probably have missed something, but this seems like it accommodates the kinds of implementations you are imagining.
I heard that there were some questions about the equality operations on headers that contain fields of type pad<n>
.
Here are a couple of thoughts:
==
) of two fields of the same type pad<n>
always returns true
. In this case the comparison of headers with padding fields will return true if an only if all non-padding fields are equal, which is what one would expect. Following the same logic, not equal (!=
)operation on the two fields of the same type pad<n>
will always return false
Sorry, I didn't mean to close this!
To add one more comment, I think @hanw's two programs can be implemented the same way.
header h {
pad<7> _;
bit<9> ingress_port;
}
h = { ingress_port = 1, ... };
pkt.emit(h);
and
header h {
pad<7> pad;
bit<9> ingress_port;
}
h = { pad = 0; ingress_port = 1; }
pkt.emit(h);
In particular, in the second program, even though we appear to have a field that would require storage, we know -- just from its type! -- that its value is always 0
.
@jnfoster -- emit()
means reading from the field, isn't it? And the whole purpose here is to allow emitting headers that contain garbage in the padding, instead of zeroes.
@vgurevich let me just say that, in my experience, messing with equality is a Really Bad Idea. The ramifications of such a change are going to be extensive. And it's going to be really unintuitive for programmers.
@apinski-cavium's read-as-zero proposal has the property that equality remains reflexive, and requires no contortions to the semantics of headers, ==
, etc.
@jnfoster -- emit() means reading from the field, isn't it? And the whole purpose here is to allow emitting headers that contain garbage in the padding, instead of zeroes.
The @apinski-cavium proposal is perfectly compatible with this. We just pick 0
as the value for the garbage. Surely that's allowed? Or do we actually care that it's a random bitstring?
If the goal is to allow the compiler to avoid having to allocate internal storage for padding, then @apinski-cavium's proposal achieves that to, as I explained above.
The one possible objection to the @apinski-cavium proposal I can imagine is that the implementation of emit
might be slightly more expensive -- we need to somehow slot in 0
for the padding fields. Is that the concern?
@jnfoster -- whether the memory will or will not be allocated for these fields depends on the target. And in the case it will be allocated, it can contain values other than 0.
If the type (pad<n>
) determines the value (0
), why would a target ever want to allocate storage for that field? I mean, sure, a target is free to do that. Targets are also free to allocate storage for the full text of Joyce's Ulyssess, but on a resource-constrained high-speed switch ASIC, that's unlikely to be a choice that a compiler engineer will make. Instead, good compilers will exploit the static information available from the text of the program. For instance, in the backend, it can replace every read of hdr.pad
with the literal 0
and every write with a no-op...
@jnfoster -- that "slightly more expensive" (to emit 0) is actually "quite a bit more expensive" on some targets, and the whole point of proposal was to avoid that.
emitting pad<n>
field as zero consumes pipeline resource and decreases overlay opportunities, as vlad suggested.
If programmer really wants to emit zero, they should use bit<n>
.
@jnfoster -- please, reach me on Teams and I'll explain why memory needs to be allocated (at least on some targets that you should care about)
Yep, I understand how deparsing works on targets we all care about. (Hence, this comment https://github.com/p4lang/p4-spec/issues/1121#issuecomment-1210094084). I'll ping you on Teams to chat when it's not 11pm in ET.
Before I go to sleep, I'll say that the semantics where pad<n>
always has an undetermined value is also fine by me (which is why I endorsed it above).
I think the main remaining concerns, as highlighted in this issue are:
==
) is not guaranteed to be reflexive on values that contain pad<n>
.
I was attracted to the @apinski-cavium proposal because it avoided both of these issues.(In fact, we already have the issue with equality, due to invalid headers and uninitialized values. However, the front-end of the compiler always assumes that e == e
for every expression e
, and it uses this in certain optimizations. This is legal, as compilers are always allowed to refine the language semantics. However, assuming that e == e
is possibly incompatible with choices that could be made by backends. So has always left me uneasy to have that optimization be done in the front-end.)
In the interest of tidying up the set of active issues on the P4 specification repository, I'm marking this as "stalled" and closing it. Of course, we can always re-open it in the future if there is interest in resurrecting it.
Personnel
Design
Implementation
p4-spec
:p4c
: https://github.com/p4lang/p4c/pull/2706Process
p4-spec
:p4c
:=======================================
Introduction
There is a need to have some "filler" bits in the headers, e.g. for byte-alignment. Currently this is achieved by simply inserting a field (typically of an unsigned type) into the header:
This, however, forces the programmer to be very explicit about the values in these
_pad[01]
fields. It also forces the compiler to make sure that these fields behave correctly as their type dictates. For example, the values written in those fields are supposed to stay there unchanged. This, unfortunately, prevents a number of important optimizations that can be enabled as long as there is a way to express the fact that nobody cares what's inside those fields. Currently, some compilers allow the programmer to express that fact by using a special@padding
annotation:This mechanism works pretty well as long as nobody tries to do anything special with the values in the fields marked with
@padding
annotation, but strictly speaking this annotation is semantically significant: it can be easily seen that the behavior of a program with and without this annotation is different.As was agreed many times, such annotations are better be replaced with the standardized constructs.
Proposal
Introduce a new type,
padding<n>
(orpad<n>
). This is a serializable, scalar type.Example:
Supported operations
_
, e.g:bit<n>
are supported, but flagged and the result is undetermined