Open jafingerhut opened 6 years ago
@vgurevich Take a look at the above and see if it might be close to what you are thinking of. @ChrisDodd I think you may have been the one asking about this on the phone during the WG meeting?
It looks to me like what you want is a version of the hash function which is an extern - because it needs a control-plane API. Other than that, the internals of the function are irrelevant for the data-plane. It just takes N bits and emits M.
It's worth considering whether we need to add a new primitive for this.
Here's a simple example code snippet that is already legal P4_16. It shows how one could pick the upper byte or lower byte of a two-byte field r.f for later use in a table t. The selection is managed by the control plane interface to table s -- in particular, the choice of the default_action -- and can either be the upper byte, lower byte, or a random byte picked dynamically for every packet.
#include <core.p4>
#include <v1model.p4>

struct raw {
    bit<16> f;
}

struct proc {
    bit<8> picked;
}

control c(inout raw r) {
    proc p = { 0 };

    action pick(bit<1> b) {
        if (b == 0) {
            p.picked = (bit<8>) r.f;
        } else {
            p.picked = (bit<8>) (r.f >> 8);
        }
    }
    action pick_lower() {
        pick(0);
    }
    action pick_upper() {
        pick(1);
    }
    action pick_random() {
        bit<32> b;
        random(b, 0, 1);
        pick((bit<1>) b);
    }

    table s {
        key = { }
        actions = { pick_upper; pick_lower; pick_random; }
        default_action = pick_random;
    }
    table t {
        key = { p.picked : exact; }
        actions = { NoAction; /* Fill in other actions here */ }
    }

    apply {
        s.apply();
        t.apply();
    }
}
I can see that there might be a case for having a new extern -- it's more succinct and might be easier for compilers to handle. On the other hand, there are other one-off pieces of code that have to be designed and implemented, like the P4 Runtime control-plane API.
One sort of hopes that having gone to the trouble of designing a language with the right domain-specific primitives and general programming abstractions, we can eventually stop adding new primitives and simply use existing language constructs to express the behavior we need.
This works, but needs an extra table. They want to do this within an action. I am not sure the compiler would be able to eliminate this extra table.
I agree the compiler would need to do something extra, but it should be relatively easy to identify and eliminate a table with no keys (like s) provided its actions can be mapped to native bit blit primitives on the target.
Agreed that the proposed extern can be implemented with a table, if statements, and existing P4_16 operators.
I would mention that we can also implement counters, meters, and random number generators using register externs and the base language only, hashes and checksums with the base language only, and action profiles and action selectors with 2 or 3 tables and the base language only.
Those facts do not imply that we should add the kitchen sink of externs to PSA, willy nilly, and this proposal might be going too far in that direction. But the code for selecting 16 out of 25 bytes gets pretty big and very repetitive. I can write a small Python program to generate the P4_16 code for the behavior for arbitrary parameters (and plan to do so soon, as a reference implementation).
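To give a rough sense of why the hand-written version gets so large, here is a minimal Python sketch of a generator of the kind mentioned above. It is not the reference implementation itself: the names in_data, out_data, and sel0..selN are hypothetical, and the emitted if/else chain is only one of several ways the selection could be written in P4_16.

```python
def gen_select_byte(out_index, num_in_bytes):
    """Emit P4_16 statements that copy one input byte into output byte
    out_index, chosen by the run-time value of sel<out_index>.
    in_data, out_data and sel<N> are hypothetical variable names."""
    lo = 8 * out_index
    dst = f"out_data[{lo + 7}:{lo}]"
    lines = []
    for i in range(num_in_bytes):
        keyword = "if" if i == 0 else "} else if"
        lines.append(f"{keyword} (sel{out_index} == {i}) {{")
        lines.append(f"    {dst} = in_data[{8 * i + 7}:{8 * i}];")
    lines.append("}")
    return "\n".join(lines)


def gen_select_fields(num_out_bytes, num_in_bytes):
    """Emit one if/else chain per output byte."""
    return "\n".join(gen_select_byte(k, num_in_bytes)
                     for k in range(num_out_bytes))


if __name__ == "__main__":
    print(gen_select_byte(0, 2))
    total = len(gen_select_fields(16, 25).splitlines())
    print(total, "lines for selecting 16 of 25 bytes")
```

With these assumptions, selecting 16 of 25 bytes expands to 816 generated lines, which is the kind of repetition a dedicated primitive would hide.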
The main motivations in favor of such an extern that I can think of are:
Hi @jafingerhut
I agree that programmers shouldn't have to write large amounts of code to do simple things. We want to provide primitives that make the common case simple and succinct.
Let me refine what I was trying to say. I have three main points.
"Note: I am not sure if this proposal requires extending the P4_16 language. It might."
The code I provided suggests that it does not, at least at the semantic level.
Conflating these two concerns in a single run-time configurable bit-selection operation is what I would push back against. I totally agree that having to write dozens of lines of code to extract bits is too verbose. We should add new primitives to make slicing and dicing bits super easy. And we should have something like a function so programmers can package up their own custom bit-blitting operations and give them convenient names.
But we already have several perfectly good ways for the control plane to interpose on data-plane behavior without having to recompile the P4 program. For example, we can get run-time reconfiguration through tables -- in particular, the default_actions of tables. Or if you don't like introducing a table, you could use a register of size 1:
enum choice {
    first, second, random
}

control c(inout raw r) {
    proc p = { 0 };

    action pick(bit<1> b) {
        if (b == 0) {
            p.picked = (bit<8>) r.f;
        } else {
            p.picked = (bit<8>) (r.f >> 8);
        }
    }
    action pick_lower() {
        pick(0);
    }
    action pick_upper() {
        pick(1);
    }
    action pick_random() {
        bit<32> b;
        random(b, 0, 1);
        pick((bit<1>) b);
    }

    table t {
        key = { p.picked : exact; }
        actions = { NoAction; /* Fill in other actions here */ }
    }

    register<choice>(1) g;

    apply {
        choice h;
        g.read(h, 0);
        if (h == choice.first) pick_lower();
        else if (h == choice.second) pick_upper();
        else /* h == choice.random */ pick_random();
        t.apply();
    }
}
What I would really argue against is adding a new primitive that both adds new bit-blitting operations and extends the interface of the control-plane. Adding new mechanisms by which the control plane can reconfigure the data plane will complicate several things including the language design, the compiler, and P4 Runtime.
Got to run. Happy to discuss further later.
@jnfoster - My apologies, I missed that you were at least in part responding to the "whether new language features are needed". I am pretty sure now that what is proposed here doesn't require new language features.
I am also sensitive to the idea of combining together features that ought to be teased apart and made into multiple lego blocks that can be combined more flexibly in multiple ways.
I can easily imagine separating this idea into two pieces - getting field-select config data from a table or register and putting it "somewhere", then later taking that field-select data and passing it to a control or function that is a pure function of its inputs, a function that takes the wide data (25 bytes in my example) and the field-select data, and returns the selected data (16 bytes in my example).
I mentioned two reasonable-sounding formats for the field-select data in my original message: format A: a sequence of byte offsets, or format B: a bit vector with 1s for the fields to be included in the output.
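As a small illustration of how the two formats relate, here is a Python sketch that converts between them. It assumes, purely for illustration, that bit i of the format B vector corresponds to input byte position i and that selected bytes appear in increasing position order; neither convention is fixed by the proposal.

```python
def bitvector_to_offsets(bitvec, num_in_bytes):
    """Format B -> format A: positions of the 1 bits, lowest first.
    Assumes bit i of bitvec stands for input byte i."""
    return [i for i in range(num_in_bytes) if (bitvec >> i) & 1]


def offsets_to_bitvector(offsets):
    """Format A -> format B. Only possible when no byte is repeated,
    and the ordering information in the offset list is lost."""
    if len(set(offsets)) != len(offsets):
        raise ValueError("format B cannot express repeated bytes")
    bitvec = 0
    for offset in offsets:
        bitvec |= 1 << offset
    return bitvec


if __name__ == "__main__":
    print(bitvector_to_offsets(0b10110, 5))
    print(bin(offsets_to_bitvector([4, 1, 2])))
```

Under these assumptions every format B value has a format A equivalent, while format A can additionally express repeats and reordering.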
Here is a difficulty I am imagining, which maybe only exists for an RMT-like implementation of the idea -- I don't know what format to make the field-select data, and I believe some targets will benefit from not allowing the P4 program to pick them on a per-packet basis.
Why? RMT architectures have all kinds of internal crossbars and muxes that would be potentially useful for implementing this, but how you control their outputs might be neither format A nor format B, but something more implementation-specific. For example, maybe it is some VLIW instructions, or some settings on a crossbar somewhere that can be configured via hardware config registers, but not from the P4 program run-time data.
If it weren't for that, I wouldn't see any issue with just picking one of the reasonable-sounding formats. But because of that, I expect it could nullify the potential benefits of the idea, if we tried to say "pass field-select data in format A to the extern", and let the P4 program generate format A data on the fly for each packet.
That is an argument for combining a control plane API and data plane behavior for this idea (albeit perhaps not a good enough one for an extern to be added to PSA).
I have created an example 'reference implementation' in P4_16 code for this idea here: https://github.com/jafingerhut/p4-guide/tree/master/select-fields
It only adds more proposed details of one way to do it. I don't think it makes any new arguments for motivating why it may be beneficial to make a new stateful extern object for this functionality, vs. a stateless extern function.
It does seem worth asking people who want this whether merely adding a stateless extern function for the field-selection operation, and having the compiler do extra analysis to determine whether the input bits that control the field selection process come from table outputs (i.e. action parameter values of an earlier table apply call) with no further modification, would be straightforward or difficult. If straightforward, then perhaps there is little motivation for creating a stateful extern object for this.
@jafingerhut , @jnfoster
First of all, I think we need to agree on the problem definition, because it looks to me that the proposed approaches try to solve two different problems.
Specifically, Andy's initial proposal was calling for allowing just the control plane to select the fields. Once the selection is done, it will be the same set of fields for each and every packet, until the control plane reconfigures the data plane for a new set of fields. Indeed, that's the problem we were originally discussing.
Nate's approach is much more powerful, since it allows choosing the set of fields on a per-packet basis. In the example it came from the register g, but that should not obscure the fact that it could've come from anywhere.
While, mathematically speaking, it is clear that if one can choose different fields (or choose to do something else) on a per-packet basis, one can easily select the same fields for all the packets, the mechanisms used to implement that in the real HW world might be quite different.
In Nate's example, it will probably be a real table with real actions (although I can easily imagine hardware with crossbars that can be controlled on per-packet basis and then it will not be), while in Andy's example it will probably be a static reconfiguration of a hash computation unit, crossbar or both.
Yes, theoretically the compiler might be able to choose between these widely different mechanisms by analyzing the program code, but given what I currently see, I think we are quite a number of years away from that at least. Moreover, even if the compiler is perfect, there is another "kink": these tables or registers most probably will have somewhat different APIs, compared to their "real" counterparts, which will lead to the proliferation of special annotations and such.
Therefore, at least today and for the next N years, I would advocate an "extern" approach. From my perspective, the advantage is that each extern type can have a specialized API and would fairly clearly convey the complexity and resource usage, at least on a given architecture.
Now, on the specific proposal (and that's where I think that Andy's hunch on the necessity of changing the language might be correct). I do like the idea of an extern that takes a set of fields and outputs a collection of bits from these fields up to N bits wide. However, let's consider that we use the extern twice with two different sets of fields, e.g.
hash1_fields = hash1_field_selector({hdr.ethernet.srcAddr, hdr.ethernet.dstAddr,
hdr.ipv4.srcAddr, hdr.ipv4.dstAddr, hdr.ipv4.protocol,
meta.l4fields.srcPort, meta.l4fields.dstPort});
. . .
hash2_fields = hash1_field_selector({hdr.ethernet.srcAddr, hdr.ethernet.dstAddr,
hdr.ipv6.srcAddr, hdr.ipv6.dstAddr, hdr.ipv6.nextHdr,
meta.l4fields.srcPort, meta.l4fields.dstPort});
Now, configuring that extern becomes pretty much impossible. This tells me that the list of fields should be specified during extern instantiation instead of during its invocation, so that we have a clearly named instance to work with in the APIs. Something like this:
SelectFields(bit<128>, {hdr.ethernet.srcAddr, hdr.ethernet.dstAddr,
hdr.ipv4.srcAddr, hdr.ipv4.dstAddr, hdr.ipv4.protocol,
meta.l4fields.srcPort, meta.l4fields.dstPort}) hash1_field_selector;
and this is precisely where I think we might not have enough support in the language.
The API for the hash selector should allow selecting each field (or subset of bits from each field using a bitmask) up to the full width of that selector. It might also allow you to specify the order of fields.
Why not:
SelectFields<tuple<bit<48>, bit<48>, bit<32>, bit<32>, bit<4>, bit<32>, bit<32>>>() instance;
@mbudiu-vmw ,
Is that allowed, i.e. specifying tuple<> in a place where a template type is specified? If yes, then SelectFields can be defined with just two types: one representing the output width and another one being the input tuple.
Also, what will be required for decent API generation is a way to name the individual members of that tuple.
"Nate's approach is much more powerful, since it allows choosing the set of fields on a per-packet basis."
My original example used a table without keys, which can only be reconfigured by the control-plane. I added a second example using a read-only register because there was some skepticism that a compiler could effectively analyze such a table.
I can see that my pleas for simplicity are falling on deaf ears. But to make one last attempt: if PSA is successful, we will not only be concerned with mapping it down to hardware targets, but also with developing abstractions that sit on top of it and below it. For example, a company like VMware might well want to virtualize the PSA and allow the P4 code written against it to be moved around in a datacenter alongside client VMs. For these and other uses, keeping PSA as simple as possible will be a big win. Every one-off primitive we add to make it slightly easier for the compiler to generate specific code for a particular hardware widget increases the costs of developing tools for analyzing and transforming P4 code everywhere else.
There's a fair bit of skepticism about the ability of a compiler to generate good code -- a perspective I do not share as it is at odds with several decades of progress in compiler technology.
Of course, it's not the end of the world to have a few redundant externs, but I encourage everyone to take a default skeptical stance whenever we consider adding new ones.
@vgurevich You can also use a struct, then the fields are named. @jnfoster writing a new compiler takes a long time. Moreover, compiling for hardware is more akin to place-and-route, which is still in a very unsatisfactory state as far as software tools are concerned.
@mbudiu-vmw ,
Thanks, Mihai! So just to be clear, is this how you think the extern might look:
extern FieldSelector<O, I> {
    FieldSelector<O, I>(I );
    O get();
}

struct my_hash_fields_t {
    mac_addr_t  mac_da;
    mac_addr_t  mac_sa;
    ipv4_addr_t src_ip;
    ipv4_addr_t dst_ip;
    bit<8>      ip_proto;
    bit<16>     l4_src_port;
    bit<16>     l4_dst_port;
}

FieldSelector(bit<128>,
    <my_hash_fields_t>(h.ether.srcAddr,
                       h.ether.dstAddr,
                       h.ipv4.srcAddr,
                       h.ipv4.dstAddr,
                       h.ipv4.protocol,
                       m.l4.sport,
                       m.l4.dport)) fs1;

bit<128> hash_input;
hash_input = fs1.get();
You cannot bind the fields in the instantiation, but you can fix the type. The syntax would be:
extern FieldSelector<O, I> {
    FieldSelector();
    O get(I);
}

FieldSelector<bit<128>, my_hash_fields_t>() fs1;

bit<128> hash_input = fs1.get({h.ether.srcAddr,
                               h.ether.dstAddr,
                               h.ipv4.srcAddr,
                               h.ipv4.dstAddr,
                               h.ipv4.protocol,
                               m.l4.sport,
                               m.l4.dport });
@mbudiu-vmw ,
That's precisely what I was trying to avoid :) Suppose I have two calls to fs1.get() with different sets of fields. I won't be able to generate APIs for each of them, because there is no name associated with the call itself. I can use the struct field names, but they will have little to do with what's in the call.
For such an extern it makes sense to restrict it to a single invocation in a pipeline. There are other externs like this one, BTW, so this is not a strange restriction (clone? resubmit?). If you want to have two separate control-plane APIs then you will have to create two instances. This actually makes perfect sense, because you will probably want to configure each instance differently.
(Note that I am now using the mbudiu-vmw id.)
@mbudiu-vmw ,
Unfortunately, Github always suggests your older user ID first :)
I agree that it makes sense to have multiple instances -- that's precisely what I wanted to do. I just wanted to do it at the language level, without invoking that single invocation restriction. While it will work, it is still a little awkward, mostly because the API generator now has to be aware not only of how the object was instantiated, but also of where and how it is used. I think this makes it fairly exceptional and adds a lot of complexity, doesn't it?
I don't see why the API generator has to do anything special. The API does not depend on how you invoke it, but only on how you instantiate it. And the constructor call has all the information it needs.
@mbudiu-vmw ,
The desired APIs should be "add h.ether.srcAddr to fs1" not "add ?.mac_sa to fs1" (where "?" is not clear, probably either "I" or "my_hash_fields_t"). This is much more convenient and also matches the current P4_14 capability. To generate such an API the API generator needs to know where and how the extern is invoked.
If you want the API to look that way just give the proper names to the fields of the structure my_hash_fields_t. This structure is used in a single place anyway.
I don't think you want 'h' in the API, that can be just a local variable.
Attempt number 2...
Just to re-iterate the problem. Basically, we need to find an equivalent for the following P4_14 construct:
field_list fl_1 {
    field_1_1;
    . . .
    field_1_n;
}
. . .
field_list fl_M {
    field_M_1;
    . . .
    field_M_n;
}

field_list_calculation {
    input {
        fl_1;
        . . .
        fl_M;
    }
}
In case of field_lists we should be able to select 0 or more bits from 0 or more fields (in the simplest case we can select any combination of 0, 1, .. n fields). In the case of the input clause of the field_list_calculation we can select one and only one field list.
In other words, we need to create two externs. For lack of a better name, we can call them CheckBoxSelector and RadioButtonSelector (as these things are typically called in the GUI world).
The declaration should look something like this:
CheckBoxSelector(field_1_1, . . ., field_1_n) fl_1;
. . .
CheckBoxSelector(field_M_1, . . . , field_M_n) fl_M;
RadioButtonSelector(fl_1, . . ., fl_M) final_list;
Obviously, this is already outside of the language spec, but the thinking is that since these declarations are evaluated at compile-time, the whole copy-in/copy-out semantics is not applicable and instead we can use these field names as references.
These externs will have a single data plane method, let's call it selected_get() in both cases, that returns a bit<n>.
In case of the CheckBoxSelector extern, the control plane API can mark either entire fields field_i_j (in the simple case) or any subset of their bits as "selected". In this case the selected_get() method will return a bit<n> value that is either (TBD) a concatenation of all the selected bits or a concatenation of all the bits, but masked appropriately (i.e. all non-selected bits will be zeroed out). The former case presents a challenge, since the actual value of n will not be known by the compiler (but the maximum will still be). In any case, expressing this type is something we need to think about.
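To make the two candidate behaviors of selected_get() concrete, here is a small Python model of both. It is only a sketch: the representation of fields as (value, width) pairs, the per-field bitmasks, and the most-significant-bit-first packing order are all assumptions made for illustration, not part of the proposal.

```python
def concat_selected_bits(fields, masks):
    """Variant 1: pack only the selected bits, MSB first.
    fields is a list of (value, width); masks has one bitmask per field.
    Returns (result, width); the width depends on the masks."""
    result, width = 0, 0
    for (value, fwidth), mask in zip(fields, masks):
        for bit in reversed(range(fwidth)):
            if (mask >> bit) & 1:
                result = (result << 1) | ((value >> bit) & 1)
                width += 1
    return result, width


def concat_masked(fields, masks):
    """Variant 2: keep every bit position and zero the unselected bits.
    Returns (result, width); the width is the sum of all field widths."""
    result, width = 0, 0
    for (value, fwidth), mask in zip(fields, masks):
        kept = value & mask & ((1 << fwidth) - 1)
        result = (result << fwidth) | kept
        width += fwidth
    return result, width


if __name__ == "__main__":
    fields = [(0xAB, 8), (0x3, 4)]   # hypothetical 8-bit and 4-bit fields
    masks = [0xF0, 0xF]              # upper nibble of field 0, all of field 1
    print(concat_selected_bits(fields, masks))
    print(concat_masked(fields, masks))
```

In this model the packed variant yields a width that changes with the control-plane selection, while the masked variant always yields the full combined width.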
In case of a RadioButtonSelector extern, the control plane API can mark only one field (item) as selected. In this case selected_get() will return a bit<n> value that represents the selected item. The challenge with regard to the type of the output is the same: the compiler will know the max (but it is still not clear how to express it).
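A minimal Python model of the RadioButtonSelector behavior might look like the following. The class name, the default selection of item 0, and the choice to report the maximum item width as the output width are assumptions for illustration only.

```python
class RadioButtonSelectorModel:
    """Toy model: exactly one of several items is selected at a time."""

    def __init__(self, widths):
        self.widths = list(widths)          # bit width of each item
        self.max_width = max(self.widths)   # known at compile time
        self.selected = 0                   # assumed default selection

    def select(self, index):
        """Control plane: choose which item selected_get() returns."""
        if not 0 <= index < len(self.widths):
            raise ValueError("no such item")
        self.selected = index

    def selected_get(self, values):
        """Data plane: return (value, width) of the selected item,
        reported at the maximum width over all items."""
        width = self.widths[self.selected]
        value = values[self.selected] & ((1 << width) - 1)
        return value, self.max_width


if __name__ == "__main__":
    selector = RadioButtonSelectorModel([8, 16])
    print(selector.selected_get([0xAB, 0x1234]))
    selector.select(1)
    print(selector.selected_get([0xAB, 0x1234]))
```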
The rest is easy -- the output of selected_get() can be fed into anything, e.g. a hash calculation or even a table (for matching purposes).
Note that in the declaration of the "final_list" selected_get() wasn't called explicitly, but the intent was to have it like so:
RadioButtonSelector(fl_1.selected_get(), . . ., fl_M.selected_get()) final_list;
Again, this is another thing we need to figure out: if we allow references to externs (as well as to fields) what should be the semantics? For fields it is relatively clear (maybe), but for externs? Maybe they need to have a special ref method?
So, I think there are more questions than answers so far.
In terms of the control plane APIs, conceptually they may look like:
typedef enum CheckBoxSelectors_e {
    checkBoxSelector_FL_1,
    . . .
    checkBoxSelector_FL_N,
    checkBoxSelector_COUNT_
} CheckBoxSelectors_t;

typedef enum RadioButtonSelectors_e {
    radioButtonSelector_FINAL_LIST,
    radioButtonSelector_COUNT_
} RadioButtonSelectors_t;

typedef enum Fields_e {
    field_FIELD_1_1,
    . . .
    field_FIELD_M_N,
    field_COUNT_
} Fields_t;

check_box_select_set(dev, checkBoxSelector_FL_1, field_FIELD_1_1, True); /* Select entire field */
check_box_select_set_mask(dev, checkBoxSelector_FL_1, field_FIELD_1_1, mask); /* Select some bits */
radio_button_select(dev, radioButtonSelector_FINAL_LIST, checkBoxSelector_FL_2); /* If FL_1 was selected, now it is FL_2 and so on */
Hi Vladimir,
Thanks for sending this again. Is this something we could fruitfully discuss on Monday at the P4 LDWG meeting? If so, could you lead it?
-N
On Wed, Jan 17, 2018 at 4:56 PM, Vladimir Gurevich <notifications@github.com> wrote:
@jnfoster ,
I'd love to, but, alas, I am teaching Barefoot Academy full day on Monday and have to skip the meeting altogether :(
Since the proposal highlights the needs, but the proposed solutions are relatively vague, perhaps we can collect some ideas and discuss it on Feb 5th? I really hope @mbudiu-vmw, @ChrisDodd and others will weigh in before that.
This was briefly mentioned near the end of the 2017-Dec-06 P4 Arch WG meeting, in the context of hash functions.
The motivation is that even though it is possible to change the collection of packet fields used for a table search key, hash function, etc. by modifying the P4 source code and recompiling, there is a desire by some users of P4 devices to be able to change some of these things at configuration time, without having to change P4 source code or perform any recompilation.
One possibility: Create a new extern or family of externs. For the sake of example I will propose placeholder names Select8BitFields, Select16BitFields, Select32BitFields.
Note: I am not sure if this proposal requires extending the P4_16 language. It might.
Proposed example of use of Select8BitFields:
The control plane API for this example instance hash1_field_selector would consist of 16 byte offsets within the list of 25 8-bit fields. That is, it would be 16 5-bit values, each specifying a byte position in the range [0, 24] of the 25 bytes. The control plane could change these 16 5-bit values whenever it wanted to, and the next time a data packet made the call to get_fields(), those byte positions would be selected from the 25 bytes.

Alternate method for configuring the set of bytes to select, if repeats of input bytes in the output are never needed, and the order of bytes in the output is unimportant: specify a 25-bit vector, where exactly 16 of the 25 bits are equal to 1, and the rest are 0. The bits containing a 1 correspond to bytes in the 25-byte input that are included in the output.
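The byte-offset configuration can be modeled in a few lines of Python. This is a behavioral sketch only: the class name, the set_offsets() method, and the all-zero default configuration are hypothetical, and position 0 is assumed to be the first byte of the input.

```python
class Select8BitFieldsModel:
    """Toy model of a byte selector configured by the control plane."""

    def __init__(self, num_in_bytes, num_out_bytes):
        self.num_in_bytes = num_in_bytes
        self.num_out_bytes = num_out_bytes
        self.offsets = [0] * num_out_bytes   # assumed default config

    def set_offsets(self, offsets):
        """Control plane: install one input byte position per output byte."""
        if len(offsets) != self.num_out_bytes:
            raise ValueError("wrong number of offsets")
        if any(not 0 <= o < self.num_in_bytes for o in offsets):
            raise ValueError("offset out of range")
        self.offsets = list(offsets)

    def get_fields(self, in_bytes):
        """Data plane: pick the configured byte positions."""
        if len(in_bytes) != self.num_in_bytes:
            raise ValueError("wrong input width")
        return bytes(in_bytes[o] for o in self.offsets)


if __name__ == "__main__":
    selector = Select8BitFieldsModel(num_in_bytes=25, num_out_bytes=16)
    packet_fields = bytes(range(100, 125))       # 25 hypothetical bytes
    selector.set_offsets(list(range(24, 8, -1)))  # 16 offsets, reversed
    print(selector.get_fields(packet_fields).hex())
```

The data-plane call takes no configuration argument; whatever the control plane last installed is applied to the next packet.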
To make the output more useful for ignoring bits within bytes that we do not want included in an exact match key, or a hash function, the control plane API could also specify a 16-byte mask that will be bit-wise ANDed with the result described above, and that is actually what get_fields() returns. Including this in the behavior of get_fields() avoids the need to have a separate table whose only purpose is to provide a wide mask value.
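The masking step can be sketched as a pure function in Python. The function name and the representation of the mask as a byte string of the same length as the output are assumptions for illustration.

```python
def get_fields_masked(in_bytes, offsets, mask):
    """Select bytes by offset, then bit-wise AND the result with mask.
    mask must have one byte per selected output byte."""
    selected = bytes(in_bytes[o] for o in offsets)
    if len(mask) != len(selected):
        raise ValueError("mask width must match output width")
    return bytes(s & m for s, m in zip(selected, mask))


if __name__ == "__main__":
    in_bytes = bytes([0xAB, 0xCD, 0xEF])
    # Keep the upper nibble of byte 2 and the lower nibble of byte 0.
    print(get_fields_masked(in_bytes, [2, 0], bytes([0xF0, 0x0F])).hex())
```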
The 8-bit version would be sufficient for most purposes, I would guess. The only reason to have a 16-bit version and 32-bit version is that they could be cheaper to implement when you only need to be able to select 'pieces' that are multiples of 16 bits or 32 bits wide.
One could imagine proposing 4-bit, 2-bit, and 1-bit versions, but they are more expensive to implement than the 8-bit version, and I suspect not often needed.