p4lang / p4c

P4_16 reference compiler
https://p4.org/
Apache License 2.0
665 stars 439 forks source link

Casting bit<16> to bit<32> in checksum causes incorrect json to be generated #1765

Closed telmich closed 5 years ago

telmich commented 5 years ago

When using the v1model, the following code causes incorrect json for bmv2 to be generated:

control MyComputeChecksum(inout headers  hdr, inout metadata meta) {
    apply {
        bit<32> icmp6_len = (bit<32>) hdr.ipv6.payload_length;

        update_checksum_with_payload(meta.calc_icmp6_checksum,
            {
                hdr.ipv6.src_addr,         /* 128 */
                hdr.ipv6.dst_addr,         /* 128 */
                icmp6_len,                 /* 32 */
                24w0,                      /* 24 0's */
                PROTO_ICMP6                /* 8 */
            },
            hdr.icmp6.checksum,
            HashAlgorithm.csum16
        );
    }
}

(full code available on https://gitlab.ethz.ch/nicosc/master-thesis/tree/master/p4src)

When running p4@ubuntu:~/master-thesis/p4app$ sudo p4run --config static-mapping.json

The output is

s1 Starting P4 switch s1.
simple_switch -i 1@s1-eth1 -i 2@s1-eth2 -i 3@s1-eth3 -i 4@s1-eth4 -i 5@s1-cpu-eth0 --pcap=/home/p4/master-thesis/p4app/pcap --thrift-port 9090 --nanolog ipc:///tmp/bm-1-log.ipc --device-id 1 ../p4src/static-mapping.json --log-console >/home/p4/master-thesis/p4app/log/s1.log
P4 switch s1 did not start correctly. Check the switch log file.
p4@ubuntu:~/master-thesis/p4app$ 

And the content of s1.log is:

Warning: you requested the nanomsg event logger, but bmv2 was compiled without -DBMELOG, and the event logger cannot be activated
Calling target program-options parser
Unsupported input type 'expression' for calculation 'calc'
bad json:
{  
   "type" : "expression",
   "value" : {
      "left" : {
         "type" : "field",
         "value" : [ "ipv6", "payload_length" ]
      },
      "op" : "&",
      "right" : {
         "bitwidth" : 32,
         "type" : "hexstr",
         "value" : "0xffffffff"
      }
   }
}
telmich commented 5 years ago

When shifting the cast into the parser (https://gitlab.ethz.ch/nicosc/master-thesis/commit/0ca0b94b1cd5d525d4ef06de6913ba1d2f455bd0 and typo fix https://gitlab.ethz.ch/nicosc/master-thesis/commit/1b95179fc52f0d4abc88498479f14a0bad8b2b67) then the failing json looks like this:

Invalid entry type 'expression' in field list
bad json:
{
   "type" : "expression",
   "value" : {
      "type" : "expression",
      "value" : {
         "left" : null,
         "op" : "d2b",
         "right" : {
            "type" : "field",
            "value" : [ "scalars", "metadata.calc_icmp6_checksum" ]
         }
      }
   }
}

p4@ubuntu:~/master-thesis/p4app$ 
telmich commented 5 years ago

Ok, I found the source of the problem: using bit<1> instead of bool makes the code compile again, as seen on https://gitlab.ethz.ch/nicosc/master-thesis/commit/e9b08d638cceb38c61298ae4d7dd895a36edd7f2

telmich commented 5 years ago

... however passing bit<1> into update_checksum_with_payload() results into

p4c --target bmv2 --arch v1model --std p4-16 "../p4src/static-mapping.p4" -o "/home/p4/master-thesis/p4src"
../p4src/checksums.p4(24): error: update_checksum_with_payload: Cannot unify bit<1> to bool
        update_checksum_with_payload(meta.do_cksum,
        ^
Compilation Error
p4@ubuntu:~/master-thesis/p4app$ 
hesingh commented 5 years ago

Of course, you will see a Cannot unify error because the v1model.p4 has the following API:

extern void update_checksum_with_payload<T, O>(in bool condition, in T data, inout O checksum, HashAlgorithm algo);

The API uses a boolean for first parameter and you are passing in bit<1>. You have have to debug the failures when boolean is used.

telmich commented 5 years ago

It seems like the following workaround can help me for the time:

control MyComputeChecksum(inout headers  hdr, inout metadata meta) {
    apply {
        update_checksum_with_payload(meta.do_cksum == 1,
            {
...
telmich commented 5 years ago

So basically, this bug report includes 2 problems:

Should I open a second bug report?

telmich commented 5 years ago

So I found 2 workarounds:

mihaibudiu commented 5 years ago

There is a comment in v1model.p4 which says:

/* The only legal statements in the implementation of the
ComputeChecksum control are: block statements, calls to the
update_checksum and update_checksum_with_payload methods,
and return statements. */

But still, the compiler should give you a nice error message, and not generate incorrect json, so this is a bug.

mihaibudiu commented 5 years ago

It would be best if you could just drag-and-drop the P4 program into this issue; some of the files you are pointing to you seem to have already edited, and some other links are just to specific commits. If you give us a complete file that triggers the bug it makes it easier to reproduce it.

telmich commented 5 years ago

Hey @mbudiu-vmw!

I tried to shrink it down, but was not completely able to reproduce it then. So what I did is, I create 2 new source directories that allow you to 1:1 reproduce the bugs:

I also added a README with the output on my system into p4debug/*/README

Note: I was not able to reproduce it with p4src/bug1-bool.p4, which was my try to shrink down the logic for you to make it easier for debugging.

jafingerhut commented 5 years ago

If those JSON files are produced from particular P4 source files that you have available, that is much more useful for investigating problems in p4c than a JSON file created from some now-lost P4 source file.

Even a larger P4 source file that reproduces the problem is better than a smaller one that does not.

jafingerhut commented 5 years ago

Just for your information when looking for additional possible workarounds, the Internet checksum result is unaffected by 16-bit words that are completely equal to 0, so casting from a 16-bit unsigned value to a 32-bit unsigned value, while it likely matches the format of the pseudo-header shown in RFC documents, is unnecessary in this case, as is any 16-bit aligned 16-bit word that is a constant 0.

It is still of course useful to correct problems found in p4c, so thanks for the report.

hesingh commented 5 years ago

What is p4run?

I know the bmv2 backend supports Cast and the use of boolean should be fine in P4 code for bmv2. Thus, the issue could just be using p4run when shortened programs do not show the issue. I have testes Cast to work with the bmv2 backend. I prefer to use p4c-bm2-ss to test the bmv2 backend and also generate json.

mihaibudiu commented 5 years ago

For me the json generated from bug1-bool.p4 works. The json is not useful, what we need is a P4 program that generates the wrong json.

telmich commented 5 years ago

@hesingh p4run comes from https://github.com/nsg-ethz/p4-utils - we have used this here at ETHZ in a course. It is a wrapper around p4c and mininet, as far as I can see. It was basically introduced in a course by the https://nsg.ee.ethz.ch/home/ group (and that is the course, btw: https://adv-net.ethz.ch/).

For my understanding: When you talk about "the generated json", you are not referring to the two json files I have in my p4app/ directory, but something that is generated by ... ? By what exactly?

telmich commented 5 years ago

@jafingerhut The two json files I mentioned above are written by me / adjusted from examples from the course lecture. They are not generated.

telmich commented 5 years ago

@mbudiu-vmw Sorry for maybe the stupid question, I am still rather new to P4: As far as I understood, there is a bunch of .p4 files which constitutes a switch (that is the stuff in my p4src/ directory), there is p4app/json (manually created) that describes the environment, mininet configuration, controller, etc.

Which JSON would usually have been generated by what? Sorry again, so far the only way how I have used p4 is using p4run from p4-utils.

hesingh commented 5 years ago

@hesingh p4run comes from https://github.com/nsg-ethz/p4-utils - we have used this here at ETHZ in a course. It is a wrapper around p4c and mininet, as far as I can see. It was basically introduced in a course by the https://nsg.ee.ethz.ch/home/ group (and that is the course, btw: https://adv-net.ethz.ch/).

For my understanding: When you talk about "the generated json", you are not referring to the two json files I have in my p4app/ directory, but something that is generated by ... ? By what exactly?

p4c-bm2-ss (which is a driver/executable) built when one compiles p4c. It's p4c-bm2-ss that can generate a json file. Please see the help for this command.

I see about p4run. However, maybe, none of our P4 compiler folks have used this one.

mihaibudiu commented 5 years ago

@hesingh you are commenting on the wrong issues.

mihaibudiu commented 5 years ago

@telmich : just provide us with a P4 program that causes the tools to crash.

hesingh commented 5 years ago

Wow, sorry, being busy, I put the comment on the wrong one.

telmich commented 5 years ago

Attached are the 2 generated json files

bug1-bool-static-mapping.json.txt bug2-cast-in-checksum-static-mapping.json.txt

telmich commented 5 years ago

@mbudiu-vmw The full source code of both programs can be found in the following 2 directories:

The main file each time is static-mapping.p4

p.s.: I'm also on slack, in case you need more / want me to debug something particular

jafingerhut commented 5 years ago

Notes on file bug1-bool-static-mapping.json.txt

I have used the most recent version as of 2019-Mar-06 of p4c to compile your bug1-bool source code, and it generates a BMv2 JSON file that does not cause the same error when running simple_switch with it, that I do see when I run simple_switch with your bug1-bool-static-mapping.json.txt attached file.

In comparing the BMv2 JSON files manually, I do see improvements from the BMv2 JSON file produced with the latest p4c version that avoid at least the problem causing simple_switch to exit quickly.

It appears that you are using a version of p4c built from source code from 2018-Sep-24 or earlier, because there is some additional source_info info added by p4c versions after that date that is not present in your BMv2 JSON file. I am not claiming that every change to p4c is an improvement, and I do not know what change may have improved the results for this particular program, but you may want to consider trying a more recent version of p4c, unless you have found workarounds with the older version of p4c you are using that are acceptable to you.

$ simple_switch --log-console bug1-bool-static-mapping.json.txt
[ ... some lines of output deleted here ... ]

Invalid entry type 'expression' in field list
bad json:
{
   "type" : "expression",
   "value" : {
      "type" : "expression",
      "value" : {
         "left" : null,
         "op" : "d2b",
         "right" : {
            "type" : "field",
            "value" : [ "scalars", "metadata.do_cksum" ]
         }
      }
   }
}

$ git clone https://github.com/p4lang/p4c
$ cd p4c
$ git log -n 1 | head -n 3
commit e5f68627d710514b3a0c3263c60e3447345ceda4
Author: Mihai Budiu <mbudiu@vmware.com>
Date:   Wed Mar 6 13:45:42 2019 -0800

[ ... not including commands that I used to build p4c from source ... ]

$ cd $HOME
$ git clone https://gitlab.ethz.ch/nicosc/master-thesis
$ cd master-thesis
$ git log -n 1 | head -n 3
commit 81bf368279201a9ac7b32537f6e150144f84646b
Author: Nico Schottelius <nico@nico-notebook.schottelius.org>
Date:   Thu Mar 7 00:33:41 2019 +0100

$ cd p4debug/bug1-bool
$ p4c --target bmv2 --arch v1model static-mapping.p4
$ simple_switch --log-console static-mapping.json

[ ... No error message here, and simple_switch continues to run.  I do
not know if there are _no_ problems with the BMv2 JSON file, but there
are at least no problems that cause simple_switch to exit
quickly. ... ]
hesingh commented 5 years ago

Most diligent work, Andy, you the man!

jafingerhut commented 5 years ago

Notes on file bug2-cast-in-checksum-static-mapping.json.txt:

Using the same version of p4c mentioned in my previous comment, and the same version of the source code in the master-thesis repository, there is still a bad BMv2 JSON file produced for the source program in master-thesis/p4debug/bug2-cast-in-checksum/static-mapping.p4 and the other files it #include's in that same directory.

It appears that simple_switch, in a BMv2 JSON file "calculations" section, each field of the calculation input is not allowed to have type "expression", and p4c is generation such a type for a calculation field.

One possibility would be enhancing simple_switch to allow expressions in that part of a BMv2 JSON file, and then it seems that p4c would not need any changes.

Another is for p4c to automatically determine that it must make assignments at the end of the egress control to newly generated temporary variables, and then use those temporary fields in the "calculations" section.

I should be able to make a much smaller P4_16 source program exhibiting the bug, in a single source file, and attach it soon.

jafingerhut commented 5 years ago

Another possibility (in addition to the two mentioned in my previous comment) would be for p4c-bm2-ss to reject such a program as "too complex", and suggest the developer change the code so that checksum calculations are done only with lists of fields they have defined, not expressions.

Attached is a small P4_16 program in a single file, and the BMv2 JSON file produced by the latest version of p4c as of 2019-Mar-06, that causes simple_switch to give the same error message.

fancy-checksum.p4.txt fancy-checksum.json.txt

For P4 developers hitting this issue before the problem is addressed, there appears to be a straightforward workaround: in lists of fields given to check or update checksums, only give constants or field names without any operators, not even a cast operation.

hesingh commented 5 years ago

This is an example command for how to generate json using p4c-bm2-ss:

./p4c-bm2-ss -o bmv2/testdata/p4_16_samples/issue561-bmv2.json /home/hemant/int/p4c/testdata/p4_16_samples/issue561-bmv2.p4
hesingh commented 5 years ago

Even I was confused by so many changes in gitlab and could not see how to reproduce the issue with many files. However, now that the issue has been fixed, I tested what I had come up with first when I saw the issue. With Mihai's fix available in my p4c, these are two alternatives to use for the bit<32> length that compile

  1. {16w0, hdr.ipv6.payload_length} EDIT: Choice 2 below is not valid since it's a computation inside checksum. On creating a new test P4 program, p4c generates an error.
  2. {16w0 ++ hdr.ipv6.payload_length}

The first case is shown with the full line of code below.

update_checksum_with_payload(meta.do_cksum == 1, { hdr.ipv6.src_addr, hdr.ipv6.dst_addr, {16w0,  hdr.ipv6.payload_length}, 24w0, PROTO_ICMP6 }, hdr.icmp6.checksum, HashAlgorithm.csum16);