ucb-bar / riscv-torture

RISC-V Torture Test

Bad coverage of compressed ISA #7

Open cliffordwolf opened 8 years ago

cliffordwolf commented 8 years ago

I'm using riscv-torture to test my RV32IC implementation. For this I create RV32I test cases and build them with -march=RV32IC. See https://github.com/cliffordwolf/picorv32/tree/master/scripts/torture for my test setup (config settings in riscv-torture-rv32.diff).

For the most part the generated code does not map to compressed instructions. The only exception is an occasional slli instruction. (The .word at the end of a pseg often maps to some random compressed insns, but they are never executed, so that does not count.) Most psegs look more or less like this:

000001cc <pseg_70>:
     1cc:       00100393                li      t2,1
     1d0:       4067bc13                sltiu   s8,a5,1030
     1d4:       03fe                    slli    t2,t2,0x1f
     1d6:       6c300613                li      a2,1731
     1da:       7c360593                addi    a1,a2,1987
     1de:       00b66463                bltu    a2,a1,1e6 <pseg_70+0x1a>
     1e2:       5f60206f                j       27d8 <crash_forward>
     1e6:       00100d13                li      s10,1
     1ea:       fff00493                li      s1,-1
     1ee:       00420fb3                add     t6,tp,tp
     1f2:       3e8c99e3                bne     s9,s0,de4 <pseg_71>
     1f6:       f7d47537                lui     a0,0xf7d47

For testing purposes I've removed the .word sections at the end of the psegs so I am left with something that only contains "real" instructions. The example I am looking at right now contains 2485 instructions. Only 82 of those instructions (about 3%) are compressed, and almost all of them are slli instructions:

  2 add
  3 ebreak
  4 j
  1 or
 69 slli
  3 xor

The ebreak insns are part of my RVTEST_FAIL / RVTEST_PASS macros. Two of the four j insns are part of the "static" frame that is included in every test case. So not counting the slli insns, this whole test case effectively contains 8 compressed instructions (about 0.3%).
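For readers who want to reproduce a tally like the one above, here is a minimal sketch of one way to count compressed instructions in a disassembly listing. It assumes objdump-style lines in the same shape as the pseg excerpt (address, raw encoding, mnemonic) and treats a 4-hex-digit encoding as a 16-bit instruction. The function name `count_compressed` and the `SAMPLE` text are illustrative and not part of riscv-torture or the picorv32 scripts.

```python
import re
from collections import Counter

# Matches lines such as "     1d4:       03fe                    slli    t2,t2,0x1f".
# Group 1 is the raw encoding (4 hex digits = 16-bit, 8 hex digits = 32-bit),
# group 2 is the mnemonic printed by the disassembler.
INSN_RE = re.compile(r"^\s*[0-9a-f]+:\s+([0-9a-f]{4}|[0-9a-f]{8})\s+(\S+)")

def count_compressed(disassembly):
    """Return (total instruction count, Counter of mnemonics with 16-bit encodings)."""
    total = 0
    compressed = Counter()
    for line in disassembly.splitlines():
        m = INSN_RE.match(line)
        if not m:
            continue  # label lines, blank lines, anything that is not an instruction
        total += 1
        if len(m.group(1)) == 4:
            compressed[m.group(2)] += 1
    return total, compressed

# Hypothetical input: the first three instructions of the pseg shown earlier.
SAMPLE = """\
000001cc <pseg_70>:
     1cc:       00100393                li      t2,1
     1d0:       4067bc13                sltiu   s8,a5,1030
     1d4:       03fe                    slli    t2,t2,0x1f
"""

if __name__ == "__main__":
    total, compressed = count_compressed(SAMPLE)
    print(total, dict(compressed))
```

Note that this counts every disassembled line, so trailing .word data would have to be removed first, as described above.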

Would it be possible to add a "compressed" option to riscv-torture that makes riscv-torture select register / immediate combinations that map to compressed insns with a much higher probability? Maybe randomly switching between the "compressed" probabilities and the current behavior on a per-pseg basis?

aswaterman commented 8 years ago

The coverage should be improved. Looking at the code, the main reason RVC instructions aren't being selected is that the register specifiers and immediates aren't in range.

Since only 1/4 of the registers are available to many RVC instructions, instructions that take two register specifiers already only have a 1/16 chance of being selected. For others, RVC instructions are only available if the source and destination match, a 1/32 chance.
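The two fractions can be checked by brute-force enumeration. This is only a sketch of the arithmetic: it assumes each register specifier is drawn uniformly and independently from x0..x31, takes x8..x15 as the registers reachable by 3-bit RVC specifiers, and ignores further encoding restrictions such as x0 being disallowed in some forms.

```python
from fractions import Fraction
from itertools import product

ALL_REGS = range(32)      # x0..x31
RVC_REGS = range(8, 16)   # x8..x15, reachable by 3-bit RVC register specifiers

# Every equally likely pair of independently chosen register specifiers.
pairs = list(product(ALL_REGS, repeat=2))

# Case 1: both specifiers must land in x8..x15.
both_in_range = Fraction(
    sum(a in RVC_REGS and b in RVC_REGS for a, b in pairs), len(pairs)
)

# Case 2: destination must equal the source (any register).
same_reg = Fraction(sum(a == b for a, b in pairs), len(pairs))

print(both_in_range, same_reg)  # prints: 1/16 1/32
```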

Likewise, uniform-random immediates are almost always out of range.

The solution is to bias the register and immediate selector. I actually don't think this should be a new option to torture, just a baked-in feature.
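As an illustration of what such biasing could look like, here is a minimal sketch in Python. It is not riscv-torture's Scala code and does not reflect the API of Rand.scala; the function names, the bias parameters and their 0.5 defaults are all hypothetical. It assumes x8..x15 as the preferred registers and the signed 6-bit range [-32, 31] as the preferred immediates.

```python
import random

ALL_REGS = list(range(32))      # x0..x31
RVC_REGS = list(range(8, 16))   # x8..x15

def pick_reg(rng, rvc_bias=0.5):
    """Pick a register number, drawing from x8..x15 with probability rvc_bias."""
    pool = RVC_REGS if rng.random() < rvc_bias else ALL_REGS
    return rng.choice(pool)

def pick_imm12(rng, small_bias=0.5):
    """Pick a signed 12-bit immediate, drawing from [-32, 31] with probability small_bias."""
    if rng.random() < small_bias:
        return rng.randint(-32, 31)
    return rng.randint(-2048, 2047)

def pick_alu_operands(rng, reuse_bias=0.5):
    """Pick (rd, rs1, rs2); with probability reuse_bias force rd == rs1."""
    rs1, rs2 = pick_reg(rng), pick_reg(rng)
    rd = rs1 if rng.random() < reuse_bias else pick_reg(rng)
    return rd, rs1, rs2

if __name__ == "__main__":
    rng = random.Random(0)
    print(pick_alu_operands(rng), pick_imm12(rng))
```

Keeping the unbiased pools as the fallback means the full register and immediate ranges are still exercised, only less often.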

ccelio commented 8 years ago

I agree with Andrew that it'd be nice if we can accomplish this by improving the randomizer (it already understands biasing).

But we'll have to see if it can get us all the way there. For example, I'm really surprised you're not seeing compressed branches. I'm pretty sure we have no control over their immediate offsets though. :(

We're also going to want to make sure the stats torture spits out include the compressed data, which I'm pretty sure they don't. There are a lot of unique cases in RVC and we'll want to make sure we hit them at least sometimes.

ccelio commented 8 years ago

Clifford,

I suspect that we probably won't be able to find the cycles to address this issue in the near future. However, you may be able to find some low-hanging fruit by exploring generator/src/main/scala/Rand.scala, and you can try adding more hand-crafted sequences to places like SeqALU.scala (e.g., adding more sequences that share common register operands).

That should at least be able to greatly improve the compressed ALU-op coverage. Let us know how it goes.

cliffordwolf commented 8 years ago

Jfyi: In addition to riscv-torture I'm now using csmith to generate test cases that I compile using riscv gcc. It's still under construction, but if anyone is interested, here are the scripts I'm using for the csmith-based tests:

https://github.com/cliffordwolf/picorv32/tree/master/scripts/csmith

ccelio commented 8 years ago

That looks very interesting; keep us in the loop on how it pans out.

cliffordwolf commented 8 years ago

jfyi: https://github.com/csmith-project/csmith/issues/34

cliffordwolf commented 8 years ago

I now have a working setup with csmith. Jfyi, here is what I'm doing:

First I use csmith to create a test case (my CPU is RV32, so I create a platform.info file before running csmith that reflects that fact):

echo "integer size = 4" > platform.info
echo "pointer size = 4" >> platform.info
csmith --no-packed-struct -o test.c

Then I compile the test case with my host gcc (with -m32) and run it. Usually this prints something like checksum = CFC47D24 almost immediately. But sometimes it just hangs. I assume this might be a gcc bug exposed by csmith. Anyway, if the binary does not finish within 2 seconds of CPU time then I simply discard the test case and restart the process with a new one.
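A minimal sketch of enforcing such a CPU-time limit from a driver script, assuming a POSIX host (the `resource` module is Unix-only). The helper name `run_with_cpu_limit` is hypothetical and this is not taken from the picorv32 scripts, which may do it differently.

```python
import resource
import subprocess

def run_with_cpu_limit(cmd, cpu_seconds=2):
    """Run cmd; return its stdout, or None if it failed or exceeded the CPU limit."""
    def limit_cpu():
        # Runs in the child just before exec. The kernel signals the process
        # once it has consumed cpu_seconds of CPU time.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    result = subprocess.run(cmd, capture_output=True, text=True, preexec_fn=limit_cpu)
    if result.returncode != 0:
        return None  # killed by the limit, or exited with an error
    return result.stdout
```

A caller would treat a `None` result as "discard this test case and generate a new one".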

Next I build a RISCV32 ELF file (using newlib, my own syscalls.c and a simple "boot loader" at address 0 that does things like setting the stack pointer and then jumps to the newlib entry point at 0x10000). I then run this binary in a patched version of spike. If it takes more than 1000000 instructions to execute the test case then I discard it and restart with another test case. If running it in spike yields a different checksum than the version built with my host gcc, then I also discard it.

Finally I run the same ELF file in a verilator model of my PicoRV32 core and compare the generated checksums. I terminate with an error if the checksum is different from what I got with spike and with the binary built with the host gcc.
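The discard / pass / fail decision described in the steps above can be summarised in a small sketch. How each tool is invoked is deliberately left out; the functions only look at captured output. The names `parse_checksum` and `classify` are hypothetical, and treating a missing checksum from the core as a failure is an assumption rather than something stated in the procedure.

```python
import re

# The test programs print a line such as "checksum = CFC47D24".
CHECKSUM_RE = re.compile(r"checksum = ([0-9A-Fa-f]+)")

def parse_checksum(output):
    """Extract the hex checksum from captured output, or None if there is none."""
    if output is None:
        return None
    m = CHECKSUM_RE.search(output)
    return m.group(1).upper() if m else None

def classify(host_out, spike_out, core_out):
    """Return 'discard', 'pass' or 'fail' for one generated test case.

    Each argument is the captured output of one run, or None if that run
    hit its CPU-time or instruction-count limit.
    """
    host = parse_checksum(host_out)
    spike = parse_checksum(spike_out)
    # No trusted reference result: throw the test case away.
    if host is None or spike is None or host != spike:
        return "discard"
    # Host and spike agree, so the core under test must match them.
    core = parse_checksum(core_out)
    return "pass" if core == host else "fail"
```

For example, `classify("checksum = CFC47D24\n", "checksum = CFC47D24\n", "checksum = 00000000\n")` returns `"fail"`, while a host run that timed out (`None`) yields `"discard"` regardless of the other two.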

On average it takes less than two seconds to generate and process one test case on my machine with this procedure.

Here is a typical test case as generated by csmith: http://scratch.clifford.at/test_2146419290.c

I discard maybe 20% of all test cases generated by csmith, almost all of them because the binary built by the host gcc already hangs. Occasionally I discard one because it takes more than 1000000 instructions to complete in spike. Note that with this methodology a test case that exposes a bug in e.g. riscv-gcc would simply be discarded. But it should be fairly easy to build something similar that uses csmith and spike to look for bugs in riscv-gcc.

I have now run a few thousand iterations of this procedure (see make loop in my scripts/csmith/). So far I have not found a bug in my processor. (I have tested the error path by introducing a bug into my processor, just to be sure it's actually working.)

A while back you wrote:

[..] we probably won't be able to find the cycles to address this issue in the near future. [..]

For me this is now a solved issue because with csmith I get good coverage of the insn patterns that gcc will actually produce. In addition to what I do with riscv-torture this gives me high confidence that my core is working as expected.

So please feel free to close this issue if you feel like it.

ccelio commented 8 years ago

Thanks for the writeup on csmith. The more disparate tools we have to test our processors, the better.

I'm leaving this issue open because it's still an issue that should be addressed, even if it languishes for a good while.