mit-plv / riscv-semantics

A formal semantics of the RISC-V ISA in Haskell
BSD 3-Clause "New" or "Revised" License
155 stars 16 forks source link

Add support for running RISC-V litmus tests through the HMC model-checking backend #33

Closed pratapsingh1729 closed 2 years ago

pratapsingh1729 commented 2 years ago

This PR adds support for running RISC-V litmus tests through the HMC model-checking backend. Currently, only tests with two threads and postconditions of the form "exists (X /\ Y /\ ... /\ Z)" are supported, and it is known to work with all of the BASIC_2_THREAD tests. To run a test, execute the following:

stack run riscv-mm </path/to/litmus/file>

The BASIC_2_THREAD tests have been added to the directory test/litmus/. This will call my simple compiler (implemented in LitmusToRiscv.hs) to translate the litmus specification into a RISC-V assembly file and extract the condition to check, then use riscv-none-embed-gcc to compile the assembly file into an ELF file, then run the HMC interpreter over that ELF file. For every valid full execution of a file, it will check whether all of the conditions specified in the litmus file postcondition are met; if so, it will print the alloy file generated by that execution for manual inspection. I also added a script run_riscv_mm_litmus.sh to run all the tests and store their output in log files.

The following table summarizes performance and results:

Test file Wallclock time Conditions met? Expect conditions to be met?
2+2W.litmus 2:52.43 Yes Yes
2+2W+fence.rw.rws.litmus 3:03.12 No No
2+2W+fence.rw.rw+po.litmus 2:57.30 Yes Yes
LB.litmus 2:39.31 Yes Yes
LB+ctrls.litmus 2:20.20 No No
LB+ctrl+po.litmus 2:39.62 Yes Yes
LB+datas.litmus 2:22.97 No No
LB+data+ctrl.litmus 2:18.99 No No
LB+data+po.litmus 2:38.70 Yes Yes
LB+fence.rw.rws.litmus 2:40.91 No No
LB+fence.rw.rw+ctrl.litmus 2:12.61 No No
LB+fence.rw.rw+data.litmus 2:25.57 No No
LB+fence.rw.rw+po.litmus 2:44.65 Yes Yes
MP.litmus 0:22.19 Yes Yes
MP+fence.rw.rws.litmus 0:20.80 No No
MP+fence.rw.rw+addr.litmus 0:21.07 No No
MP+fence.rw.rw+ctrl.litmus 0:24.52 Yes Yes
MP+fence.rw.rw+po.litmus 0:24.56 Yes Yes
MP+po+addr.litmus 0:22.40 Yes Yes
MP+po+ctrl.litmus 0:22.42 Yes Yes
MP+po+fence.rw.rw.litmus 0:21.90 Yes Yes
R.litmus 0:50.24 Yes Yes
R+fence.rw.rws.litmus 0:51.41 No No
R+fence.rw.rw+po.litmus 0:48.89 Yes Yes
R+po+fence.rw.rw.litmus 0:51.20 Yes Yes
S.litmus 1:13.26 Yes Yes
SB.litmus 0:22.53 Yes Yes
SB+fence.rw.rws.litmus 0:24.89 No No
SB+fence.rw.rw+po.litmus 0:25.35 Yes Yes
S+fence.rw.rws.litmus 1:13.46 No No
S+fence.rw.rw+ctrl.litmus 0:45.92 No No
S+fence.rw.rw+data.litmus 0:46.18 No No
S+fence.rw.rw+po.litmus 1:11.26 Yes Yes
S+po+ctrl.litmus 0:45.26 Yes Yes
S+po+data.litmus 0:45.40 Yes Yes
S+po+fence.rw.rw.litmus 1:15.50 Yes Yes

Performance is currently limited by the use of SAT4J in the alloy solver, the time taken to start up the JVM for each separate alloy run (can be dozens of alloy runs per litmus test), and the fact that I increased the alloy search depth to 20. One can imagine ways to improve all of these.

The third column indicates whether HMC found an execution satisfying the litmus test postcondition; the fourth column indicates whether the rmem (operational) and herd (axiomatic) models find an execution satisfying the litmus test postcondition (generated by manual inspection of the test log files). I am investigating the two cases where we disagree with the rmem and herd models---I believe there may be an issue in the riscv.als alloy file we are using to specify the RISC-V memory model, but further debugging is required. (EDIT: see https://github.com/daniellustig/riscv-memory-model/issues/1.) EDIT: we were calling Alloy with the wrong kind of fence specification, which I have now fixed. We now match the expected model behavior on all test cases.

pratapsingh1729 commented 2 years ago

The two cases where our model-checker previously disagreed with herd and rmem have now been resolved---we had been calling Alloy with the wrong fence specification, but this is now fixed. I believe this PR should be ready to merge, at least for 2-thread test support. Let me know if you'd like me to change anything in the API or anywhere else!