margnus1 opened this issue 6 years ago
First of all, adding many benchmarks without first looking into them and discussing their suitability is a bit pointless. For example, `SV-COMP/gcd.c` contains `__VERIFIER_nondet_uint`. This is not good. For starters, such a benchmark is not self-contained: it relies on tools implementing this function somewhere, and not all tools do. Also, arguably, it is not OK for stateless model checkers, which fundamentally cannot handle data non-determinism in a proper way. IMO, these "benchmarks" should be taken out.
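To make the problem concrete, here is a minimal sketch of what such a benchmark looks like (illustrative only, not the actual contents of `SV-COMP/gcd.c`):

```c
#include <assert.h>

/* SV-COMP convention: benchmarks only *declare* this function; every
 * tool has to supply its own implementation (symbolic, random, ...). */
unsigned int __VERIFIER_nondet_uint(void);

/* One concrete stub a tool might link in -- and exactly where the
 * trouble starts: it pins the "non-deterministic" value to 0. */
unsigned int __VERIFIER_nondet_uint(void) { return 0u; }

int main(void) {
    unsigned int x = __VERIFIER_nondet_uint();
    /* This violation is reachable only when x == 12345.  With the
     * stub above, a stateless model checker explores interleavings
     * for x == 0 only, so it terminates quickly without finding the
     * bug -- and without having verified its absence either. */
    if (x == 12345u)
        assert(0);
    return 0;
}
```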
Also, I think it's important that the script only executes benchmarks that work under all tools (and `swsc` in particular). With the current set of benchmarks, `swsc` --my version of it at least-- crashes on e.g. `SV-COMP/gcd.c` (the first benchmark, due to the above problem??) and seems to go into an infinite loop on the second one (`SV-comp/fibonacci.c`). I gave up after that.
In my opinion, all benchmarks on which `swsc` currently crashes/loops/etc. should be moved to (cooking) tests instead, so that we can see the current picture without too much noise.
Hi.
The benchmarks comprise programs from different sources. They were put together so that we (internally) can get some information about the performance of `swsc` relative to other tools.
I will review the `gcd` benchmark.
`swsc` crashes on some benchmarks because we are still working on handling mutexes. It can also report a wrong number of traces when a benchmark uses mutexes. Magnus has a branch for mutexes; I believe that after he releases a new version, `swsc` will no longer crash.
Currently, some benchmarks are slow with `swsc` (e.g. Fibonacci). If you set the timeout to 2 minutes, the slow benchmarks will be stopped. I can see that Nidhugg and Nidhugg with observers handle Fibonacci in reasonable time (less than 20 seconds on my machine).
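For reference, the SV-COMP Fibonacci benchmarks have roughly the following shape (an approximation on my part, not the exact contents of `SV-comp/fibonacci.c`). The number of interleavings of the shared-variable updates grows exponentially with the loop bound, which is why stateless model checkers can be slow here:

```c
#include <assert.h>
#include <pthread.h>

/* Two threads "compute" Fibonacci numbers by repeatedly adding the
 * other thread's shared variable to their own (all accesses racy). */
static int i = 1, j = 1;
#define NUM 5

static void *t1(void *arg) {
    (void)arg;
    for (int k = 0; k < NUM; k++)
        i += j;                 /* racy read of j, racy write of i */
    return NULL;
}

static void *t2(void *arg) {
    (void)arg;
    for (int k = 0; k < NUM; k++)
        j += i;                 /* racy read of i, racy write of j */
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, t1, NULL);
    pthread_create(&b, NULL, t2, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* 144 is the 12th Fibonacci number: the largest value reachable
     * when the 2*NUM updates alternate perfectly.  A checker has to
     * enumerate every interleaving of those updates to establish it. */
    assert(i <= 144 && j <= 144);
    return 0;
}
```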
/Phong
I checked the `gcd` benchmark. Although not all of the tools support `__VERIFIER_nondet_uint`, the benchmark is well supported by Nidhugg and Nidhugg with observers: it can be verified by them in less than 5 seconds on my machine. In case we would like to compare with more tools (beyond Nidhugg and Nidhugg with observers), we can comment this benchmark out in the `runall.sh` script.
> I checked the `gcd` benchmark. Although not all of the tools support `__VERIFIER_nondet_uint`, the benchmark is well supported by Nidhugg and Nidhugg with observers: it can be verified by them in less than 5 seconds on my machine.
No, they cannot. As I wrote above, to verify this benchmark one needs a tool that properly handles data non-determinism using some appropriate technique (e.g., a symbolic one). Nidhugg currently does not. All you observe is that Nidhugg finishes in less than 5 seconds on this benchmark; not much can be inferred from that.
In any case, I do not want to spend more time arguing here. Can I please ask that we get a reduced set of benchmarks (those that the tools can properly handle) and scripts that actually work? The current ones do not. When I run `runall.sh`, I get a crash:

```
Traceback (most recent call last):
  File "./gen_table.py", line 96, in <module>
    main()
  File "./gen_table.py", line 78, in main
    get_result(tst, LOGFILE, "nidhugg")
  File "/home/kostis/swsc/benchmarks/mylibrary.py", line 40, in get_result
    with open(file,'r') as content_file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/kostis/swsc/benchmarks/log-files/runall.log'
cat: ./result-table.txt: No such file or directory
```
Does the script work for you?
Good morning!
I forgot to add the log-file folder; that is why you could not run the script. I have added the folder now.
I have commented out `gcd` and some other benchmarks. I can fix the `gcd` benchmark to remove the data non-determinism, but I will do that later.
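The fix could look roughly like the sketch below (illustrative only; the function and values are placeholders, not the actual contents of `SV-COMP/gcd.c`). Replacing the nondet calls with fixed concrete inputs makes the benchmark self-contained, so a stateless model checker only has to explore thread interleavings, not data values:

```c
#include <assert.h>

/* Instead of
 *   unsigned int a = __VERIFIER_nondet_uint();
 *   unsigned int b = __VERIFIER_nondet_uint();
 * the inputs are fixed, which removes the data non-determinism. */
static unsigned int gcd(unsigned int a, unsigned int b) {
    while (b != 0) {
        unsigned int t = b;
        b = a % b;
        a = t;
    }
    return a;
}

int main(void) {
    unsigned int a = 48u, b = 18u;  /* fixed concrete inputs */
    assert(gcd(a, b) == 6u);
    return 0;
}
```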
Please run:

```
cd SWSC_FOLDER_PATH/benchmarks
```

where `SWSC_FOLDER_PATH` is the directory of swsc. Then run `./runall.sh 1` to check that the script works (1-second timeout for each benchmark). Run `./runall.sh 600` or plain `./runall.sh` to run all benchmarks with Nidhugg, Nidhugg with observers, and swsc using a 10-minute timeout for each benchmark. This can take a long time, so please do not sit waiting at the computer. The result table is written to `benchmarks/result-table.txt`, and a log file can be found in the `benchmarks/log-files` folder.

Thanks for this change. Now we are getting somewhere.
I've pushed a prettification change to the `gen_table.py` script.
I have added a folder `benchmarks` that contains several benchmarks from different tools. To run them, run the script `runall.sh` from inside the benchmarks folder. The script will print out a table at the end. The default timeout is 10 minutes. To run with a different timeout, run `./runall.sh num`, where `num` is a number of seconds, e.g. 600 for 10 minutes. For more information, see 542a721d2a08a53fd4f3c071639b6c4066713597.