margnus1 opened this issue 6 years ago
First of all, adding many benchmarks without first looking into them and discussing their suitability is a bit pointless. For example, `SV-COMP/gcd.c` contains `__VERIFIER_nondet_uint`. This is not good. For starters, such a benchmark is not self-contained: it relies on tools implementing this function somewhere, and not all tools do. Also, arguably, it is not OK for stateless model checkers, which fundamentally cannot handle data non-determinism in a proper way. IMO, these "benchmarks" should be taken out.
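To make the problem concrete, here is a minimal sketch of what such a benchmark looks like (illustrative only, not the actual contents of `SV-COMP/gcd.c`):

```c
#include <assert.h>

/* SV-COMP convention: benchmarks only *declare* this function; every
 * tool has to supply its own implementation (symbolic, random, ...). */
unsigned int __VERIFIER_nondet_uint(void);

/* One concrete stub a tool might link in -- and exactly where the
 * trouble starts: it pins the "non-deterministic" value to 0. */
unsigned int __VERIFIER_nondet_uint(void) { return 0u; }

int main(void) {
    unsigned int x = __VERIFIER_nondet_uint();
    /* This violation is reachable only when x == 12345.  With the
     * stub above, a stateless model checker explores interleavings
     * for x == 0 only, so it terminates quickly without finding the
     * bug -- and without having verified its absence either. */
    if (x == 12345u)
        assert(0);
    return 0;
}
```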
Also, I think it's important that the script only executes benchmarks that work under all tools (and `swsc` in particular). With the current set of benchmarks, `swsc` --my version of it at least-- crashes on e.g. `SV-COMP/gcd.c` (the first benchmark, due to the above problem??) and seems to go into an infinite loop on the second one (`SV-comp/fibonacci.c`). I gave up after that.
In my opinion, all benchmarks on which `swsc` currently crashes/loops/etc. should be moved to (cooking) tests instead, so that we can see the current picture without too much noise.
Hi.
The benchmarks comprise programs from different sources. They were put together so that we (internally) can get some information about the performance of `swsc` relative to other tools.
I will review the `gcd` benchmark.
`swsc` crashes on some benchmarks because we are still working on handling mutexes. It can also report a wrong number of traces when a benchmark uses mutexes. Magnus has a branch for mutexes; I believe that after he releases a new version, `swsc` will no longer crash.
Currently, some benchmarks are slow with `swsc` (e.g. Fibonacci). If you set the timeout to 2 minutes, the slow benchmarks will be stopped. I can see that Nidhugg and Nidhugg with observers handle Fibonacci in reasonable time (less than 20 seconds on my machine).
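For reference, the SV-COMP Fibonacci benchmarks have roughly the following shape (an approximation on my part, not the exact contents of `SV-comp/fibonacci.c`). The number of interleavings of the shared-variable updates grows exponentially with the loop bound, which is why stateless model checkers can be slow here:

```c
#include <assert.h>
#include <pthread.h>

/* Two threads "compute" Fibonacci numbers by repeatedly adding the
 * other thread's shared variable to their own (all accesses racy). */
static int i = 1, j = 1;
#define NUM 5

static void *t1(void *arg) {
    (void)arg;
    for (int k = 0; k < NUM; k++)
        i += j;                 /* racy read of j, racy write of i */
    return NULL;
}

static void *t2(void *arg) {
    (void)arg;
    for (int k = 0; k < NUM; k++)
        j += i;                 /* racy read of i, racy write of j */
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, t1, NULL);
    pthread_create(&b, NULL, t2, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* 144 is the 12th Fibonacci number: the largest value reachable
     * when the 2*NUM updates alternate perfectly.  A checker has to
     * enumerate every interleaving of those updates to establish it. */
    assert(i <= 144 && j <= 144);
    return 0;
}
```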
/Phong
I checked the `gcd` benchmark. Although not all of the tools support `__VERIFIER_nondet_uint`, the benchmark is well supported by Nidhugg and Nidhugg with observers: it can be verified by them in less than 5 seconds on my machine. In case we would like to compare with more tools (beyond Nidhugg and Nidhugg with observers), we can comment this benchmark out in the `runall.sh` script.
> I checked the `gcd` benchmark. Although not all of the tools support `__VERIFIER_nondet_uint`, the benchmark is well supported by Nidhugg and Nidhugg with observers: it can be verified by them in less than 5 seconds on my machine.
No, they cannot. As I wrote above, to verify this benchmark one needs a tool that properly handles data non-determinism using some appropriate technique (e.g., a symbolic one). Nidhugg currently does not. All you observe is that Nidhugg finishes in less than 5 seconds on this benchmark; not much can be inferred from that.
In any case, I do not want to spend more time arguing here. Can I please ask that we get a reduced set of benchmarks (those that the tools can properly handle) and scripts that actually work? The current ones do not. When I run `runall.sh`, I get a crash:

```
Traceback (most recent call last):
  File "./gen_table.py", line 96, in <module>
    main()
  File "./gen_table.py", line 78, in main
    get_result(tst, LOGFILE, "nidhugg")
  File "/home/kostis/swsc/benchmarks/mylibrary.py", line 40, in get_result
    with open(file,'r') as content_file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/kostis/swsc/benchmarks/log-files/runall.log'
cat: ./result-table.txt: No such file or directory
```
Does the script work for you?
Good morning!
I forgot to add the log-file folder; that is why you could not run the script. I have added the folder now.
I have commented out `gcd` and some other benchmarks. I can fix the `gcd` benchmark to remove the data non-determinism, but I will do that later.
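The fix could look roughly like the sketch below (illustrative only; the function and values are placeholders, not the actual contents of `SV-COMP/gcd.c`). Replacing the nondet calls with fixed concrete inputs makes the benchmark self-contained, so a stateless model checker only has to explore thread interleavings, not data values:

```c
#include <assert.h>

/* Instead of
 *   unsigned int a = __VERIFIER_nondet_uint();
 *   unsigned int b = __VERIFIER_nondet_uint();
 * the inputs are fixed, which removes the data non-determinism. */
static unsigned int gcd(unsigned int a, unsigned int b) {
    while (b != 0) {
        unsigned int t = b;
        b = a % b;
        a = t;
    }
    return a;
}

int main(void) {
    unsigned int a = 48u, b = 18u;  /* fixed concrete inputs */
    assert(gcd(a, b) == 6u);
    return 0;
}
```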
Please run:

```
cd SWSC_FOLDER_PATH/benchmarks
```

where `SWSC_FOLDER_PATH` is the directory of swsc. Then run `./runall.sh 1` to check that the script works (1-second timeout for each benchmark). Run `./runall.sh 600` or plain `./runall.sh` to run all benchmarks with Nidhugg, Nidhugg with observers, and swsc using a 10-minute timeout for each benchmark. This can take a long time, so please do not sit waiting at the computer. The result table is written to `benchmarks/result-table.txt`, and a log file can be found in the `benchmarks/log-files` folder.

Thanks for this change. Now we are getting somewhere.
I've pushed a prettification change to the `gen_table.py` script.
I have added a folder `benchmarks` that contains several benchmarks from different tools. To run them, run the script `runall.sh` from inside the benchmarks folder. The script will print out a table at the end. The default timeout is 10 minutes. To run with a different timeout, run `./runall.sh num`, where `num` is a number of seconds, e.g. 600 for 10 minutes. For more information, see 542a721d2a08a53fd4f3c071639b6c4066713597.