researchart / re19

control repo for the re19 artifacts evaluation committee
BSD 2-Clause "Simplified" License

Review of submission 126dietsch #16

Closed neilernst closed 5 years ago

re19ar commented 5 years ago

I am trying to execute `./run-experiments.sh` on the virtual machine @danieldietsch provided, and I get the following error output:

```
# ./run-experiments.sh
####### Running benchmark req2ta2UPPAAL_part1_to_3 ######
Using 0 threads
2019-07-09 17:37:16 - ERROR - At least ONE thread must be given!
WARNING: No file matches 'results/benchexec_def_req2ta2UPPAAL_part1_to_3.*.results.req2ta2UPPAAL.xml'.
ERROR: No benchmark results found.
grep: results/benchexec_def_req2ta2UPPAAL_part1_to_3.*.results.*.csv: No such file or directory
grep: results/benchexec_def_req2ta2UPPAAL_part1_to_3.*.results.*.csv: No such file or directory
### req2ta2UPPAAL ###
ID   rt-inc.     Time
*

####### Running benchmark ultimate_reqanalyzer_part1_to_3 ######
Using 0 threads
2019-07-09 17:37:16 - ERROR - At least ONE thread must be given!
WARNING: No file matches 'results/benchexec_def_ultimate_reqanalyzer_part1_to_3.*.results.ultimate_reqanalyzer.xml'.
ERROR: No benchmark results found.
grep: results/benchexec_def_ultimate_reqanalyzer_part1_to_3.*.results.*.csv: No such file or directory
grep: results/benchexec_def_ultimate_reqanalyzer_part1_to_3.*.results.*.csv: No such file or directory
grep: results/benchexec_def_ultimate_reqanalyzer_part1_to_3.*.logfiles/*: No such file or directory
grep: results/benchexec_def_ultimate_reqanalyzer_part1_to_3.*.logfiles/*: No such file or directory
grep: results/benchexec_def_ultimate_reqanalyzer_part1_to_3.*.logfiles/*: No such file or directory
### ultimate_reqanalyzer ###
ID   Vac.    rt-inc.     TO      Time
*     ()      ()      ()
```

Any ideas on the reason and fix?

danieldietsch commented 5 years ago

Oh, perhaps you are using only 1 core? You need at least 2.
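
As a quick sanity check (not something the script requires you to run), you can confirm how many logical CPUs the guest actually sees; `run-experiments.sh` reads its core count with the same `getconf` call:

```bash
# Number of logical CPUs the guest OS sees;
# run-experiments.sh uses this same getconf call.
getconf _NPROCESSORS_ONLN
nproc   # coreutils alternative; should print the same number
```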

re19ar commented 5 years ago

> Oh, perhaps you are using only 1 core? You need at least 2.

The virtual machine is set up with 4 cores. Is there any additional setup I need to do?

danieldietsch commented 5 years ago

Ok, I am very sorry for this, it seems there are still some issues with the VM and the script.

1. The script has an off-by-one error for the VM's memory (`free -g` rounds the reported size down to whole gibibytes; see the aside after step 2). Please insert the line `mem_in_gibi=$((mem_in_gibi + 1))` after the initial declaration of `mem_in_gibi` at the beginning of the script. It should then look like this:

```bash
#!/bin/bash
# This script runs the experiments presented in "Scalable Analysis of Real-Time
# Requirements", RE 2019, Section VII "EVALUATION AND APPLICATION" and produces
# the results of Table 1 and Table 2.
#
# The "benchmarks" array controls which benchmarks this script will run.
#
# If you have access to a machine with 32 cores (usually 16 physical cores and
# with HT 32) and 128GB memory, you can expect the following runtimes.
# * part1_to_3 specifies parts 1 to 3 of Table 1 and should run fairly fast (approx. 30min)
# * part4 has a timeout of 9000s and will take that time for req2ta2UPPAAL
# * part5 is the complete Table 2 and will take approx. 12h

benchmarks=(
  req2ta2UPPAAL_part1_to_3
  ultimate_reqanalyzer_part1_to_3
  req2ta2UPPAAL_part4
  ultimate_reqanalyzer_part4
  ultimate_reqanalyzer_part5
)

log_file="eval-results.log"
xml_tmp_dir="req2ta-tmp-output"
number_of_cores=$(getconf _NPROCESSORS_ONLN)
mem_in_gibi=$(free -g | awk '/^Mem:/{print $2}')
mem_in_gibi=$((mem_in_gibi + 1))

#############################
# Functions
#############################
...
```


2. It seems that several cgroups permissions are not correctly preserved in the VM after import. Before executing the script, you have to run the following commands:
```bash
sudo chmod o+wt '/sys/fs/cgroup/cpuset/'
sudo chmod o+wt '/sys/fs/cgroup/cpu,cpuacct/user.slice'
sudo chmod o+wt '/sys/fs/cgroup/freezer/'
sudo chmod o+wt '/sys/fs/cgroup/memory/user.slice/user-1000.slice/user@1000.service'
sudo swapoff -a
```

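As an aside on step 1: `free -g` rounds the reported size down to whole gibibytes, which is where the missing GiB comes from. A quick way to see the rounding (not part of the author's script):

```bash
# free -g rounds down to whole GiB (e.g. an almost-8-GiB VM reports 7) ...
free -g | awk '/^Mem:/{print $2}'
# ... while free -m shows the same value with more precision.
free -m | awk '/^Mem:/{printf "%.2f GiB\n", $2/1024}'
```
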
But then it works ;)
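
Should BenchExec still complain about cgroups after these changes, it ships a self-check that reports whether the required controllers are usable (assuming the BenchExec installation inside the VM exposes the module; this is not part of the run script):

```bash
# Checks whether the cpuset, cpuacct, freezer, and memory cgroup
# controllers are set up so that BenchExec can use them.
python3 -m benchexec.check_cgroups
```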

re19ar commented 5 years ago

> But then it works ;)

Yes, it fixes the problem. Now it runs. Thanks!

re19ar commented 5 years ago

This submission contains all of the required documentation for the package, i.e., a README, STATUS, LICENSE, and INSTALL file. The STATUS file indicates that the authors are pursuing an AVAILABLE badge. The data are publicly available on Zenodo with a DOI. The license is LGPLv3-compatible, which fulfills the OSS license requirement for being marked as available.

Most of the content that should appear in INSTALL.md is currently in README.md. However, this does not affect preparing the environment for running the experiments. I experienced a small issue during setup, but it was resolved by the author's response on GitHub. The authors should update the corresponding artifacts and consider moving the content to the expected files upon acceptance.

One folder in the artifact is newer than the one used in the paper. The author makes this clear in the README and explains the differences between the old and new results in detail.

I was able to follow the steps to run Parts 1 to 3 of the experiment for Table 1 in their paper. Since I used only two cores, the observed results differed from the ones reported by the authors; however, the general trend is the same. Since my machine does not have enough memory, I was not able to run the rest of the experiments, but the artifacts for running all of them are available.

In summary, I recommend giving the artifact an AVAILABLE badge.

neilernst commented 5 years ago

@timm do you concur on Available?

danieldietsch commented 5 years ago

I just realized that one can apply for two badges -- contrary to the wording on http://re19.ajou.ac.kr/pages/submission/artifact_papers/, where it states that one should apply for one badge: "one of reusable, available, replicated, reproduced".

Is it too late to also ask for the reusable badge?

re19ar commented 5 years ago

I think this dataset has the potential to be granted a "Reusable" badge, but not based on the current submission. For the "Reusable" badge, the artifacts need to be "very carefully documented and well-structured to the extent that reuse and repurposing is facilitated" (reference here). Also, the review period is already closed, so it is too late to bring in a second reviewer. @neilernst @timm What's your view on this request?

timm commented 5 years ago

our review process is done. the authors should have asked for 2 badges.

that said, we (@timm and @neilernst) could have done a better job of saying in the CFP that multiple badges are possible.

lesson learned. will do so in future

danieldietsch commented 5 years ago

@timm I understand. Perhaps next time. @re19ar Can you elaborate on why you think our contribution is not "reusable" so that we might improve in future iterations? In particular, our contribution provides multiple complete sets of requirements directly from industry (albeit anonymised). We did also go to great lengths to allow for repeatability and replicability by not only providing exact versions of all the used software, but also using state-of-the-art measurement and benchmarking tools to ensure a maximum of precision during reproduction.

re19ar commented 5 years ago

> In particular, our contribution provides multiple complete sets of requirements directly from industry (albeit anonymised). We did also go to great lengths to allow for repeatability and replicability by not only providing exact versions of all the used software, but also using state-of-the-art measurement and benchmarking tools to ensure a maximum of precision during reproduction.

I agree, and I think the potential users of your artifact would appreciate your effort. My suggestion for making it more reusable is to improve the documentation of your tool ULTIMATE REQANALYZER. Finding the tool and its usage is not intuitive given the current explanation in the README:

> UAutomizer-linux/ contains Ultimate ReqAnalyzer 0.1.24-4f1d294 (i.e., our implementation of the method described in our paper), which is based on the Ultimate program analysis framework. Note that this is a newer version than the one in the paper.

It would be really helpful if you could clearly tell the readers of your paper how to reuse Req2Pea and/or Pea2Boogie in other settings, to support better reusability.

danieldietsch commented 5 years ago

@re19ar Thank you for clarifying. You are right, we do not explain the tool usage in sufficient detail. We hope to provide adequate documentation in future iterations.