Note to reviewers: these authors want multiple badges
@ai4se @sangkilc
I followed the instructions described in the GitHub repository (https://github.com/researchart/rose6icse/tree/master/submissions/available/Ankou) to install Ankou in a Docker container running Ubuntu 18.04. There are several problems when trying to reproduce the results:
As mentioned in the paper, the authors obtained 150 different subjects from 24 packages and randomly selected one subject per package to form the benchmark (24 subjects in total) used to evaluate the impact of dimensionality reduction (RQ1 in Sec 6.2) and the necessity of the distance-based fitness function (RQ3 in Sec 6.4). Table 1 presents all of the corresponding experimental results, so these 24 selected subjects are required to reproduce Table 1. However, the artifact does not say which subject was randomly selected from each package.
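For what it is worth, documenting this selection would be cheap, e.g. a small seeded script along the following lines (a sketch with placeholder package and subject names, not the authors' procedure):

```python
# Sketch only: reproducible "one subject per package" selection.
# Package/subject names below are placeholders, not the paper's actual grouping.
import random

subjects_by_package = {
    "binutils": ["cxxfilt", "objdump", "readelf"],
    "libtiff": ["tiffinfo", "tiff2pdf"],
    # ... 24 packages in total, 150 subjects overall
}

random.seed(0)  # fixing and publishing the seed makes the selection reproducible
selected = {pkg: random.choice(sorted(subs)) for pkg, subs in subjects_by_package.items()}
for pkg, subject in sorted(selected.items()):
    print(pkg, subject)
```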
The `README.md` is too brief to follow. The first step is to compile the 24 packages mentioned in the paper, at the same version or commit, using `afl-gcc`. None of these 24 packages are provided, and collecting them online is too costly for reviewers. Also, since a Docker container is used for fuzzing, it would be better if a `Dockerfile` were available to set up the environment and compile all packages instead of leaving these steps to reviewers. The second step is to run the produced subjects with the commands found in `configuration.json`, so the reviewer still needs to convert this JSON file into 150 separate commands and run them in a Docker container; a shell script should be provided. The third step is to analyze the output directory for results. The problem here is that the fuzzing-campaign statistics in `$OUTPUT_DIR/status*` are too messy for reviewers to analyze, and there is no detailed documentation or script to facilitate the analysis. Furthermore, reviewers cannot obtain coverage and throughput information from the output.
The use of `safe stack hash` to triage crashes. To evaluate the number of bugs found, crashes can easily be collected from the `$OUTPUT_DIR/crashes-*` directory. However, Table 2 lists only the unique bugs found by Ankou, and, as mentioned in Sec 6.7, the authors triage crashes using a safe stack hash. Without detailed information on how to compute this safe stack hash, there is no way for reviewers to count the number of bugs from the raw crash information.
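For other reviewers' reference, stack-hash triage in general reduces each crash to a hash over the function names on its call stack, so crashes sharing a stack are counted as one bug. The sketch below is only my illustration of that general idea, not the authors' `triage/` script; the backtrace format and the frame depth are assumptions:

```python
# Illustrative only -- not the authors' triage script.
# General idea of stack-hash triage: hash the top function names of a crash
# backtrace so that crashes with the same stack collapse into one bug.
import hashlib
import re
import sys

def stack_hash(backtrace_text, depth=5):
    # Assumes an ASan/GDB-style backtrace with lines like "#0 0x... in foo ..."
    frames = re.findall(r"#\d+\s+0x[0-9a-f]+\s+in\s+(\S+)", backtrace_text)
    top = frames[:depth]  # only the top frames are used for triage
    return hashlib.sha1("\n".join(top).encode()).hexdigest()

if __name__ == "__main__":
    print(stack_hash(sys.stdin.read()))
```

A reviewer could pipe an AddressSanitizer report or a GDB backtrace for each crashing input into such a script and count the distinct hashes.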
@ai4se Could you shed some light on the issues mentioned above?
@sangkilc : please reply to the above.
@random-friendly-dude : I look forward to your review
@timm Hi Tim, I have provided my review.
Thanks for the review. @Jiliac is preparing the benchmark programs. He will respond to this thread ASAP.
Thanks for the review. First of all, I am sorry the evaluation description was so brief. I did not understand that the reusable and available badges were also about reproducing the paper's experiments.
We will add the 24 subjects of RQ1 and RQ3, and the steps to reproduce the safe stack hash triage we used, by next Wednesday. Concerning the execution of the 150 subjects and their evaluation, we now provide the source of all the packages at the right versions at https://github.com/SoftSec-KAIST/Ankou-Benchmark. We will also add, by Wednesday, indications of which statistics from `$OUTPUT_DIR/status*` were used to produce each data point for the RQs. However, producing a full-fledged benchmark with all subjects pre-compiled and ready to run would take too much time.
@sec365 looking forward to your review
I agree with the first reviewer.
I am able to compile and run Ankou on binutils following the authors’ instructions. However, there is no information regarding where to download the 24 packages mentioned in the paper or how to compile them. I also find it difficult to interpret the results in the Ankou output directory.
I believe the current version does not satisfy the criteria of “available” and “reusable”.
====available==== Only the fuzzer is available. I suggest the authors make the evaluation subjects available as well. Otherwise, it is difficult to judge whether the tool is “functional”.
====reusable==== To facilitate reuse, the authors should provide more detailed instructions on how to set seeds and interpret the output of Ankou.
I see that the authors are trying to improve the submission. I will be happy to review the revised version.
Sorry for the brevity of the previous comment. We understand that the previous submission was missing the details needed to reproduce the evaluation, so we have updated our repositories with new READMEs:

- The 24 subjects used for RQ1 and RQ3 are listed in `benchmark/rq1_rq3.json`.
- The `README` contains instructions to install and run one package, and the `Dockerfile` builds an image with two packages pre-compiled. The `README` also explains how to interpret the `status*` files to get the branch coverage, throughput, and the "effectiveness" metric used in RQ2.
- The `triage/` folder contains scripts to obtain the stack hash; the last section of the `README` gives the commands to use them.
We could have provided a single script that completely rebuilds every subject and redoes our experiments, but that would take months to finish, which is simply infeasible for artifact evaluation. Instead, we provide a `Dockerfile` that automatically sets up the whole environment, including two packages from our benchmark. The two packages were chosen based on how quickly Ankou observes a first crash on them, so the tool can be evaluated without waiting hours for crashes to appear. We also provide detailed instructions on how to interpret our results.
In case you want to try more packages from our benchmark, we provide the list of subjects we used, as well as their arguments, in `benchmark/configuration.json`. We hope this version makes sense, and we will be happy to answer any further questions.
@sec365 @random-friendly-dude please advise
@timm Thanks. We are working on it according to the latest instructions provided by the authors. Will update today.
The authors have addressed my earlier concerns.
The authors have made the 24 packages available at https://github.com/SoftSec-KAIST/Ankou-Benchmark. I tried to compile cflow. It was successful.
The authors also gave more instructions. Following the instructions at https://github.com/SoftSec-KAIST/Ankou, I successfully ran Ankou to fuzz cflow with the provided seeds. I let the tool run for about an hour and it detected 175 crashes. I was able to print the branch coverage, throughput, and effectiveness values with the provided python commands. I then ran cflow on a randomly chosen crashing input. I was able to use the scripts in the triage folder to obtain the stack hash.
The docker image can also be successfully built. The authors have set up the environment. I can run Ankou easily using the image.
Therefore, I am happy to recommend the two badges.
One suggestion: please consider releasing the artifacts on Zenodo (or a similar service) as one archive and providing a DOI. Currently, they are spread across separate repositories.
I followed the instruction steps described in the GitHub repository https://github.com/SoftSec-KAIST/Ankou, which provides installation and evaluation steps for Ankou. I successfully built the tool using the provided command on my machine running Ubuntu 18.04.
First, I compiled the source of the 24 program packages with `afl-gcc`, based on the commands provided at https://github.com/SoftSec-KAIST/Ankou-Benchmark.
For the autotools-based packages:

```sh
CC=afl-gcc CXX=afl-g++ ./configure --prefix=`pwd`/build
make -j
make install
```

For the CMake-based packages:

```sh
cmake .. \
    -DCMAKE_INSTALL_PREFIX=./locals \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER=afl-gcc -DCMAKE_CXX_COMPILER=afl-g++
make -j
make install
```
Next, I parsed the JSON file `benchmark/rq1_rq3.json` with the following Python script to generate the fuzzing commands for RQ1 and RQ3, in which Ankou is evaluated on the impact of dimensionality reduction and the necessity of the distance-based fitness function.
```python
import json

if __name__ == '__main__':
    data = {}
    with open('rq1_rq3.json', 'r+') as outfile:
        data = json.load(outfile)
    for put in data['puts']:
        bin_idx = put['bin_path'].rfind('/')
        bin_path = put['bin_path'][bin_idx+1:]
        seeds_path = ' -i seeds'
        args = ' -args' + ' \"' + ' '.join(put['args']) + '\"'
        output_path = ' -o ' + bin_path + '_out'
        log_path = bin_path + '_log.txt'
        cmd = 'go run github.com/SoftSec-KAIST/Ankou -app ' + bin_path + args + seeds_path + ' -threads 1' \
            + output_path + ' -dur 18h > ' + log_path + ' &'
        print(cmd)
```

With the help of this script, I got 24 commands to fuzz the 24 subjects with the provided seeds. After fuzzing for 24 hours, I followed the provided instructions to analyze the results and obtain the branch coverage, overall throughput, and effectiveness of the dynamic PCA (shown here for exifvalue):

```sh
➜ python -c "print(open('receiver.csv').readlines()[-1].split(',')[0])"
13317
➜ python -c "last = open('seed_manager.csv').readlines()[-1].split(','); print(float(last[5])/int(last[6]))"
122.034280307
➜ python -c "last = open('receiver.csv').readlines()[-1].split(','); print('{}%'.format(100-100*float(last[2])/float(last[1])))"
79.8460800428%
```

Please find all the fuzzing results in the following table.

| Subject | Crashes | Branch coverage | Throughput | Effectiveness |
|---|---|---|---|---|
| exifvalue | 2251 | 13317 | 122.034280307 | 79.8460800428% |
| sassc | 84 | 37448 | 112.429141626 | 54.5850970816% |
| pspp | 0 | 4577 | 30.2649983446 | 70.0927292126% |
| bison | 709 | 11070 | 25.195571745 | 76.1291791496% |
| cflow | 1520 | 2915 | 123.314212801 | 73.5590002317% |
| avprobe | 53 | 24968 | 28.750011498 | 75.3718317888% |
| asn1Coding | 0 | 1276 | 128.875301479 | 71.7316440233% |
| listaction_d | 380 | 9782 | 116.453881258 | 77.0281168827% |
| tiffinfo | 0 | 7091 | 204.319725339 | 71.3228179181% |
| toe | 1225 | 2358 | 72.2312728432 | 73.3090398555% |
| tcpdump | 0 | 10198 | 142.976365184 | 54.3083022314% |
| clambc | 913 | 8146 | 124.122272853 | 67.3087975654% |
| dwarfdump | 51 | 9493 | 201.101765329 | 81.2333813153% |
| dump_torrent | 0 | 2311 | 116.031338146 | 58.8446577037% |
| nasm | 4 | 6441 | 64.9238600896 | 68.9811902784% |
| vim | 1130 | 47348 | 25.2132525681 | 62.449447189% |
| catdoc | 1 | 767 | 129.388910667 | 51.9245235833% |
| xpstopdf | 110 | 1238 | 52.6846275753 | 44.1574358861% |
| mpg123 | 0 | 4911 | 39.3765910061 | 74.4297278709% |
| dcraw_half | 0 | 5338 | 111.019220114 | 67.5267171476% |
| lou_trace | 0 | 3614 | 69.549948043 | 72.5370711436% |
| gm | 1 | 18190 | 99.9096913502 | 62.5383348125% |
| jasper | 1072 | 9618 | 199.767184035 | 79.9024780275% |
| cxxfilt | 0 | 3270 | 187.97047608 | 62.34051213% |
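For convenience, the three one-liners above can be applied to every subject in a loop. The sketch below is my own consolidation (not part of the artifact); it assumes each subject was fuzzed with `-o <subject>_out` as in my script above, and that `receiver.csv` and `seed_manager.csv` live under the `status*` directory Ankou creates inside each output directory:

```python
# My own consolidation script (not part of the artifact): applies the three
# one-liners above to every <subject>_out directory in the current folder.
import glob
import os

def last_row(path):
    with open(path) as f:
        return f.readlines()[-1].split(',')

for out_dir in sorted(glob.glob('*_out')):
    status_dirs = glob.glob(os.path.join(out_dir, 'status*'))
    if not status_dirs:
        continue
    status = status_dirs[0]
    recv = last_row(os.path.join(status, 'receiver.csv'))
    seed = last_row(os.path.join(status, 'seed_manager.csv'))
    branch = recv[0]                                      # branch coverage
    throughput = float(seed[5]) / int(seed[6])            # overall throughput
    effectiveness = 100 - 100 * float(recv[2]) / float(recv[1])  # dynamic PCA
    print('{}: branch={}, throughput={:.1f}, effectiveness={:.1f}%'.format(
        out_dir[:-len('_out')], branch, throughput, effectiveness))
```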
I compared the evaluation results with the numbers shown in the paper (mainly Table 1 and Figure 4). The branch coverage and overall throughput are almost the same, with acceptable differences. The effectiveness of the dynamic PCA is around 70%, which is below the 80% reported in Sec 6.3 of the paper; I think this is because I only evaluated 24 of the 150 subjects. Based on the crashes, the stack hashes can easily be obtained by following the setup steps.
The authors provided sufficient and detailed instructions to build and evaluate their tool Ankou. All steps can be completed smoothly, and the evaluation results are quite promising. I believe that Ankou is reusable and available.
https://github.com/researchart/rose6icse/tree/master/submissions/available/Ankou https://github.com/researchart/rose6icse/tree/master/submissions/reusable/Ankou
seeking Reusable and Available Badges