researchart / rose6icse


Ankou #94

Closed minkull closed 4 years ago

minkull commented 4 years ago

https://github.com/researchart/rose6icse/tree/master/submissions/available/Ankou https://github.com/researchart/rose6icse/tree/master/submissions/reusable/Ankou

seeking Reusable and Available Badges

minkull commented 4 years ago

Note to reviewers: these authors want multiple badges

ai-sta-website commented 4 years ago

@ai4se @sangkilc

I followed the instructions described in the GitHub repository (https://github.com/researchart/rose6icse/tree/master/submissions/available/Ankou) to install Ankou in a Docker container running Ubuntu 18.04. There are some problems when trying to reproduce the results:

No information about the 24 subjects used in RQ1 and RQ3.

As mentioned in the paper, the authors obtained 150 different subjects from 24 packages and randomly selected one subject per package to form the benchmark (24 subjects in total) when evaluating the impact of Ankou's dimensionality reduction (RQ1 in Sec 6.2) and the necessity of the distance-based fitness function (RQ3 in Sec 6.4). Table 1 illustrates all the experimental results, so the 24 selected subjects are required to reproduce the results in Table 1. However, detailed information on which subjects were randomly selected in the evaluation is not provided.
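For illustration only, a selection like this could be made reproducible with a few lines of Python, assuming the benchmark JSON lists every subject with its binary path, arguments, and package (the 'package' field and the file schema below are assumptions, not taken from the artifact):

    import json
    import random

    # Illustrative sketch: pick one subject per package, reproducibly.
    # The 'package' field is hypothetical; 'puts', 'bin_path', and 'args'
    # are assumptions about the benchmark JSON schema.
    with open('configuration.json') as f:
        config = json.load(f)

    by_package = {}
    for put in config['puts']:
        by_package.setdefault(put.get('package', put['bin_path']), []).append(put)

    random.seed(0)  # fixed seed so the same 24 subjects are chosen every run
    for put in (random.choice(subjects) for subjects in by_package.values()):
        print(put['bin_path'], ' '.join(put['args']))

Recording the seed, or simply the resulting list of 24 subjects, in the repository would let reviewers reproduce Table 1 on exactly the same subjects.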

The evaluation reproduction section in README.md is too brief to follow.

The first step is to compile the 24 packages mentioned in the paper, at the same versions or commits, using afl-gcc. None of these 24 packages is provided, and collecting them online costs reviewers too much effort. Also, since a Docker container is used for fuzzing, it would be better if a Dockerfile were available to set up the environment and compile all packages, instead of leaving these steps to reviewers. The second step is to run the produced subjects with the commands found in configuration.json, so the reviewer still needs to convert this JSON file into 150 separate commands and run them in a Docker container; a script should be provided (a rough sketch of such a script is given below). The third step is to analyze the output directory for results. The problem here is that the fuzzing campaign statistics in $OUTPUT_DIR/status* are too messy for reviewers to analyze, and no detailed information or script is provided to facilitate the analysis. Furthermore, reviewers cannot get coverage and throughput information from the output.
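To make the request concrete, here is a minimal sketch of the kind of driver script meant above. It emits one Ankou invocation per subject; the 'puts', 'bin_path', and 'args' field names are assumptions about configuration.json's schema, and the Ankou flags (-app, -args, -i, -o, -threads) follow the invocation pattern used in the reproduction script further down this thread:

    import json

    # Minimal sketch (assumed schema, not taken from the artifact): turn each
    # configuration.json entry into an Ankou invocation and collect them in a
    # shell script.
    with open('configuration.json') as f:
        config = json.load(f)

    with open('run_all.sh', 'w') as out:
        out.write('#!/bin/sh\n')
        for put in config['puts']:
            binary = put['bin_path']
            args = ' '.join(put['args'])
            name = binary.rsplit('/', 1)[-1]
            out.write('go run github.com/SoftSec-KAIST/Ankou -app %s -args "%s" '
                      '-i seeds -o %s_out -threads 1\n' % (binary, args, name))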

No steps to set up safe stack hash to triage crashes.

To evaluate the number of bugs found, crashes can easily be found in the $OUTPUT_DIR/crashes-* directories. However, only the unique bugs found by Ankou are listed in Table 2. As mentioned in Sec 6.7, the authors decided to use a safe stack hash to triage duplicate crashes. Thus, without detailed information about how to compute the safe stack hash, there is no way for reviewers to count the number of bugs from the crash information.
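For context only (this is not necessarily the safe stack hash procedure of Sec 6.7), stack-hash triage generally groups crashes by a hash over the top frames of the crash backtrace. A rough sketch, assuming the subject takes the crashing input as a command-line argument:

    import hashlib
    import subprocess

    # Rough sketch of stack-hash triage (NOT the authors' exact method): run the
    # target on a crashing input under gdb, keep the top few backtrace frames,
    # and hash them. Crashes sharing a hash are counted as one bug.
    def stack_hash(binary, crash_input, depth=3):
        gdb = subprocess.run(
            ['gdb', '--batch', '-ex', 'run ' + crash_input, '-ex', 'bt', binary],
            capture_output=True, text=True)
        frames = [line for line in gdb.stdout.splitlines() if line.startswith('#')]
        return hashlib.sha1('\n'.join(frames[:depth]).encode()).hexdigest()

    # Hypothetical usage; the paths below are placeholders, not from the artifact.
    # print(stack_hash('./cflow', 'output/crashes-1/input-000'))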

@ai4se Could you shed some light on the issues mentioned above?

timm commented 4 years ago

@sangkilc : please reply to the above.

@random-friendly-dude : I look forward to your review

ai-sta-website commented 4 years ago

@timm Hi Tim, I have already provided my review above.

sangkilc commented 4 years ago

Thanks for the review. @Jiliac is preparing for benchmark programs. He will respond to this thread ASAP.

Jiliac commented 4 years ago

Thanks for the review. First of all, I am sorry the evaluation description was so brief. I didn't understand that the Reusable and Available badges were also about reproducing the paper's experiments.

We will add the 24 subjects of RQ1 and RQ3, and the steps to reproduce the safe stack hash triage we used, by next Wednesday. Concerning the execution of the 150 subjects and their evaluation, we now provide the source of all the packages at the right versions at: https://github.com/SoftSec-KAIST/Ankou-Benchmark. We will also add, by Wednesday, indications on which statistics from $OUTPUT_DIR/status* were used to produce each data point for the RQs. However, producing a full-fledged benchmark with all subjects pre-compiled and ready to run would take too much time.

minkull commented 4 years ago

@sec365 looking forward to your review

sec365 commented 4 years ago

I agree with the first reviewer.

I am able to compile and run Ankou on binutils following the authors' instructions. However, there is no information regarding where to download the 24 packages mentioned in the paper and how to compile them. I also find it difficult to interpret the results in Ankou's output directory.

I believe the current version does not satisfy the criteria of “available” and “reusable”.

====available==== Only the fuzzer is available. I suggest the authors make the evaluation subjects available as well. Otherwise, it is difficult to judge whether the tool is “functional”.

====reusable==== To facilitate reuse, the authors should provide more detailed instructions on how to set seeds and interpret the output of Ankou.

I see that the authors are trying to improve the submission. I will be happy to review the revised version.

Jiliac commented 4 years ago

Sorry for the brevity of the previous comment. We understand that the previous submission was missing the details needed to reproduce the evaluation, so we have updated our repositories with new READMEs.

We could have provided a single script that completely rebuilds every subject and redoes our experiments, but it would take months to finish, which is simply infeasible to evaluate. So instead, we provide a Dockerfile that automatically sets up the whole environment, including two packages from our benchmark. The two packages were chosen based on how quickly Ankou finds a first crash in them, to make it possible to evaluate our tool without waiting hours for crashes to appear. In addition, we provide detailed instructions on how to interpret our results.

In case you want to try more packages in our benchmark, we provide the list of subjects we used as well as their arguments in benchmark/configuration.json. We hope this version makes sense, and we will be happy to answer more questions if you have any.

timm commented 4 years ago

@sec365

@random-friendly-dude

please advise

ai-sta-website commented 4 years ago

@timm

Thanks. We are working on it according to the latest instructions provided by the authors. Will update today.

sec365 commented 4 years ago

The authors have addressed my earlier concerns.

  1. The authors have made the 24 packages available at https://github.com/SoftSec-KAIST/Ankou-Benchmark. I tried to compile cflow. It was successful.

  2. The authors also gave more instructions. Following the instructions at https://github.com/SoftSec-KAIST/Ankou, I successfully ran Ankou to fuzz cflow with the provided seeds. I let the tool run for about an hour and it detected 175 crashes. I was able to print the branch coverage, throughput, and effectiveness values with the provided python commands. I then ran cflow on a randomly chosen crashing input. I was able to use the scripts in the triage folder to obtain the stack hash.

  3. The Docker image can also be built successfully. The authors have set up the environment, and I can run Ankou easily using the image.

Therefore, I am happy to recommend the two badges.

One suggestion: please consider releasing the artifacts on Zenodo (or a similar service) as one archive and providing a DOI. Currently, they are spread across separate repositories.

ai-sta-website commented 4 years ago

Installation

I followed the installation steps described in the GitHub repository: https://github.com/SoftSec-KAIST/Ankou. It provides installation and evaluation steps for Ankou. I successfully built the tool using the provided commands on my machine running Ubuntu 18.04.

Evaluation

First, I compiled the source of the 24 program packages with afl-gcc, based on the commands provided at https://github.com/SoftSec-KAIST/Ankou-Benchmark:

CC=afl-gcc CXX=afl-g++ ./configure --prefix=`pwd`/build
make -j
make install

cmake .. \
    -DCMAKE_INSTALL_PREFIX=./locals \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER=afl-gcc -DCMAKE_CXX_COMPILER=afl-g++
make -j
make install

Next, I parsed the JSON file benchmark/rq1_rq3.json using the following Python script to reproduce the results of RQ1 and RQ3, in which Ankou is evaluated on the impact of its dimensionality reduction and the necessity of the distance-based fitness function.

import json

if __name__ == '__main__':
    data = {}
    with open('rq1_rq3.json', 'r+') as outfile:
        data = json.load(outfile)
    for put in data['puts']:
        bin_idx = put['bin_path'].rfind('/')
        bin_path = put['bin_path'][bin_idx+1:]
        seeds_path = ' -i seeds'
        args = ' -args' + ' \"' + ' '.join(put['args']) + '\"'
        output_path = ' -o ' + bin_path + '_out'
        log_path = bin_path + '_log.txt'
        cmd = 'go run github.com/SoftSec-KAIST/Ankou -app ' + bin_path + args + seeds_path + ' -threads 1' \

I compared the evaluation results with the numbers shown in the paper (mainly Table 1 and Figure 4). The branch coverage and overall throughput are almost the same, with acceptable differences. The effectiveness of the dynamic PCA is around 70%, which is below the 80% mentioned in Sec 6.3 of the paper; I think this is because I only evaluated 24 of the 150 subjects. Based on the crashes, the stack hashes can easily be computed following the setup steps.

Summary

The authors provided detailed instructions to build and evaluate their tool Ankou. All steps can be done smoothly. The evaluation results are also quite promising. I believe that Ankou is reusable and available.