google / fuzzbench

FuzzBench - Fuzzer benchmarking as a service.
https://google.github.io/fuzzbench/
Apache License 2.0

Fox experiment #1957

Open prashast opened 8 months ago

prashast commented 8 months ago

Hi, we want to reevaluate an optimized reconfiguration of Fox, comparing it against aflplusplus, libafl, and the other SBFT'24 fuzzers as well, if possible. I've added the relevant SBFT'24 fuzzers in my PR along with a dummy comment and would appreciate it if the below-specified experiment could be run. The command to run the experiment is:

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name fox-eval-update --fuzzers fox aflplusplus libafl mystique bandfuzz tunefuzz pastis

As a note, I know the other candidate fuzzers we are comparing our updated version of Fox against have previously been run on the fuzzbench set, so if you think merging the results that we will get for the updated Fox as part of the current experiment with the previous experimental results of these candidate fuzzers on fuzzbench will be easier, that'll be fine with us too.

prashast commented 8 months ago

@DonggeLiu Just wanted to do a quick check-in and see when the above experiment could be run and if there is anything else needed from my end, thanks! :)

DonggeLiu commented 8 months ago

> @DonggeLiu Just wanted to do a quick check-in and see when the above experiment could be run and if there is anything else needed from my end, thanks! :)

Thanks for the reminder @prashast and sorry about missing the PR. Please always feel free to @ me in all experiment requests : ) Unfortunately, we are holding all experiments for two days (Monday-Tuesday, 4-5 Mar) for another task. Would you mind if I come back to this on Wednesday? My apologies for the inconvenience.

prashast commented 8 months ago

@DonggeLiu Yeah sure, Wed (Mar 6) works for me, thanks!

DonggeLiu commented 8 months ago

The migration is taking a bit longer than expected. I will come back to this once it finishes (1-2 days).

For now, I've enabled CIs to capture potential fuzzer-benchmark compatibility issues (if any) : ) Sorry about the delay!

prashast commented 8 months ago

All good, thanks for the update!

prashast commented 8 months ago

Hi Dongge, I looked at the CI failures and they come from running BandFuzz (an SBFT'24 fuzzer). Looking at the logs, the docker image they had used in the competition is no longer public so it can't be pulled. Do you think we can only re-run fox and then merge its results with that of the other fuzzers that have been previously re-run on the fuzzbench dataset which are: aflplusplus libafl mystique bandfuzz tunefuzz pastis? A command for only running fox is below:

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name fox-eval-update --fuzzers fox

Is it possible to do a CI run just for fox? If you think merging results for the other fuzzers from a prior run is viable, and you need me to update the PR so that only fox is added and run in CI, then I can do that too.

DonggeLiu commented 8 months ago

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2024-03-11-fox-eval-update --fuzzers fox

DonggeLiu commented 8 months ago

Experiment 2024-03-11-fox-eval-update data and results will be available later at: The experiment data. The experiment report.

DonggeLiu commented 8 months ago

> Do you think we can only re-run fox and then merge its results with that of the other fuzzers that have been previously re-run on the fuzzbench dataset which are: aflplusplus libafl mystique bandfuzz tunefuzz pastis?

FuzzBench should be able to merge the previous results of core fuzzers into the report and compare them with yours. I am not sure whether adding the fuzzers you requested to core-fuzzers.yaml would make FuzzBench automatically include them too, but I am happy to run a quick experiment to test this hack if you'd like to add them to that file.

Alternatively, at the end of each report is a link to download the raw data (data.csv.gz), which can be used to re-generate the report with selected fuzzers and benchmarks. You will need to manually select some old reports and merge your data file with them, though.
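As a rough illustration of the manual merge step, here is a minimal sketch that concatenates several downloaded data.csv.gz files and keeps only selected fuzzers. It uses only the Python standard library. The function name `merge_experiment_data` is hypothetical, and the `fuzzer` column name is an assumption about the data file layout, so check it against the header of your own files. The sketch only produces a merged data file; it does not regenerate the report.

```python
import csv
import gzip


def merge_experiment_data(paths, out_path, fuzzers=None, fuzzer_column="fuzzer"):
    """Concatenate gzipped CSV files into one gzipped CSV file.

    paths:         input files, e.g. data.csv.gz from several reports.
    out_path:      where to write the merged gzipped CSV.
    fuzzers:       optional iterable of fuzzer names to keep (None keeps all).
    fuzzer_column: ASSUMED name of the column holding the fuzzer name.

    Returns the number of data rows written.
    """
    wanted = set(fuzzers) if fuzzers is not None else None
    header = None
    writer = None
    rows_written = 0
    with gzip.open(out_path, "wt", newline="") as out_file:
        for path in paths:
            with gzip.open(path, "rt", newline="") as in_file:
                reader = csv.DictReader(in_file)
                if reader.fieldnames is None:
                    raise ValueError(f"{path} is empty")
                if header is None:
                    # The first file defines the output columns.
                    header = reader.fieldnames
                    writer = csv.DictWriter(out_file, fieldnames=header)
                    writer.writeheader()
                elif reader.fieldnames != header:
                    # Refuse to mix files whose columns (or column order) differ.
                    raise ValueError(f"{path} has different columns than the first file")
                for row in reader:
                    if wanted is None or row[fuzzer_column] in wanted:
                        writer.writerow(row)
                        rows_written += 1
    return rows_written
```

For example, `merge_experiment_data(["old/data.csv.gz", "fox/data.csv.gz"], "merged.csv.gz", fuzzers=["fox", "aflplusplus", "libafl"])` would write one file containing only those three fuzzers; the paths here are placeholders.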

We can also re-run the experiment with all the fuzzers, but the main problems are:

  1. BandFuzz is not available.
  2. We cannot run many fuzzers (e.g., > 8) in one experiment because of a bottleneck in measurement, which should be OK for your case.

Please let me know which one you would prefer : )