catenacyber closed this pull request 1 year ago.
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-06-15-quadfuzz --fuzzers quadfuzz
The experiment has been launched successfully and its data & report should be available shortly: The experiment data. The experiment report.
Thanks @Alan32Liu. What should I make of this report? Were new timeouts/bugs found?
This is a code coverage report. Once the experiment finishes, the past code coverage results of base fuzzers (libfuzzer, etc.) will be merged into the report to compare with quadfuzz.
Bug discovery requires a separate experiment; I can launch that now.
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-06-15-quadfuzz-bug --fuzzers quadfuzz --benchmarks bloaty_fuzz_target_52948c harfbuzz_hb-shape-fuzzer_17863b libxml2_xml_e85b9b mbedtls_fuzz_dtlsclient_7c6b0e php_php-fuzz-parser_0dbedb
Should I do anything here, @Alan32Liu?
For the bug-based experiment (2023-06-15-quadfuzz-bug):
The experiment data.
The experiment report.
FuzzBench generates reports to help researchers compare the performance of their fuzzer against others. It also provides the experiment data to support in-depth analysis of the parts where the fuzzer did not do well, so that researchers can improve it accordingly, or to justify where the fuzzer did perform better.
In this case, the two experiments here evaluate your fuzzer quadfuzz against some common fuzzers (e.g., afl++, used in OSS-Fuzz, and libafl, submitted by researchers) on some benchmark programs from OSS-Fuzz.
Some quick observations:
- quadfuzz did not outperform afl++ (currently used in OSS-Fuzz production): the coverage report shows afl++ covers more lines, and the bug report shows it finds more bugs.
- quadfuzz did not cover many branches that afl++ missed, but afl++ did cover many branches that no other fuzzer executed.
If you are interested, feel free to do more analysis on the performance of quadfuzz with the experiment data and see if you can make it perform better than afl++, or find some interesting/important branches that all the others missed.
Also feel free to ping me if you'd like to request more experiments after improving quadfuzz : )
If you think the fuzzer will perform better than existing fuzzers on certain types of benchmarks, we can also integrate those benchmarks in this PR and run experiments to prove that.
Benchmark integration is pretty straightforward: the ones in benchmarks/ are basically taken from OSS-Fuzz projects/ (see the sketch below).
We can see that both reports show that quadfuzz did not outperform afl++.
Getting good coverage is not the goal of quadfuzz, though.
Its goal is to take the corpus found by other fuzzing engines such as afl++ et al. and find quadratic complexity with big inputs, which afl++ et al. would not find (because they focus on speed and small inputs, which makes perfect sense).
For example, it found the bug/slowness fixed here: https://github.com/OISF/suricata/commit/d40dca5e55286c57e9a83018975022c4f08bf6d1
So, it is a niche fuzzer, not a general-purpose one... Does this make sense?
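(For readers: a minimal sketch of how one might eyeball quadratic behaviour by hand, not quadfuzz's actual mechanism. The target name is illustrative; it relies on libFuzzer-built binaries executing input files passed as arguments.)

```sh
# Time the target on an input and on the same input doubled in size.
# Roughly 2x the time suggests linear behaviour; roughly 4x suggests O(n^2).
head -c 100000 /dev/zero > in_1x
cat in_1x in_1x > in_2x
time ./fuzz_target in_1x
time ./fuzz_target in_2x
```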
I see.
Is there any chance that we can prove the benefit of quadfuzz on more benchmarks? For example, collect the corpus of afl++ on some projects, then start another experiment using that corpus as seeds for quadfuzz, afl++, and other fuzzers. We could then compare the coverage/bug-finding results of these fuzzers.
Could we add a benchmark with the suricata fuzz target fuzz_mimedecparseline at suricata commit c56fa0a80564174df8afe172ff722ef48754b405 (just before f80c999db320aa60570b4e04846bd7beeed96cd6)? There could also be fuzz_applayerparserparse_dcerpc at commit 668501c225d09ce1c5316c0061ff6a7e1980c64c, just before d40dca5e55286c57e9a83018975022c4f08bf6d1.
By the way, could the suricata fuzz target fuzz_predefpcap_aware be part of FuzzBench? (And could we know if more bugs are found?) It has a huge corpus/coverage (problematic, as we end up OOM because corpus/coverage occupies more than 2 GB).
We can certainly add them and run another experiment for quadfuzz in this PR.
Unfortunately, I have too many things on my plate at this moment and might not be able to do that now.
I can add them when I have free time, but please feel free to add them when you see fit.
Similarly, we don't have to add suricata to FuzzBench to run experiments. We can add it to this PR (or another branch from this PR) to test it out first.
Did I add the benchmark correctly?
Ah yes, sorry, I only just saw your commit. I think you did it correctly; let's run an experiment with it to find out.
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-07-04-quadfuzz-bug --fuzzers quadfuzz aflplusplus libfuzzer afl honggfuzz libafl centipede --benchmarks suricata_mime_c56fa0a8
Since this is a new benchmark, I also added some top-performing fuzzers to compare against quadfuzz.
Experiment data and results will be available at:
The experiment data.
The experiment report.
Oh, no fuzzer found the bug...
I guess I need to dig more
Feel free to request more experiments if that can assist in digging. FuzzBench is designed to help evaluate fuzzers before they are used in production : )
@Alan32Liu there was a typo in the commit hash for the experiment.
I pushed a fixup. Could you run the experiment again?
Thanks
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-07-13-quadfuzz-bug-1 --fuzzers quadfuzz aflplusplus libfuzzer afl honggfuzz libafl centipede --benchmarks suricata_mime_c56fa0a8
Experiment data and results will be available at: The experiment data. The experiment report.
I managed to get the bug once locally with quadfuzz by letting libFuzzer explore around it and expand the corpus.
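(For anyone trying to reproduce this, a minimal sketch of such corpus expansion with a libFuzzer-built target; the duration and directory names are illustrative, while -max_total_time and -merge=1 are standard libFuzzer flags.)

```sh
# Let libFuzzer mutate for an hour, writing new inputs into corpus/.
mkdir -p corpus && cp seeds/* corpus/
./fuzz_mimedecparseline -max_total_time=3600 corpus/
# Optionally minimize the expanded corpus before sharing it as seeds.
mkdir -p corpus_min
./fuzz_mimedecparseline -merge=1 corpus_min/ corpus/
```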
If we need to run libFuzzer to expand the corpus, would it make sense to have the expanded corpus as the seed, so that all fuzzers share the same starting point?
That makes sense.
But I realized my corpus input was wrong because of \n not being interpreted...
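(A guess at the underlying issue, sketched for reference: echo does not always interpret \n escapes, so a seed file can end up containing a literal backslash-n instead of a newline byte, whereas printf interprets the escapes. The seed content here is purely illustrative.)

```sh
# echo may write the two literal characters '\' 'n' (shell-dependent):
echo 'Content-Type: text/plain\n\n' > seed_bad
# printf interprets the escapes and writes real newline (0x0a) bytes:
printf 'Content-Type: text/plain\n\n' > seed_good
```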
Could you run the experiment with this?
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-07-20-quadfuzz-bug --fuzzers quadfuzz aflplusplus libfuzzer afl honggfuzz libafl centipede --benchmarks suricata_mime_c56fa0a8
Experiment data and results will be available at: The experiment data. The experiment report.
Looks like the build failed. Tried to fix it...
Hey @Alan32Liu, coming back to this: could we run an experiment again with the build, which should be fixed now?
Sorry, did you mean re-running an experiment? Sure! Would the previous setting work? gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-07-20-quadfuzz-bug --fuzzers quadfuzz aflplusplus libfuzzer afl honggfuzz libafl centipede --benchmarks suricata_mime_c56fa0a8
Sorry, did you mean re-running an experiment?
Yes :-)
Would the previous setting work?
It should
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-09-08-quadfuzz-bug --fuzzers quadfuzz aflplusplus libfuzzer afl honggfuzz libafl centipede --benchmarks suricata_mime_c56fa0a8
Experiment 2023-09-08-quadfuzz-bug data and results will be available later at:
The experiment data.
The experiment report.
Thanks. It looks like afl now finds the bug (it did not find it on oss-fuzz) but counts it 3 times... quadfuzz also finds the bug, and libFuzzer does not.
Do I understand this correctly?
Sorry that I completely missed this message. You are correct that AFL found the bug (counting it 3 times is likely due to our grouping algorithm categorizing the different bug-triggering stack traces into 3 groups). QuadFuzz indeed found the bug, likely with very similar stack traces.
LibFuzzer instances also found the bug after 13 hours, though.
Oh... How come it did not find the bug in 18 months on oss-fuzz? Did it get improved in the meantime?
Not sure if it has improved, but it might be due to changes we made in this PR (e.g., the seed corpus?)
The seed corpus added was part of the public corpus. Historically, the bug appeared when a bounds check was added where there had been none before, and this bounds check was not correct in every case...
Interesting. I just checked, and libFuzzer was updated recently. Could this be the reason?
If oss-fuzz did not find the bug 2 years or so ago, and now libFuzzer finds it in 13 hours, I find it likely that libFuzzer was improved in the meantime.
Do you have ideas for good projects where I could manually run quadfuzz to see if it finds anything new (besides Suricata)?
@Alan32Liu I just pushed a new experiment/benchmark to find another bug
/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-09-28-quadfuzz-bug --fuzzers quadfuzz aflplusplus libfuzzer afl honggfuzz libafl centipede --benchmarks suricata_mime_8553d567
Experiment launched. Let's see how it goes this time!
Experiment 2023-09-28-quadfuzz-bug data and results will be available later at:
The experiment data.
The experiment report.
So, no bug was found here. Did I just get lucky locally after 12 hours?
For reference, I submitted it for Suricata: https://github.com/google/oss-fuzz/pull/11034
So, oss-fuzz managed to find the bug with quadfuzz even though FuzzBench did not, cf. https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63344&q=label%3AProj-suricata&can=1&sort=-id and https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63340&q=label%3AProj-suricata&can=1&sort=-id
And oss-fuzz found more bugs that are yet to be fixed in Suricata (like https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63338&q=label%3AProj-suricata&can=1&sort=-id).
What should be done with this PR, @Alan32Liu?
Thanks @catenacyber. I wonder why this bug cannot be reproduced in FuzzBench.
This is tricky because if we cannot prove quadfuzz performs better than existing fuzzers on most benchmarks, it's very hard to justify the necessity of integrating it into OSS-Fuzz, especially given the engineering work of integrating a new fuzzer. Ensuring a fuzzer is compatible with all existing projects and maintaining it will also take a lot of effort, speaking from my experience.
Maybe because oss-fuzz runs longer?
Quadfuzz is not meant to perform better on most benchmarks. It did find bugs that were not found before on oss-fuzz, at least for Suricata. I do not think it is compatible with all projects, and I do not aim for that. I'd rather see this as an opt-in for the projects that are interested in quadratic complexity...
So, I am happy that Suricata gets bug reports from oss-fuzz with quadfuzz. If it does not look worth it for other projects, I find it a bit sad, but I am ok with it. Should I close this, or do you propose something else?
Yep, I remember you kindly explained this earlier. Unfortunately, supporting Quadfuzz as an opt-in for some projects does not reduce the engineering effort of integrating it into OSS-Fuzz, because of the current code design.
Yeah, it is a bit sad that OSS-Fuzz cannot accept Quadfuzz at the moment due to the current code design. But I am glad to see it can find bugs on Suricata; I am sure the project maintainers appreciate that too.
It has been very nice working with you. Feel free to leave this PR open if you'd like to request more experiments with Quadfuzz; otherwise, we can close this for now, and when the code design of OSS-Fuzz becomes more flexible for integrating or experimenting with new fuzzers, we can reconsider this : )
So, closing this.
Are there some projects where you would like me to try quadfuzz manually?
Fuzzing engine dedicated to finding quadratic complexity
@oliverchang, while waiting for an answer on NallocFuzz, here is an attempt at a fuzzing engine which will do badly in benchmarks but is aimed at finding quadratic complexity, as inspired by @kevinbackhouse, cf. https://github.com/github/cmark-gfm/blob/c32ef78bae851cb83b7ad52d0fbff880acdcd44a/fuzz/fuzz_quadratic.c