google / fuzzbench

FuzzBench - Fuzzer benchmarking as a service.
https://google.github.io/fuzzbench/
Apache License 2.0
1.11k stars 270 forks

Adds quadfuzz engine #1854

Closed catenacyber closed 1 year ago

catenacyber commented 1 year ago

Fuzzing engine dedicated to finding quadratic complexity

@oliverchang, while waiting for an answer on NallocFuzz, here is an attempt at a fuzzing engine. It will do badly in benchmarks, but it is aimed at finding quadratic complexity, as inspired by @kevinbackhouse, cf. https://github.com/github/cmark-gfm/blob/c32ef78bae851cb83b7ad52d0fbff880acdcd44a/fuzz/fuzz_quadratic.c
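The linked cmark-gfm harness catches superlinear behavior by timing the target on an input and on a doubled copy of it. A minimal sketch of that timing idea (the targets below are deliberately simple stand-ins, not the real parser, and the 3x threshold is an arbitrary choice):

```python
import time

def _best_time(target, data, runs=3):
    """Best-of-N wall-clock time for one invocation, to damp scheduler noise."""
    best = float("inf")
    for _ in range(runs):
        t0 = time.perf_counter()
        target(data)
        best = min(best, time.perf_counter() - t0)
    return best

def looks_superlinear(target, base_input, factor=3.0):
    """Run target on an input and on its doubled copy.

    A linear target should take roughly 2x as long on the doubled input;
    a ratio well above that (quadratic would be ~4x) is flagged.
    """
    t1 = _best_time(target, base_input)
    t2 = _best_time(target, base_input + base_input)
    return t2 > factor * max(t1, 1e-6)

def quadratic_target(data):
    """Deliberately O(n^2): inserting at the front of a list is O(n)."""
    out = []
    for b in data:
        out.insert(0, b)

def linear_target(data):
    bytes(data)  # a single O(n) pass
```

A real quadfuzz-style engine would feed corpus-derived inputs through such a check instead of toy targets, and would need to account for timing noise far more carefully than this sketch does.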

DonggeLiu commented 1 year ago

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-06-15-quadfuzz --fuzzers quadfuzz

DonggeLiu commented 1 year ago

The experiment has been launched successfully and its data & report should be available shortly: The experiment data. The experiment report.

catenacyber commented 1 year ago

Thanks @Alan32Liu. What should I make of this report? Were new timeouts/bugs found?

DonggeLiu commented 1 year ago

Thanks @Alan32Liu. What should I make of this report? Were new timeouts/bugs found?

This is a code coverage report. Once the experiment finishes, past code coverage results of the baseline fuzzers (libfuzzer, etc.) will be merged into the report for comparison with quadfuzz.

Bug discovery requires a separate experiment; I can launch that now.

DonggeLiu commented 1 year ago

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-06-15-quadfuzz-bug --fuzzers quadfuzz --benchmarks bloaty_fuzz_target_52948c harfbuzz_hb-shape-fuzzer_17863b libxml2_xml_e85b9b mbedtls_fuzz_dtlsclient_7c6b0e php_php-fuzz-parser_0dbedb

catenacyber commented 1 year ago

Should I do anything here @Alan32Liu ?

DonggeLiu commented 1 year ago

For bug-based experiment (2023-06-15-quadfuzz-bug): The experiment data. The experiment report.

DonggeLiu commented 1 year ago

Should I do anything here @Alan32Liu ?

FuzzBench generates reports to help researchers compare the performance of their fuzzer against others. It also provides the experiment data to support in-depth analysis of where a fuzzer underperformed, so that researchers can improve it accordingly, or to substantiate where it performed better.

In this case, the two experiments here evaluate your fuzzer quadfuzz against some common fuzzers (e.g., afl++ used in OSS-Fuzz, libafl submitted by researchers) on some benchmark programs from OSS-Fuzz.

Some quick observations:

  1. We can see that both reports show that quadfuzz did not outperform afl++ (currently used in OSS-Fuzz production): the coverage report shows afl++ covers more lines, and the bug report shows it finds more bugs.
  2. I also checked unique branch coverage on some benchmarks: quadfuzz did not cover many branches that afl++ missed, whereas afl++ did cover many branches that the others did not execute.

If you are interested, feel free to do more analysis on the performance of quadfuzz with the experiment data and see if you can make it perform better than afl++ or find some interesting/important branches that all others missed.
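The "unique branch coverage" comparison above boils down to set differences over the covered-branch sets of each fuzzer. A toy sketch, assuming coverage has already been extracted into per-fuzzer sets of branch IDs (the data shown is made up; the real analysis runs on FuzzBench's experiment data):

```python
def unique_branches(coverage):
    """Map each fuzzer to the branches only it covered.

    coverage: dict mapping fuzzer name -> set of covered branch IDs.
    """
    unique = {}
    for fuzzer, branches in coverage.items():
        others = set()
        for other, other_branches in coverage.items():
            if other != fuzzer:
                others |= other_branches
        unique[fuzzer] = branches - others
    return unique

# Hypothetical per-fuzzer branch sets for illustration only.
coverage = {
    "aflplusplus": {"b1", "b2", "b3", "b5"},
    "quadfuzz":    {"b1", "b2", "b4"},
    "libfuzzer":   {"b1", "b3"},
}
print(unique_branches(coverage))
# aflplusplus uniquely covers b5, quadfuzz uniquely covers b4
```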

DonggeLiu commented 1 year ago

Also feel free to ping me if you'd like to request more experiments after improving quadfuzz : )

If you think the fuzzer will perform better than existing fuzzers on certain types of benchmarks, we can also integrate those benchmarks in this PR and run experiments to prove that. Benchmark integration is pretty straightforward: the ones in benchmark/ are basically from OSS-Fuzz projects/.
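For context, a FuzzBench benchmark is a directory under the benchmarks tree containing a Dockerfile, a build script, and a benchmark.yaml. A sketch of what the config might look like for the suricata target discussed later in this thread; the field names are from memory of the repo layout (the commit hash matches the one requested below), so check an existing benchmark directory for the authoritative format:

```yaml
# benchmarks/suricata_mime_c56fa0a8/benchmark.yaml (illustrative)
project: suricata
fuzz_target: fuzz_mimedecparseline
commit: c56fa0a80564174df8afe172ff722ef48754b405
type: bug
```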

catenacyber commented 1 year ago

We can see that both reports show that quadfuzz did not outperform the afl++

Getting good coverage is not the goal of quadfuzz.

Its goal is to use the corpus found by other fuzzing engines such as afl++ to find quadratic complexity with big inputs that afl++ et al. would not find (because they focus on speed and small inputs, which makes perfect sense).

For example, it found the bug/slowness fixed here: https://github.com/OISF/suricata/commit/d40dca5e55286c57e9a83018975022c4f08bf6d1

So, it is a niche fuzzer, not a general-purpose one... Does this make sense?

DonggeLiu commented 1 year ago

We can see that both reports show that quadfuzz did not outperform the afl++

Getting good coverage is not the goal of quadfuzz.

Its goal is to use the corpus found by other fuzzing engines such as afl++ to find quadratic complexity with big inputs that afl++ et al. would not find (because they focus on speed and small inputs, which makes perfect sense).

For example, it found the bug/slowness fixed here: OISF/suricata@d40dca5

So, it is a niche fuzzer, not a general-purpose one... Does this make sense?

I see. Is there any chance that we can prove the benefit of quadfuzz on more benchmarks? For example, collect the corpus of afl++ on some projects, then start another experiment using the corpus as seeds for quadfuzz, afl++, and other fuzzers. We can compare the coverage/bug-finding results of these fuzzers?

catenacyber commented 1 year ago

Is there any chance that we can prove the benefit of quadfuzz on more benchmarks? For example, collect the corpus of afl++ on some projects, then start another experiment using the corpus as seeds for quadfuzz, afl++, and other fuzzers. We can compare the coverage/bug-finding results of these fuzzers?

Could we add a benchmark with the suricata fuzz target fuzz_mimedecparseline at suricata commit c56fa0a80564174df8afe172ff722ef48754b405 (just before f80c999db320aa60570b4e04846bd7beeed96cd6)? There could also be fuzz_applayerparserparse_dcerpc at commit 668501c225d09ce1c5316c0061ff6a7e1980c64c, just before d40dca5e55286c57e9a83018975022c4f08bf6d1.

By the way, could the suricata fuzz target fuzz_predefpcap_aware be part of FuzzBench? (And could we know whether more bugs are found?) It has a huge corpus/coverage, which is problematic since we end up OOM because the corpus/coverage occupies more than 2 GB.

DonggeLiu commented 1 year ago

Is there any chance that we can prove the benefit of quadfuzz on more benchmarks? For example, collect the corpus of afl++ on some projects, then start another experiment using the corpus as seeds for quadfuzz, afl++, and other fuzzers. We can compare the coverage/bug-finding results of these fuzzers?

Could we add a benchmark with the suricata fuzz target fuzz_mimedecparseline at suricata commit c56fa0a80564174df8afe172ff722ef48754b405 (just before f80c999db320aa60570b4e04846bd7beeed96cd6)? There could also be fuzz_applayerparserparse_dcerpc at commit 668501c225d09ce1c5316c0061ff6a7e1980c64c, just before d40dca5e55286c57e9a83018975022c4f08bf6d1.

We can certainly add them and run another experiment for quadfuzz in this PR. Unfortunately, I have too many things on my plate at this moment and might not be able to do that now. I can add them when I have free time, but please feel free to add them when you see fit.

By the way, could the suricata fuzz target fuzz_predefpcap_aware be part of FuzzBench? (And could we know whether more bugs are found?) It has a huge corpus/coverage, which is problematic since we end up OOM because the corpus/coverage occupies more than 2 GB.

Similarly, we don't have to add suricata to FuzzBench to run experiments. We can add it to this PR (or another branch of this PR) to test it out first.

catenacyber commented 1 year ago

Did I get how to add a benchmark right?

DonggeLiu commented 1 year ago

Did I get how to add a benchmark right?

Ah yes, sorry, I only just saw your commit. I think you did it correctly; let's run an experiment with it to find out.

DonggeLiu commented 1 year ago

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-07-04-quadfuzz-bug --fuzzers quadfuzz aflplusplus libfuzzer afl honggfuzz libafl centipede --benchmarks suricata_mime_c56fa0a8

DonggeLiu commented 1 year ago

Since this is a new benchmark, I also added some top-performing fuzzers to compare against quadfuzz. Experiment data and results will be available at: The experiment data. The experiment report.

catenacyber commented 1 year ago

Oh, no fuzzer found the bug...

I guess I need to dig more

DonggeLiu commented 1 year ago

Oh, no fuzzer found the bug...

I guess I need to dig more

Feel free to request more experiments if that can assist in digging. FuzzBench is designed to help fuzzer evaluations before using them in production : )

catenacyber commented 1 year ago

@Alan32Liu there was a typo in the commit hash for the experiment

I pushed a fixup. Could you run the experiment again?

Thanks

DonggeLiu commented 1 year ago

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-07-13-quadfuzz-bug-1 --fuzzers quadfuzz aflplusplus libfuzzer afl honggfuzz libafl centipede --benchmarks suricata_mime_c56fa0a8

DonggeLiu commented 1 year ago

Experiment data and results will be available at: The experiment data. The experiment report.

catenacyber commented 1 year ago

I managed to get the bug once locally with quadfuzz by:

DonggeLiu commented 1 year ago

let libFuzzer explore around it and expand the corpus

If we need to run libFuzzer to expand the corpus, would it make sense to have the expanded corpus as seed, so that all fuzzers share the same starting point?

catenacyber commented 1 year ago

If we need to run libFuzzer to expand the corpus, would it make sense to have the expanded corpus as seed, so that all fuzzers share the same starting point?

That makes sense.

But I realized my corpus input was wrong because of \n not being interpreted...

Could you run the experiment with this?

DonggeLiu commented 1 year ago

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-07-20-quadfuzz-bug --fuzzers quadfuzz aflplusplus libfuzzer afl honggfuzz libafl centipede --benchmarks suricata_mime_c56fa0a8

DonggeLiu commented 1 year ago

Experiment data and results will be available at: The experiment data. The experiment report.

catenacyber commented 1 year ago

Looks like the build failed. Tried to fix it...

catenacyber commented 1 year ago

Hey @Alan32Liu, coming back to this: could we run a benchmark again with the build that should now be fixed?

DonggeLiu commented 1 year ago

Hey @Alan32Liu, coming back to this: could we run a benchmark again with the build that should now be fixed?

Sorry, did you mean re-running an experiment? Sure! Would the previous setting work? /gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-07-20-quadfuzz-bug --fuzzers quadfuzz aflplusplus libfuzzer afl honggfuzz libafl centipede --benchmarks suricata_mime_c56fa0a8

catenacyber commented 1 year ago

Sorry, did you mean re-running an experiment?

Yes :-)

Would the previous setting work?

It should

DonggeLiu commented 1 year ago

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-09-08-quadfuzz-bug --fuzzers quadfuzz aflplusplus libfuzzer afl honggfuzz libafl centipede --benchmarks suricata_mime_c56fa0a8

DonggeLiu commented 1 year ago

Experiment 2023-09-08-quadfuzz-bug data and results will be available later at: The experiment data. The experiment report.

catenacyber commented 1 year ago

Thanks, it looks like afl now finds the bug (it did not find it on OSS-Fuzz) but counts it 3 times... quadfuzz also finds the bug, and libFuzzer does not.

Do I understand this correctly?

DonggeLiu commented 1 year ago

Thanks, it looks like afl now finds the bug (it did not find it on OSS-Fuzz) but counts it 3 times... quadfuzz also finds the bug, and libFuzzer does not.

Do I understand this correctly?

Sorry that I completely missed this message. You are correct that AFL found the bug (counting it 3 times is likely due to our grouping algorithm categorizing the different bug-triggering stack traces into 3 groups). QuadFuzz indeed found the bug, likely with very similar stack traces.

LibFuzzer instances also found the bug after 13 hours, though.
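The "counted 3 times" effect comes from deduplicating crashes by stack trace: the same root cause can surface with several distinct traces, each landing in its own group. A toy sketch of trace-based grouping (keying on the top frames is an assumption for illustration, and the frame names below are invented; FuzzBench's real deduplication logic is more involved):

```python
def group_key(stack_trace, top_n=3):
    """Key a crash by its top N stack frames."""
    frames = [line.strip() for line in stack_trace.splitlines() if line.strip()]
    return tuple(frames[:top_n])

def group_crashes(traces, top_n=3):
    """Bucket crash traces by key; one root cause whose top frames vary
    ends up in multiple buckets, i.e. 'counted several times'."""
    groups = {}
    for trace in traces:
        groups.setdefault(group_key(trace, top_n), []).append(trace)
    return groups

# Two traces share top frames; the third reaches the same code via an
# extra frame, so it lands in a separate group.
traces = [
    "MimeDecParseLine\nProcessMimeEntity\nSMTPProcessRequest",
    "MimeDecParseLine\nProcessMimeEntity\nSMTPProcessRequest",
    "FindMimeHeaderToken\nMimeDecParseLine\nProcessMimeEntity",
]
print(len(group_crashes(traces)))  # 2 groups for what may be one bug
```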

catenacyber commented 1 year ago

LibFuzzer instances also found the bug after 13 hours, though.

Oh... How come it did not find the bug in 18 months on OSS-Fuzz? Did it get improved in the meantime?

DonggeLiu commented 1 year ago

Oh... How come it did not find the bug in 18 months on OSS-Fuzz? Did it get improved in the meantime?

Not sure if it has improved, but it might be due to changes we made in this PR (e.g., the seed corpus?)

catenacyber commented 1 year ago

Oh... How come it did not find the bug in 18 months on OSS-Fuzz? Did it get improved in the meantime?

Not sure if it has improved, but it might be due to changes we made in this PR (e.g., the seed corpus?)

The seed corpus added was part of the public corpus. Historically, the bug was found when a bounds check was added where there had been none previously, and this bounds check was not correct in every case...

DonggeLiu commented 1 year ago

The seed corpus added was part of the public corpus. Historically, the bug was found when a bounds check was added where there had been none previously, and this bounds check was not correct in every case...

Interesting. I just checked that libFuzzer was updated recently. Could this be the reason?

catenacyber commented 1 year ago

If OSS-Fuzz did not find the bug two years or so ago, and now libFuzzer finds it in 13 hours, I find it likely that libFuzzer was improved in the meantime.

Do you have ideas for good projects where I could manually run quadfuzz to see if it finds anything new? (besides Suricata)

catenacyber commented 1 year ago

@Alan32Liu I just pushed a new experiment/benchmark to find another bug

DonggeLiu commented 1 year ago

/gcbrun run_experiment.py -a --experiment-config /opt/fuzzbench/service/experiment-config.yaml --experiment-name 2023-09-28-quadfuzz-bug --fuzzers quadfuzz aflplusplus libfuzzer afl honggfuzz libafl centipede --benchmarks suricata_mime_8553d567

DonggeLiu commented 1 year ago

@Alan32Liu I just pushed a new experiment/benchmark to find another bug

Experiment launched. Let's see how it goes this time! Experiment 2023-09-28-quadfuzz-bug data and results will be available later at: The experiment data. The experiment report.

catenacyber commented 1 year ago

So, no bug is found here. Did I just get lucky locally after 12 hours?

catenacyber commented 1 year ago

For reference, I submitted it for Suricata: https://github.com/google/oss-fuzz/pull/11034

catenacyber commented 1 year ago

So, OSS-Fuzz managed to find the bug with quadfuzz even though FuzzBench did not, cf. https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63344&q=label%3AProj-suricata&can=1&sort=-id and https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63340&q=label%3AProj-suricata&can=1&sort=-id

And OSS-Fuzz found more bugs that are yet to be fixed in Suricata (like https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63338&q=label%3AProj-suricata&can=1&sort=-id).

What should be done with this PR, @Alan32Liu?

DonggeLiu commented 1 year ago

Thanks @catenacyber. I wonder why this bug cannot be reproduced in FuzzBench.

This is tricky because if we cannot prove quadfuzz performs better than existing fuzzers on most benchmarks, it's very hard to justify the necessity of integrating it into OSS-Fuzz, especially given the engineering work of integrating a new fuzzer. Ensuring a fuzzer is compatible with all existing projects and maintaining it will also take a lot of effort, speaking from my experience.

catenacyber commented 1 year ago

Thanks @catenacyber. I wonder why this bug cannot be reproduced in FuzzBench.

Maybe because OSS-Fuzz runs longer?

This is tricky because if we cannot prove quadfuzz performs better than existing fuzzers on most benchmarks, it's very hard to justify the necessity of integrating it into OSS-Fuzz, especially given the engineering work of integrating a new fuzzer. Ensuring a fuzzer is compatible with all existing projects and maintaining it will also take a lot of effort, speaking from my experience.

Quadfuzz is not meant to perform better on most benchmarks. It did find bugs that were not found before on OSS-Fuzz, at least for Suricata. I do not think it is compatible with all projects, and I do not aim for that. I rather see this as an opt-in for projects interested in quadratic complexity...

So, I am happy that Suricata gets bug reports from OSS-Fuzz with quadfuzz. If it does not look worthwhile for other projects, I find it a bit sad, but I am OK with it. Should I close this, or do you propose something else?

DonggeLiu commented 1 year ago

Quadfuzz is not meant to perform better on most benchmarks. It did find bugs that were not found before on OSS-Fuzz, at least for Suricata. I do not think it is compatible with all projects, and I do not aim for that. I rather see this as an opt-in for projects interested in quadratic complexity...

Yep, I remember you kindly explained this earlier. Unfortunately, supporting Quadfuzz as an opt-in for some projects does not reduce the engineering effort of integrating it into OSS-Fuzz, because of the current code design.

So, I am happy that Suricata gets bug reports from OSS-Fuzz with quadfuzz. If it does not look worthwhile for other projects, I find it a bit sad, but I am OK with it. Should I close this, or do you propose something else?

Yeah, it is a bit sad that OSS-Fuzz cannot accept Quadfuzz at the moment due to the current code design. But I am glad to see it can find bugs in Suricata; I am sure the project maintainers appreciate that too.

It has been very nice working with you. Feel free to leave this PR open if you'd like to request more experiments with Quadfuzz; Otherwise, we can close this for now, and when the code design of OSS-Fuzz becomes more flexible for integrating or experimenting with new fuzzers, we can reconsider this : )

catenacyber commented 1 year ago

So, closing this.

Are there some projects where you would like me to try quadfuzz manually?