dylanjwolff opened this issue 2 years ago
The key idea is essentially this: instead of saying that fuzzer A is the top fuzzer in general, we could say that fuzzer A is the top fuzzer under these circumstances while fuzzer B is the top fuzzer under those other circumstances. For any given benchmark run, a user could then use a slider on those benchmark properties to see how the fuzzer ranking changes.
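As a rough, entirely hypothetical sketch of what that post-processing could look like over the final report data: filter trials by a property range, then re-rank. The `fuzzer`/`benchmark`/`edges_covered` columns follow the usual report data, but `initial_coverage` is a placeholder for whatever per-trial property corpus sampling would record.

```python
# Sketch only: re-rank fuzzers after restricting trials to a "slider" range
# over one benchmark/corpus property. Assumes one row per trial holding its
# final coverage; real report data may need a snapshot filter first.
import pandas as pd


def rank_fuzzers(report_csv, prop='initial_coverage', lo=0.0, hi=1.0):
    data = pd.read_csv(report_csv)
    # Keep only trials whose property value falls inside the selected range.
    data = data[(data[prop] >= lo) & (data[prop] <= hi)]
    # Median final coverage per (benchmark, fuzzer), ranked within benchmark.
    medians = data.groupby(['benchmark', 'fuzzer'])['edges_covered'].median()
    ranks = medians.groupby('benchmark').rank(ascending=False)
    # Lower average rank across benchmarks = better under these circumstances.
    return ranks.groupby('fuzzer').mean().sort_values()
```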
Sorry for the delay, I've had a bit of a crazy schedule with my holidays. I personally think the second might be more interesting and seems like less of a maintenance burden (the analysis just gets done at the end, right?), but I'm interested in seeing both.
The UI/UX question is tricky; I don't have any answers yet, let me think about it more. I'm happy to see your samples as well.
[Properties] Which properties would you consider to be interesting? We currently have:
- seed-corpus: initial coverage, number of seeds, average seed exec time, average seed size (see the sketch after this list)
- program: size (and others)

Anything else that you would like to look at?
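For the seed-corpus side, most of those are cheap to compute from the sampled corpus itself; a rough sketch (average exec time and initial coverage need a replay of the corpus against the instrumented target, so they are only noted here):

```python
# Sketch: compute per-trial seed-corpus properties from the sampled corpus.
import os


def corpus_properties(corpus_dir):
    sizes = [os.path.getsize(os.path.join(corpus_dir, name))
             for name in os.listdir(corpus_dir)
             if os.path.isfile(os.path.join(corpus_dir, name))]
    return {
        'num_seeds': len(sizes),
        'avg_seed_size': (sum(sizes) / len(sizes)) if sizes else 0,
        # 'avg_seed_exec_time' and 'initial_coverage' come from replaying
        # the corpus against the coverage binary, omitted here.
    }
```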
Would it make sense to compare different performances after:
- tuning the hyper-parameters assumed by the fuzzers (e.g., maximum input length), or
- changing the default heuristic used by the fuzzers (e.g., libFuzzer can try to generate small inputs first)?
Also, for fuzzers that can take an input keywords dictionary, maybe we could sample the items in the dictionary in the same way as sampling the initial corpus?
Absolutely! However, this might be more difficult to implement. You'll need to expose some API that the fuzzer developer can use to specify what to vary during benchmarking.
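To make that concrete: one purely hypothetical shape for such an API would be an optional declaration in each integration's fuzzer.py that the runner samples from per trial. Nothing like `VARIABLE_PARAMS` or `sample_params` exists today; this is just a sketch of the idea.

```python
# Purely hypothetical: a fuzzer integration declares which knobs may vary per
# trial, and the runner picks one assignment per trial. Not part of the
# current fuzzer.py API.
import random

VARIABLE_PARAMS = {
    'max_len': [1024, 4096, 1 << 20],   # e.g. libFuzzer's -max_len
    'use_value_profile': [0, 1],
}


def sample_params(params=VARIABLE_PARAMS, rng=random):
    """Pick one concrete value per declared parameter for a single trial."""
    return {name: rng.choice(values) for name, values in params.items()}
```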
> I personally think the second might be more interesting and seems like less of a maintenance burden (the analysis just gets done at the end, right?), but I'm interested in seeing both.
Yup, the analysis portion is just some post-processing that can be run on something similar to the final report data CSV file. But without corpus sampling, you could only look at the effects of program properties, as the corpus would be constant across trials.
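For reference, the corpus-sampling side is conceptually small; a minimal sketch, assuming a per-benchmark seed pool directory (paths, names, and the sampling fraction are illustrative, and the real change also has to plumb the sampled corpus and its measured properties into the trial setup and report data):

```python
# Sketch: give each trial its own seed corpus drawn from a larger pool.
import os
import random
import shutil


def sample_corpus(seed_pool_dir, trial_corpus_dir, fraction=0.5, seed=None):
    rng = random.Random(seed)  # seedable so a trial's corpus is reproducible
    pool = [name for name in os.listdir(seed_pool_dir)
            if os.path.isfile(os.path.join(seed_pool_dir, name))]
    chosen = rng.sample(pool, max(1, int(len(pool) * fraction)))
    os.makedirs(trial_corpus_dir, exist_ok=True)
    for name in chosen:
        shutil.copy(os.path.join(seed_pool_dir, name),
                    os.path.join(trial_corpus_dir, name))
    return chosen
```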
Adding on to @mboehme's and @Alan32Liu's comments about fuzzing parameters: it's a very interesting idea, but I agree the implementation (and maintenance) effort needed to get many different fuzzers to present a similar interface for various parameters is probably quite high. Dictionaries would be more doable, as that is at least already a consistent "interface" across fuzzers.
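Since dictionaries are plain text files in a broadly shared AFL/libFuzzer format, per-trial dictionary sampling could mirror the corpus sampling; a sketch (function name and parameters are hypothetical):

```python
# Sketch: sample a subset of dictionary entries per trial. Assumes the common
# AFL/libFuzzer dictionary format (one entry per line, '#' for comments).
import random


def sample_dictionary(dict_path, out_path, fraction=0.5, seed=None):
    rng = random.Random(seed)
    with open(dict_path) as infile:
        entries = [line for line in infile
                   if line.strip() and not line.lstrip().startswith('#')]
    chosen = rng.sample(entries, max(1, int(len(entries) * fraction)))
    with open(out_path, 'w') as outfile:
        outfile.writelines(chosen)
```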
Please feel free to let me know if there is anything that I could help with : )
To: @jonathanmetzman @lszekeres; CC: @mboehme @inferno-chromium
We have two related features, implemented on a private fork, that we'd like to integrate into FuzzBench. The first is the ability to sample from a larger pool of seeds to provide a unique corpus to each fuzzer per trial during a benchmarking run. The second consists of additional data analysis to give some insight into how various aspects of the initial corpora and programs under test might be affecting benchmarking outcomes.
The purpose of this issue is to establish the following:
Thanks!