rust-fuzz / libfuzzer

Rust bindings and utilities for LLVM’s libFuzzer
Apache License 2.0
208 stars 44 forks source link

Check min size needed with `Arbitrary::size_hint` and early exit if the data isn't long enough #59

Closed fitzgen closed 4 years ago

fitzgen commented 4 years ago

@bnjbvr was reporting that starting fuzzing from scratch with a fuzz target that takes an Arbtirary impl was spending a lot of time on three bytes long inputs, where the Arbitrary implementation required more input bytes than given. The fuzzer wasn't giving more input bytes fast enough to get to the meaty bits of the fuzz target.

In theory, we could use Arbitrary::size_hint to create a bunch of random seeds for the corpus of the appropriate size and/or control the maximum length of inputs that libfuzzer will generate.

I'm not sure exactly what this would look like, but it probably requires another env var dance between cargo fuzz and libfuzzer-sys like how the debug printing works.

As far as how this is exposed to users, maybe we should add cargo fuzz seed <target> as a new subcommand?

Or maybe we can just document that users can do something like head -c 100 /dev/random > fuzz/corpus/my-target/my-random-seed to add a random 100 bytes seed to their fuzz target's corpus to get things off the ground...

Manishearth commented 4 years ago

it feels like libfuzzer should have an option for this? if not perhaps we should ask for one?

fitzgen commented 4 years ago

What are you imagining that libfuzzer would do? It doesn't know whether it provided enough bytes for Arbitrary or not. I'm not sure what you mean here.

rohanpadhye commented 4 years ago

Hello! I'm new to cargo-fuzz but I've encountered similar issues when implementing the Zest algorithm in JQF (a structured-input fuzzer for Java). The solution for this problem was to extend the input on-demand when the Quickcheck generators (analogous to Arbitrary) request more bytes, and then cap this extension to some max limit. The relevant source code is here: https://github.com/rohanpadhye/jqf/blob/aeff24a79e409f6ab2c1b8a5c987dbafb091461d/fuzz/src/main/java/edu/berkeley/cs/jqf/fuzz/ei/ZestGuidance.java#L1068-L1099

I don't know much about libfuzzer internals, but it would be good if a test driver could actually modify and extend the byte array that libfuzzer provides during test execution, so that cargo-fuzz (or libfuzzer-sys) could just populate any extra bytes with freshly generated random values on demand. That would be such a useful feature for libfuzzer to have in many different scenarios other than powering Arbitrary...

Manishearth commented 4 years ago

@fitzgen ./fuzz --min-len=foo, and a way to pass that down. This would require executing the fuzz target with another env var to get the size hint.

Manishearth commented 4 years ago

@rohanpadhye Worth proposing to libFuzzer. I suspect that given the general fuzzer focus on fuzzer-driven mutation this may not actually work that well, but worth a shot either way.

fitzgen commented 4 years ago

Thanks for the insights @rohanpadhye!

The solution for this problem was to extend the input on-demand when the Quickcheck generators (analogous to Arbitrary) request more bytes, and then cap this extension to some max limit. The relevant source code is here: https://github.com/rohanpadhye/jqf/blob/aeff24a79e409f6ab2c1b8a5c987dbafb091461d/fuzz/src/main/java/edu/berkeley/cs/jqf/fuzz/ei/ZestGuidance.java#L1068-L1099

@Manishearth, this reminds me of how during the redesign of Unstructured, we debated whether to error early when running out of bytes or to artificially extend it with zero bytes. We thought that exiting early would avoid confusing the fuzzer, and keep it in control of exploring the input space. Maybe this was the wrong choice?

We could fairly easily start returning zeros after we've exhausted the input (and have a max limit on how many zeros we return just in case something gets in a loop).

But if we were to actually pursue this, first I would want to take a real world fuzz target (or multiple targets) and test how much code coverage we get after X period of time with and without the zero extending, when starting from an empty corpus. Unfortunately, I don't have the cycles to run the experiment...

Anyways, I'll file an issue for a -min_len flag for libfuzzer.

fitzgen commented 4 years ago

Oh another idea is we could do something like this:

if data.len() < Arbitrary::<FuzzTargetType>::size_hint().0 {
    return;
}

...

in the libfuzzer_sys::fuzz_target! macro. That would at least cut down on the reported code covered inside Arbitrary impls and also be a bit faster.

fitzgen commented 4 years ago

In fact, I think that would be the best way to handle this.