rust-fuzz / cargo-fuzz

Command line helpers for fuzzing
https://rust-fuzz.github.io/book/cargo-fuzz.html
Apache License 2.0
1.49k stars 108 forks source link

Figure out a solution for the stdlib #5

Open Manishearth opened 7 years ago

Manishearth commented 7 years ago

Libstd isn't built with sanitizer support. Perhaps we should release a version that is.

Manishearth commented 7 years ago

https://github.com/japaric/rust-san

nagisa commented 7 years ago

It is likely that instrumenting libstd itself is counterproductive. Instrumenting libstd and all dependent libraries will increase the workload of sanitizer greatly and may take longer than necessary to yield useful results.

Instead we should instrument the target crate only by default and provide options to increase fuzzing scope (instrumenting dependencies, instrumenting libstd, etc.)

Manishearth commented 7 years ago

idk, lots of panics originate from the stdlib (indexing, etc). While it's true that the branches from these usually get inlined (even though the panic code doesn't), which should be enough for fuzzing, it would be nice to have an instrumented stdlib.

rust-san has instructions for setting up an instrumented stdlib with xargo, we could probably just provide links to that and perhaps add a cargo fuzz command for setting that up. You're right that this doesn't have to be the default.

nagisa commented 7 years ago

Some clarification to my comment above:

libstd does not need to be instrumented for panics to be understood by the fuzzer. With that out of the way…

By only instrumenting the crate of interest, what is essentially done is search for bugs within the crate of interest alone (± some inlined code). By instrumenting the dependencies (incl. libstd) as well, fuzzer now effectively searches for bugs within the dependencies at the same time.

This may sound desirable, but it is not. The dependencies can be fuzzed on their own. Fuzzing the crates separately both reduces the scope of fuzzing greatly (i.e. fuzzing is faster to find bugs, and if no bugs are found, there’s more confidence about crate being correct for the same investment of time), but also verifies the correctness of the dependency crate independently, which is strictly better than checking correctness of dependency within context of your own crate.

So it seems to me that fuzzing crates separately and in isolation is ideal. I recognise that not everybody wants to spend effort to fuzz their dependencies and that fuzzing everything at once is less effort – that’s where the options to increase the scope come in.

Manishearth commented 7 years ago

libstd does not need to be instrumented for panics to be understood by the fuzzer. With that out of the way…

Well, yeah, but the branches in libstd will be opaque to the fuzzer, no?

Interesting point about fuzzing dependencies. We're currently using RUSTFLAGS but an option to use cargo rustc would be nice.

emk commented 7 years ago

Well, yeah, but the branches in libstd will be opaque to the fuzzer, no?

I suspect this is why I can't usefully fuzz the vobsub crate, which does lots of tricky, low-level parsing. The very first thing I do with the input bytes is search for an MPEG-2 Program Stream header:

            // Search for the start of a ProgramStream packet.
            let needle = &[0x00, 0x00, 0x01, 0xba];
            let start = self.remaining.windows(needle.len())
                .position(|window| needle == window);

When I fuzz this, cargo fuzz appears to be endlessly fuzzing single-byte inputs:

INFO: Seed: 577350870
INFO: Loaded 0 modules (0 guards): 
Loading corpus dir: corpus
INFO: -max_len is not provided, using 64
INFO: A corpus is not provided, starting from an empty corpus
#0  READ units: 1
#1  INITED cov: 63 corp: 1/1b exec/s: 0 rss: 70Mb
#1048576    pulse  cov: 63 corp: 1/1b exec/s: 524288 rss: 136Mb
#2097152    pulse  cov: 63 corp: 1/1b exec/s: 419430 rss: 194Mb
#4194304    pulse  cov: 63 corp: 1/1b exec/s: 419430 rss: 311Mb
#8388608    pulse  cov: 63 corp: 1/1b exec/s: 399457 rss: 544Mb
#16777216   pulse  cov: 63 corp: 1/1b exec/s: 409200 rss: 746Mb
#33554432   pulse  cov: 63 corp: 1/1b exec/s: 419430 rss: 747Mb
#67108864   pulse  cov: 63 corp: 1/1b exec/s: 422068 rss: 747Mb

I'm going to try reading the "san" docs to see if I can rebuild std, and see if that helps any.

emk commented 7 years ago

OK, there's some useful information in the Reddit thread announcing cargo fuzz. This links to Better backtraces, which explains how to rebuild rust-src using xargo. But I'm not quite sure how to make this work under cargo fuzz without modifying cargo fuzz itself. I'll try poking at this later if I have time.

I'd really like to fuzz vobsub; it's parsing all sorts of ancient, ugly MPEG-2 data, and I would love to provide it as part of a web service someday.

nagisa commented 7 years ago

@emk consider providing at least one starter file to the fuzzer (put into fuzz/corpus).


Well, yeah, but the branches in libstd will be opaque to the fuzzer, no?

That’s only for functions monomorphised within libstd. They are few and far between.

emk commented 7 years ago

@nagisa Thank you for your advice! It turned out I had two problems:

  1. The MPEG2 packet formats are complex enough that I really did need to start with a valid input to get good results.
  2. My driver was accidentally ignoring certain important code paths.

Now I've found 5 issues and I'm looking for more (I'll submit a PR for the trophy case shortly). These kinds of low-level binary decoders are excellent hunting grounds for bugs, even in safe mode.

I've provided documentation on how to fuzz-test vobsub here, in case anybody wants to experiment with it. Thank you for your help and for a great tool!

whitequark commented 6 years ago

Triage: documentation issue, excerpt from above:

rust-san has instructions for setting up an instrumented stdlib with xargo, we could probably just provide links to that and perhaps add a cargo fuzz command for setting that up. You're right that this doesn't have to be the default.

elichai commented 4 years ago

I think this is kinda solved in nightly? https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std

Manishearth commented 4 years ago

Yeah, you can also use xargo on stable. I'd accept PRs adding support for a build-std mode

Shnatsel commented 4 years ago

FWIW this is strictly required for Memory Sanitizer to work. Right now enabling msan in cargo-fuzz makes any binary immediately abort with a MSAN error. This is documented in more detail here: https://doc.rust-lang.org/nightly/unstable-book/compiler-flags/sanitizer.html#memorysanitizer

A recipe for MSAN build that also enables MSAN for C dependencies that are linked into the binary can be found here: https://github.com/rust-lang/rust/issues/39610#issuecomment-573391197

Shnatsel commented 4 years ago

This is now as simple as cargo fuzz run target_name -Z build-std

Also, #233 automatically enables this option if Memory Sanitizer is enabled.

saethlin commented 2 years ago

Should fuzz targets always use -Zbuild-std? The standard library has a lot of debug assertions and I'm sure there are Rust-level UB that the sanitizers won't understand but which could be caught by having debug_assertions turned on in the standard library.