rust-lang / rustc-dev-guide

A guide to how rustc works and how to contribute to it.
https://rustc-dev-guide.rust-lang.org
Apache License 2.0
1.65k stars 507 forks source link

Guide for fuzzer-users #1645

Closed workingjubilee closed 1 year ago

workingjubilee commented 1 year ago

There's quite a lot of documentation on how to build, run, and debug the compiler. That's good! However, there's a notable subsection of compiler contributors that needs special direction:

People who bring in a fuzzer and make rustc crab rave until the next ICE Age.

And as Rust's profile increases, more of them show up, as more and more people use the Rust compiler in their research projects, and fuzzing happens to have a lot of interesting computer science research attached to it. And they open issues. Lots of issues. Our own @matthiaskrgr has opened over 500 issues. And over 400 of them have been fixed!

Almost-inevitably the first few issues include at least one "total whiff": it's code that produces an ICE, sure. However, it was generated with something like a specific compilation flag combined with internally unstable features used to implement the core library, to produce some ICE that we probably will not bother to fix, despite the local enthusiasm for dropping ICE on GitHub. Usually after a bit of feedback they dial in on issues that are a bit "higher value", but having to have that back-and-forth with each person feels like it's going to involve a lot of repetition. If they're mostly-automating bug-finding, it seems reasonable to automate (or at least cache in documentation) some of our responses. This documentation thus may cover more "how to file an issue that is useful" but still.

In particular, for such documentation, we need to capture the differences in how best to go about. "fuzzing the compiler" and "fuzzing the nightly compiler" and "fuzzing the internal features of the compiler". We don't want people to never fuzz things that require internal features, because something like "the MIR validator breaks on trying to validate some unusual MIR" is really useful to find out, even though that's an internal feature. But if the code requires misusing magic attributes that change layout or use an internal intrinsic outside its parameters, we probably will shrug that off. Likewise there's "fuzzing a nightly feature" vs. "fuzzing a nightly feature that had its prototype skeleton impl with 100 todo!()s landed yesternight".

Pinging Matthias because perhaps he may have some thoughts on what he would say to his clone and/or to himself after traveling back in time.

matthiaskrgr commented 1 year ago

Well, there are different approaches to this.

What we have (no guarantee of completeness or correctness :see_no_evil: )

There is https://github.com/dwrensha/fuzz-rustc by @dwrensha which builds rustc with instrumentation and libfuzzer and then performs coverage based fuzzing (this is great for finding bugs in the frontend where the initial parsing of code happens.

@jruderman built on top of that and added a rustc specific mutator https://github.com/jruderman/fuzz-rustc that is smarter than just coverage based fuzzing and I also think his version is able to avoid "noise"-ices a bit better by passing in custom compiler flags.

There is https://github.com/matthiaskrgr/icemaker/ which I can talk most about because I wrote it myself. This is by itself not a fuzzer, but rather a big parallel

for tool  in tools {
 for flags in rustcflags {
   for file in files {
       exec(tool, flags, file).filter_ice()
     }
   }
}

executor. The downside is that I need some kind of initial code to work with. The upside is that I do not rely on instrumented code, so I can also run clippy, miri, rustdoc, rustfmt, kani, or cg_clif etc and filter out the crashes and interesting outputs, UB found by miri, applyable diagnostics that actually break code etc... About 70% of the crashes I found are probably from code inside the rustc repo (I have checked out every modification of every file contained in the rustc repo side by side which totals to around 15 gb of code iirc) which is a great dataset because most code that was once used as a test probably still shouldn't crash rustc. I can also do things like "let's try to build the entire rustc testsuite with -Zmir-opt-level=3 or the gat-feature before it gets stabilized" to avoid having to track down and minimize bugs when they appear in "real world" code (through crater or whatever). I usually try to run this on a daily basis on a up to date rustc repo and file any new bugs that show up during this. (I had this automatized in CI so I could just check the log-diff each morning but something broke there and my new laptop is faster than github actions ci right now anyway...)

This week I found out about https://github.com/langston-barrett/tree-splicer by @langston-barrett which as far as I understand is a more advanced code generator and it reads and parses rustc files as input and can then splice together new files from it. I imagine it a bit like reading a rust file, generating a syntax tree, and then attaching and/or replacing random nodes with/to other nodes to generate new code. (Langston can probably go into more detail here :smile: ) I am currently using this to generate input code that is then checked by icemaker.

Common pitfalls When using in input data set to generate new code, you most likely don't want to use code that you know already crashes the compiler (to avoid duplicates and ice-noise). I think the instrument-fuzzer based tools used to first compile all their inputs files and only select the files that compiled without crash, or even without syntax error for further processing.

When generating code, there are a couple of patterns you can filter out from the start, such as break rust ,

#![feature(rustc_attrs)]
#[rustc_error(delay_span_bug_from_inside_query)]

#[rustc_symbol_name] #![no_std] code

and custom_mir / #[custom_mir.. / mir!(... tests.

sometimes internal rustc tests are also marked as "we know this is a bug that is tracked, but we definitely want to know if we fix it by accident and looks somewhat like this:

// build-fail
// known-bug: #95134
// compile-flags: -Copt-level=0
// failure-status: 101
// dont-check-compiler-stderr

Reporting

My personal opinion is that I would rather have 2 duplicate tickets in my bugtracker than have an unknown bug out in the wild, but of course we should try to avoid duplicates. I usually try to search for the ICE message in the bugtracker, such as const parameter types cannot be generic, tcx.resolutions(()) is not supported for local crate or self.scc_universes[scc] == ty::UniverseIndex::ROOT but sometimes Option::unwrap() on a None value or index out of bounds: the len is 4 but the index is 2147483649 are not that descriptive, then you can still try to search for the rust compiler source file that the ICE occurred in, but sometimes ICE messages get renamed, code is moved, or something changes so that the same file still crashes but in a different part of the compiler so it's a bit tricky. It is also not uncommon to see a single ICE happen in like 200 different input files and wondering "is that all the same bug?"

I have been thinking a bit if we could have some kind of "ice index" so that at least each intentional ICE, such as assert!() or unimplemented!() gets an ID (similar to the rustc error index) or other unique identifier so that we can identify identical ICEs and can simply search for that in the bugtracker. Perhaps @estebank who has been working on ICE templates has some more ideas here or knows if this is doable? :)

It is usually useful to have a minimal code sample you can try to minimize using https://github.com/Nilstrieb/cargo-minimize by @Nilstrieb or https://github.com/langston-barrett/treereduce Langston but when reducing automatically, it may also happen that the reduction degenerates valid code that triggers an ICE into invalid code that triggers an ICE because all the tool cares for is ICE: yes or no?.

Then, last but not least you can try to bisect the minimal code sample using https://github.com/rust-lang/cargo-bisect-rustc to find out which commit caused this, if this is a regression introduced two days ago or if this is a stable ice that we have been carrying around for years (probably not that critical then since nobody noticed until now).

I hope y'all don't mind the pings but since all the tools here are somewhat related and aim for similar goals I thought you also might have some experiences or ideas that you wanted to share :)

Noratrieb commented 1 year ago

Thank you for all your work! But I don't think cargo-minimize is useflu for fuzzed ICEs, as its main goal is being very coarse. If you have an ICE in a 40k loc codebase, cargo-minimize is for you. But for tiny changes I wouldn't recommend it and it won't do much.

langston-barrett commented 1 year ago

Thanks for starting this discussion! As someone who has required and benefited from such guidance from maintainers, I agree that this documentation would be very useful.

Here are some points I think such docs could include:

Perhaps there should be a different issue template for fuzzed bugs? It could add a "fuzzed" label, contain a link to this documentation, and maybe include a checklist for reporters. Alternatively, the ICE bug template could ask reporters to note in the description if the bug is fuzzed, and also link to these docs.

It might be nice if rustc could emit a different message when rust_attrs, no_core, etc. are active and an ICE is found, perhaps pointing to these docs rather than unconditionally encouraging a bug report.

compiler-errors commented 1 year ago

Clarify if the rustc maintainers want people to fuzz. I didn't have to test tree-splicer on rustc. I would have been more likely to if I read "The Rust project appreciates and welcomes fuzzed bugs that conform to these guidelines", or correspondingly less likely to if I read "The Rust project doesn't actively encourage fuzzing but allows fuzzed bug reports according to these guidelines."

I think you'd find people all across this spectrum in the project, but my feeling is that ICE reports are welcome (except for those categories pointed out above, like rustc_* attrs or no_core or whatever) as long as it's clear that there was care in reporting the bugs. Things like minimization (ideally, attempting to remove spurious other errors would be the best), formatting, cursory check for duplicates, etc.

Perhaps there should be a different issue template for fuzzed bugs? It could add a "fuzzed" label, contain a link to this documentation, and maybe include a checklist for reporters.

This would be nice. I would like fuzzed issues to start out as "P-low", especially if they use nightly features or obviously sequences of tokens that are not likely encountered in the wild.

This is a bit more niche, but maybe links to guides on building rustc with PGO/BOLT. In fuzzing, executions/sec matter a lot. You may also want -C target-cpu=native.

Alternatively, you could probably do a lot of effective fuzzing with the same emit-metadata-only rustc flags that we run something like cargo check for, since the majority of ICEs are coming from the front end of the compiler.

compiler-errors commented 1 year ago

Oh! Also advising the fuzzer to bisect with cargo-bisect-rustc would also be great. That cuts out a step that usually is critical when it comes to fuzzer ICEs -- knowing what PR to blame cuts out a bunch of debugging usually.

matthiaskrgr commented 1 year ago

Didn't we have a bot for that at some point? https://github.com/bjorn3/cargo-bisect-rustc-bot

matthiaskrgr commented 1 year ago

minimization

I think someone already tried halfempty at some point but it was quite difficult to get it to run for some reason.

I've already spent hours trying to reduce crashes in entire crates somehow and it would be lovely if that worked too, somehow.

Being able to reduce code in in cases that don't crash (like codegen regressions, clippy diagnostics bugs etc) will probably be useful all around. :)

except for those categories pointed out above, like rustc_* attrs or no_core or whatever

I wonder if we could actually intercept these ices (I think these cause usually easy-to-tell-errors?) and hide the "report me" link or change the backtrace a bit to make it more obvious that this is a "we don't care" issue.

This would be nice. I would like fuzzed issues to start out as "P-low", especially if they use nightly features or obviously sequences of tokens that are not likely encountered in the wild.

I don't think this is a good idea as it causes these issues to more likely fly under the radar.

I have already seen a couple of cases (for example the forcing query with already existing DepNode ices) where we already did have a mvce from years ago and people still kept hitting it and sunk hours into reducing entire crates to come up with basically the same snippet that had already been reported. Sometimes you may also find an a bug via fuzzing and someone else reported this as "my crate crashed rustc" years ago, but that issue was closed as "cannot reproduce" or lays stale. IMO this actually makes fuzzed findings more valuable since you don't have to dig through entire crates but already have a kinda small set of code.

Alternatively, you could probably do a lot of effective fuzzing with the same emit-metadata-only rustc flags that we run something like cargo check for, since the majority of ICEs are coming from the front end of the compiler.

I think if you -o/dev/null, rustc will only run until it tries to codegen and then abort. I use this hack quite extensively :sweat_smile:

langston-barrett commented 1 year ago

@rustbot claim

workingjubilee commented 1 year ago

It might be nice if rustc could emit a different message when rust_attrs, no_core, etc. are active and an ICE is found, perhaps pointing to these docs rather than unconditionally encouraging a bug report.

We're moving forward on an internal_features lint so we'll have room to hang more diagnostics on that.

workingjubilee commented 1 year ago

I agree that "fuzzed issues are low priority" is in a broad sense false. If we find a regression on a "yeah, that looks like something a human might actually write" input through fuzzing, and that bug was introduced recently, that's a stable-to-nightly regression that we want to catch right away and we may not want to have that flagged P-low. Even if it is objectively not a big deal, "fresh" bugs are important to nail down, and often get higher priorities, even when they would have been labeled P-low if it had slipped onto stable-to-stable.