Add fuzz target `checker/fuzz/check_project_naive`

jasikpark commented 1 month ago

This PR adds a new fuzz target testing the checker repo. It's not all that helpful at the moment, since it's just detecting todo!() blocks in the code, but it can be useful in the future.

This also sets up the fuzzing_parser and fuzzing_checker CI jobs to replace the fuzzing CI job

/cc @addisoncrump

Fixes https://github.com/kaleidawave/ezno/issues/153

kaleidawave commented 1 month ago

Awesome!

Yeah doing a quick rg, there are ~200 todo! macros (some of those might be behind old commented out code though) in checking APIs. And there are 50 todo!s under the synthesis directory which should all have access to positions and checking_data so should be calling checking_data.raise_unimpement_error instead rather than crashing.

Is there a reason for using naive data, rather than structured?

Also didn't I ask about doing fuzzing for checking last year and heard it wasn't possible?

Also as the fuzzing tests don't run on windows (right?) and the current setup is exit on first error is there a way to do print failed exceptions as it would make fixing them a lot easier. In the current CI setup I have option thing SHORT_CIRCUIT with a modified command recommended by @addisoncrump . But if I remember correctly @addisoncrump a few months ago said it then doesn't run the fuzzing in parallel (and thus covers less cases)? Is there a way to run the most cases while not short-circuiting as I can't run them locally and so rely on back and forth with CI?

addisoncrump commented 1 month ago

Maybe I can shine some light on a few of these points.

Also didn't I ask about doing [mutation-based] fuzzing for checking last year and heard it wasn't possible?

It is possible, but it's not necessarily going to find bugs with complicated pre-conditions. Mutation-based fuzzing works by exploring the program; it mutates a input, and if it sees new coverage, it keeps that input, otherwise discarding. Suppose you have a bug caused by a single block; this strategy finds bugs for which a crash/incorrectness/branch is induced unconditionally (e.g. it just doesn't work) or conditionally on only the last few blocks. As more of the program's state is encoded into bugs/branches which determine whether these blocks get executed, fuzzing becomes less useful without significantly better mutations. This is because you actually want to explore lots of different program states which are less related to the coverage when your targets encode more state into each branch/action.

Concretely, parsers are very good mutation-based fuzz targets because they typically don't encode a lot of state. Typically, each block refers to one particular action (e.g. transforming some tokens to a structure) and that action doesn't have a whole lot of conditions or state attached to it (just the immediate scope). Since mutating from one input to another that is a small edit distance away (naïvely or structured) is likely to induce a bug or new coverage, we can explore the program space and the bugs therein.

By comparison, a ton of information is encoded in the state at any given moment in type checkers/compilers, and is used to determine branches and may affect whether a bug is triggered. For this reason, this fuzzer will likely not find deeper bugs in the type checker: edit distance between new coverage and bugs is not guaranteed to be small. Fuzzing compilers is possible, but generally people use generative fuzzers that produce large complex (and generally near-correct!) programs that encode a lot of state, not mutation-based fuzzers like this one. This will find bugs, but it likely won't find so-called "deep bugs".

TL;DR: if you want to find really complicated bugs, you're gonna need a far more specialized fuzzer than naïve or even structured mutation-based fuzzers.

then doesn't run the fuzzing in parallel (and thus covers less cases)? Is there a way to run the most cases while not short-circuiting as I can't run them locally and so rely on back and forth with CI?

It does run in parallel, it just stops all parallel processes at the same time. If you don't want to short-circuit, you can set a max_runs or max_total_time flag to stop it at a given time and put fork, ignore_crashes, ignore_timeouts, ignore_ooms; cargo-fuzz will dump all the crashes after, but they won't be deduplicated by cause (because this is sadly infeasible) and you would need to re-run them to see their stacktraces.

jasikpark commented 1 month ago

And there are 50 todo!s under the synthesis directory which should all have access to positions and checking_data so should be calling checking_data.raise_unimpement_error instead rather than crashing.

Does this mean that it is in fact finding an issue then? In that I would be able to catch those as ignorable errors for the time being, but right now they panic when they shouldn't?

kaleidawave commented 1 month ago

Yep the unimplement method raises a standard warning (which is better than the program crashing). I don't think I added it in all the places though. Should fix that!

Thanks @addisoncrump! Will act on them later

kaleidawave commented 3 weeks ago

Hey, sorry have been busy so haven't had a chance to look at it.

Will merge but only after #157 has been merged which clears up obvious todo!s

Also looking at CI times, is the fuzzing build using the cache? It might be just because it is a release build and more dependencies or something but it is taking 90 seconds to build whereas other tasks take 30-45 seconds? Maybe because it is not in the workspace or something (because we don't want it to always be built in development). Not sure?

jasikpark commented 3 weeks ago

For fuzzing, the fuzz crates are their own workspace, since there's some weird linking and LLVM flags set. So I don't think it caches?

kaleidawave / ezno

Add fuzz target `checker/fuzz/check_project_naive` #152