enhance the fuzzing algorithm to be competitive with other mainstream fuzzers

andrewrk commented 4 months ago

Extracted from https://github.com/ziglang/zig/pull/20773.

In the initial implementation of fuzzing, I threw together something rough and quick that was able to find a string used with mem.eql. However, this is far from being competitive.

It doesn't take much to be competitive. AFL is only 10K lines of code, and it's all open source. Some guy just sat down and threw some paint at the wall to see what sticks, and you know what, anyone can do that. So let's also do it. We'll probably come up with some novel ideas, as well as plenty of silly ideas. We can keep the best, toss the rest, and then steal whatever good ideas are left over from all the other open source fuzzers out there. The more source code I read from AFL and libFuzzer, the more confident I am that we can beat these projects on every metric simultaneously.

This issue is open-ended. However, in order to close it, we should be able to run Zig's fuzzer side by side with other mainstream fuzzers on many of the same software test cases, and provide a comparison of their efficiency with regard to finding bugs and exploring the state space.

Probably, solving #352 will greatly aid this issue because it will provide insight into how well the state space is being explored, as well as just being really satisfying to watch.

Despite being an area of research, this is actually quite a contributor-friendly issue: it is well-scoped, and I have already hooked up all the components, so you can start trying things out just by editing fuzzer.zig and rerunning zig build --fuzz (perhaps also with --debug-rt).

This is marked for the 0.14.0 milestone because I want to use this feature to fuzz test incremental compilation, which is the main goal of this release cycle.

20803

ProkopRandacek commented 4 months ago

I am interested in working on this.

Note that mainstream fuzzers (I checked AFL, AFL++, and Angora) use custom LLVM passes instead of just the coverage pass.

> Probably, solving https://github.com/ziglang/zig/issues/352 will greatly aid this issue

The problem with coverage is that it has a slightly different goal. For example:

// ...
if (a) {
    // ...
}
// ...

The coverage information usually does not record that the if branch was not taken, since from the line coverage point of view there are no lines to mark as red when the condition is false. Some fuzzers insert dummy basic blocks to solve this issue and keep using the coverage information.
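To make the difference concrete, here is a toy Python model of the snippet above. It is only a sketch, not real instrumentation: the block names entry, then, and after are made up for illustration, and each run is recorded as a walk over those three basic blocks.

```python
def run(a):
    """Trace `if (a) { ... }` as a walk over three hypothetical basic blocks."""
    path = ["entry"]
    if a:
        path.append("then")   # body of the if
    path.append("after")      # code following the if
    blocks = set(path)                  # block (line-like) coverage
    edges = set(zip(path, path[1:]))    # edge (branch) coverage
    return blocks, edges


blocks_true, edges_true = run(True)
blocks_false, edges_false = run(False)

# What does the a == False run add on top of the a == True run?
new_blocks = blocks_false - blocks_true
new_edges = edges_false - edges_true

print("new blocks:", sorted(new_blocks))  # new blocks: []
print("new edges:", sorted(new_edges))    # new edges: [('entry', 'after')]
```

With block coverage, the input that skips the if looks like it discovered nothing. With edge coverage, the entry -> after transition shows up as new, so a fuzzer would have a reason to keep that input.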

IMO the better option is to do what Angora does and provide custom instrumentation that stores triples (source BB, target BB, call stack context). (See the paper for more details.)
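A rough sketch of what such a triple-keyed coverage map could look like, again as a toy Python model. The class name, the numeric block IDs, and the call-site IDs are all hypothetical. Summarizing the call stack by XOR-ing call-site IDs is an assumption made here to keep the example short, not a claim about how any particular fuzzer must do it.

```python
class ContextCoverage:
    """Toy coverage map keyed by (source BB, target BB, call stack context)."""

    def __init__(self):
        self.seen = set()
        self.stack = []  # hypothetical call-site IDs of the active calls

    def context(self):
        # Assumption: fold the call stack into one integer by XOR.
        ctx = 0
        for site in self.stack:
            ctx ^= site
        return ctx

    def call(self, site_id):
        self.stack.append(site_id)

    def ret(self):
        self.stack.pop()

    def edge(self, src, dst):
        """Record an edge; return True if the triple was not seen before."""
        key = (src, dst, self.context())
        is_new = key not in self.seen
        self.seen.add(key)
        return is_new


cov = ContextCoverage()

# The same edge 4 -> 5 inside a helper, reached from two different call sites.
cov.call(0x1111); first = cov.edge(4, 5); cov.ret()
cov.call(0x2222); second = cov.edge(4, 5); cov.ret()
cov.call(0x1111); third = cov.edge(4, 5); cov.ret()

print(first, second, third)  # True True False
```

The edge counts as new once per calling context, so the same code reached through a different call path is still treated as progress. One known weakness of a plain XOR is that a call site appearing twice on the stack (recursion) cancels itself out.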

andrewrk commented 4 months ago

trace-pc does this already. No custom instrumentation needed.


enhance the fuzzing algorithm to be competitive with other mainstream fuzzers #20804
