planetis-m commented 2 years ago

Introduction

This proposals seeks to describe the architecture of a structure aware fuzzing library for Nim. Two options are considered, one powered by libFuzzer. It's implementation borrows heavily from google/libprotobuf-mutator but without using protobuf. Instead it uses a simpler serialization format that translates directly to Nim types. The other weaker one, similar to rust-fuzz/arbirtary is not recommended. It's performance was evaluated during tests and it was found to be subpar.

By different accounts grammar-based fuzzers outperform mutation based ones. This is an attempt to fuse them, a way to do user-provided smart mutations based on context, along with unguided, more general-purpose library provided ones.

Serialization format

Serialize object to bytes, do not use variant length formats for integers. Object fields are expected in order. Refs and Options use a bool to signal if it's empty or not. For dynamically-sized types, seqs, strings, Tables, represent the length as an int32, followed by the stored elements.

A way to inspect the input as human readable format might be needed. Does $ not suffice? Maybe for debuging invalid input, that the reader rejects? In proto fuzzers in chromium source, native input is also dumped.

Is a human readable format needed? Use any of the available JSON libraries.

Seed corpus

Just a question of serializing to file valid user provided input. Problem: the post-processor callbacks (if implemented) modify that input. So should toBytes run them? https://www.youtube.com/watch?v=ID8XtoMn43I (claims that empty seed leads to discovering more bugs). Changing the fuzzer invalidates the corpus (same as LPM when removing fields).

In the unexplored field of automatically generating seed inputs from usage, an idea worth exploring is using macro transformations that gather structured traces and work similar to the existing coverage macros available (coverage, hcoverage).

Benchmark fuzzer

Diverse tests cases, a control (would that be LPM?) and statistical measurements required. Artificial tests or real libraries (inject bugs?)? What to measure, bug-count/coverage? Time to run? Take ideas from: https://www.youtube.com/watch?v=uevfxaJHvDA

Coverage includes the code paths from this lib, any pragma to prevent that?

Testing inputs

Deserialize the object from the input bytes. If the next steps are done correctly, it should just work. Else return. Then feed it to the API being tested.

Mutating data

The API requires a random engine seeded by libFuzzer's seed parameter, with that we can make changes to the input that are reproducible. Like when removing, copying or inserting elements to a seq or switching branches in a variant object.

Deserialize input and copy to output. From the consumed and maximum available bytes, get the size hint. Using random, choose a sequence of possible mutations and perform them.

In case of de-serialization failure, it may need to use a default value, possibly user-provided. - Instead always reject first runs empty input, rest should be the one provided by customMutator.

Run a single post-processor function afterwards of type: CustomPostProcessor*[T] = proc (n: var T; seed: int64) Do we need multiple callbacks (hard) or working on nested objects (doable)? First one would require a register table. Other just a recursive helper proc that calls postProcess procs for every object field. Write an example that uses state.

LPM also uses a mutations cache to increase performance, since state cannot be preserved between runs, this is saved where? (static) Let's do without the cache for a start. Since there are so many invalid test inputs should we just settle with a invalidSet?

If it fails to serialize returns empty.

Primitive types

Call libFuzzer's mutate directly, this way we are provided with "interesting" values (values that may expand the code coverage, extracted from libFuzzer's instrumentation), instead of flipping random bits ourselves.

String, (seq/array of bytes)

Call mutate as above. Valid UTF8 strings can also be provided by porting LPM's fixUtf8.

Enums

Enums work the same, it does require though a macro to handle enums with holes correctly. Prototype: https://gist.github.com/planetis-m/274cae6865813fe9510ddbdd5aafa582

Seqs, Tables, other dynamic data-structures.

Not different from what described in the section and also in "User-defined overloads". Prototype: https://gist.github.com/planetis-m/e54014bca29dd58d887e4c069bac8594

Variant objects

Prototype: https://gist.github.com/planetis-m/a157bbfcea532770ec35bfcfeddcebc0 Switch should be within mutate, needs a macro. Copy/merge also needs a macro, due to fields iterator limitations.

References

Limit recursion depth, another API addition (maxDepth)?

Distincts

Use distinct to provide custom mutators for base types that have interesting values, like file signatures (as an alternative to just letting mutate provide them) or to limit the search space (but then you might still want to allow evil values).

When not using intermediate fake types, the user has to resort to this pattern: when defined(fuzzer): field: distinct type else: field type in object declarations. This will not work if you're not in control of the fuzzed code.

User-defined overloads

Users need to specialize the generic proc mutate[T](x: var T; sizeIncreaseHint: Natural; r: var Rand). The implementation consists of a state machine that performs random mutations in a row. Would either call mutate on the items or provide special values.

The issue we are trying to prevent here, is producing mutations that don't fit in the output buffer (thus called invalid). The more invalid mutations, the longer the fuzzers runs.

To teach the fuzzer how to traverse containers, two overloads must be provided.

proc combine[T; U: not MyContainer](x: var MyContainer[T], y: U, depth: int, res: var int) =
  for v in x.mitems:
    when compiles(combine(v, y, depth, res)): combine(v, y, depth, res)

proc combine[T; U](x: var T, y: MyContainer[U], depth: int, res: var int) =
  for v in y.items:
    when compiles(combine(x, v, depth, res)): combine(x, v, depth, res)

For potentially dynamic types, you must use a recursion guard (if depth <= MaxDepth, an intdefine) and increament the depth variable. Some containers do not allow modification of their values (sets). In that case a plain copy is performed.

proc combine[T](x: var MyContainer[T], y: MyContainer[T], depth: int, res: var int) =
  select: x = y

Alternatively, implement a merge algorithm? Harder, need sizeIncreaseHint parameter and teach user about byteSize

Crossover data

Another function that needs to be provided to libFuzzer. Given two inputs, create a mutation that borrows from both with equal probability. Should use the same code as the mutation implementation above.

Need merge overload proc merge[T](dest: var T; src: T) for every type that adds src items to dest for collections In optional types src replaces dest if it's set. In any other case it always replaces dest.

Sequence of actions

Meant to test stateful API's. Examples include https://gitlab.com/wilzegers/autotest/ and the ones discussed in https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md#fuzzing-stateful-apis Needs a way to serialize procs, with their arguments. Possibly similar to variant objects.

Alternative design

Only concerns the step of testing inputs (testOneInput). Seed the random generator with the length of the input bytes. Return early if the byte input may not suffice to fill all input types.

We have no knowledge of how many bytes we might end up using in advance! Try to guess the size of dynamically-sized types with sizeHint overloads. Or extract a length from the input bytes bound to the size, which may consumes all available bytes, so use two different APIs.

Use the random engine when needed, like which branch to use in a variant object.

There are some downsides:

Users are encouraged to provide themselves sensible lower or upper limits for dynamic types. Needs overriding sizeHints for specified fields.
Will not expand fuzzing coverage as quickly, fuzzer blindly inserts interesting values in random spots (so they are wasted). Also it almost never return the exact byte size we end up using.
No post-processor step possible (really?). No way to insert dependencies between data, need user effort to guard against large allocations.

Others have similar (negative) views: https://github.com/loiclec/fuzzcheck-rs/blob/main/articles/why_not_bytes.md

On the upside, it is simpler to implement, and will work with more fuzzers other than libFuzzer. Prototype: https://gist.github.com/planetis-m/4c79f55e3040af9fb7b509809e43f342

Other options considered

gofuzz Smart fuzzer for Go, similar to the section above.

TODO

Skim the code for interesting ideas in:

AFLSmart Uses Peach pit files, works with adding, removing, splicing 'chunks'. Specific to binary formats. Assigns a percentage for invalid inputs.

Generational

FormatFuzzer fuzzes binary files, works with binary templates.
Peach Fuzzer Weird XML scheme, specialized for Fuzzers
More in ClusterFuzz (Where? Were they moved?)

planetis-m commented 2 years ago

Remaining: Need to do more tests with variant objects. WIP: https://gist.github.com/planetis-m/a157bbfcea532770ec35bfcfeddcebc0 will finish tomorrow ~~Should it use a Stream or the (ptr UncheckedArray[byte], len) type? I am in favor of the second, because the other one would uncover bugs in Stream.~~

planetis-m commented 2 years ago

Usage would look like this:

fuzzTarget(x, MyObject):
  callMyApi(x)

It's a template that produces function definitions for LLVMFuzzerTestOneInput, LLVMFuzzerCustomMutator and LLVMFuzzerCustomCrossOver, which use MyObject types. Accepts only one x: untyped parameter, second is the type.

In second time, this would provide more convenience, with multiple arguments unpacked from a tuple:

proc test(s: string, a, b, c: int) {.fuzzTarget.} =
  callMyApi(x)

zah commented 2 years ago

Structure aware fuzzing is close in spirit to property based testing, so a library like nim-quicktest can serve as an inspiration. In particular, the library offers an interesting DSL for defining additional constraints over the input parameters, which you can potentially reuse.

Perhaps this project can be delivered as a fork of nim-quicktest, aiming to add libFuzzer as a custom execution engine.

So, let's start with a simple macro taking a do block. As a first step, I'd like to see the end-to-end integration with libFuzzer and a simple nims script that is able to start the fuzzing process (similar to the one we have right now).

The initial version can have a very rudimentary version of the serialization library and the mutator (supporting only extremely basic types and aiming to demonstrate only the interfaces between the various components). Once this is in place, we can start expanding the capabilities of the input DSL, the mutator and the list of supported types in the serialization library in parallel.

planetis-m commented 2 years ago

quicktest looks nice but the constraints are bit over the top. It reminds me of pattern matching but on proc parameters! For one it would be nearly impossible to declare object variants which are the most important types I am trying to support! They would be paramount to testing stateful APIs as they can be used to represent them.

But declarative constraints even on type sections are limiting. The worst that can happen is to end up with too many pragmas. With a post processing step one just writes procedural code.

That also makes me favor directly working with Nim types instead of an intermediate structure (like JsonNode or whatever capn'proto or protobuf libraries use). Obviously the user needs to provide a converter to native objects before consuming the test input. (I realize that's a bad example since you would need that either way to support SQL statements.) But it limits you in the sense that protobuf is stateless. You would need to introduce fake abstractions.

I have been eyeing a paradigm similar to jsony which uses not one, but multiple hooks that allow the user to control the mutated type's state. Another alternative is giving the user a bunch of API calls like mutate[T] and helper templates and let them compose their own custom mutator, granting them full control of the process. I need to experiment with more real test cases in order to choose. I hope the first one is enough.

An obvious advantage of this approach is to provide tests on different 'levels' of the library. Think of the compress example but working on structured data.

As an appeal to authority watch this https://youtu.be/U60hC16HEDY?t=720 In particular where he explains that for fuzzing clang, a hand written mutator is the way to go. Making a custom mutator is a lot of work and with this library the main aim is to make it easier! We need this over other approaches in order to make better use of libFuzzer's output. The prototypes I posted so far prove that point since we reach the abort() in less than 100 thousand mutations. With poor utilization it can go on for some billions and without any instrumentation it runs forever. (Here I am painting rust-fuzz specifically which does a poor job - it read a bool from byte and a type for seqs/strings and length from the last bytes when consume all, which doesn't seem to work that well.)

Also consider that reinventing libprotobuf-mutator just in a different format doesn't add any value! It's just a lot of effort!

planetis-m commented 2 years ago

Outdated info, closing.

status-im / nim-testutils

Structure-aware fuzzing proposal #35

Introduction

Serialization format

Seed corpus

Benchmark fuzzer

Testing inputs

Mutating data

Primitive types

String, (seq/array of bytes)

Enums

Seqs, Tables, other dynamic data-structures.

Variant objects

References

Distincts

User-defined overloads

Crossover data

Sequence of actions

Alternative design

Other options considered

TODO

Generational