Research Project: `derive` Fuzz Testing

jswrenn commented 9 months ago

Co-authored with @joshlf.

Overview

Create a library for fuzz-testing proc macro derives.

Background

Zerocopy is a crate that provides safe abstractions over transmutation (i.e., reinterpreting the bits of a type as if they belong to another type). Zerocopy provides four core traits, each of which can only be derived for a type with a procedural macro (e.g., #[derive(FromZeroes)]):

FromZeroes indicates that a sequence of zero bytes represents a valid instance of a type
FromBytes indicates that a type may safely be converted from an arbitrary byte sequence
AsBytes indicates that a type may safely be converted to a byte sequence
Unaligned indicates that a type’s alignment requirement is 1

When a user derives one or more of these traits for their types, zerocopy must prove that the properties associated with the traits actually hold. It does so in two stages. First, zerocopy analyzes the syntax tree of the type definition. If any required elements are missing (e.g., the type is not annotated with an appropriate #[repr(...)]), zerocopy produces an error that halts compilation.

Otherwise, zerocopy proceeds to emit both the requisite trait implementation and a type-level proof of soundness. For instance, for a type to be soundly FromBytes, each of its fields must also be FromBytes. Zerocopy emits code that, at type-checking time, asserts that each field implements FromBytes.
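To make the idea of a type-checking-time assertion concrete, here is a minimal sketch of one shape such emitted code could take. It is not zerocopy's actual output: the `FromBytes` trait below is a local stand-in, and `Header`, `assert_is_from_bytes`, and `assert_fields_are_from_bytes` are names invented for illustration.

```rust
// Local stand-in for zerocopy's `FromBytes` (an assumption for this sketch).
// It is `unsafe` because implementing it is a promise the compiler cannot
// verify by itself.
unsafe trait FromBytes {}

unsafe impl FromBytes for u8 {}
unsafe impl FromBytes for u16 {}
unsafe impl<T: FromBytes, const N: usize> FromBytes for [T; N] {}

// A user-written type that requests the derive.
#[repr(C)]
#[allow(dead_code)]
struct Header {
    tag: u8,
    len: u16,
    payload: [u8; 4],
}

// Compiles only if `T` implements `FromBytes`; it does nothing at runtime.
#[allow(dead_code)]
fn assert_is_from_bytes<T: FromBytes>() {}

// Hypothetical derive output, part 1: the trait implementation.
unsafe impl FromBytes for Header {}

// Hypothetical derive output, part 2: a function that is never called but
// must still type-check. If any field type were not `FromBytes` (say, `bool`),
// the corresponding line would be a compile error.
const _: () = {
    #[allow(dead_code)]
    fn assert_fields_are_from_bytes() {
        assert_is_from_bytes::<u8>();
        assert_is_from_bytes::<u16>();
        assert_is_from_bytes::<[u8; 4]>();
    }
};
```

Changing a field to a type without a `FromBytes` impl makes the whole crate fail to compile, which is the behavior the UI tests described below check for.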

We currently use a small number of UI tests (using the trybuild crate) to assure ourselves that these analyses are correct. For each test, we craft a stand-alone Rust file that contains an unsound derive for a hand-written type definition. Our testing harness compiles each of these files, and confirms that the expected compilation error is produced.

For code that is known to compile, we also use miri, a Rust interpreter, to run the code and detect undefined behavior.

Motivation

Zerocopy's current UI testing approach offers a high degree of control (e.g., we are able to track minute changes in error messages), but only with a large amount of labor. It is sufficiently difficult to create and maintain these tests that zerocopy does not have many of them.

Also, as with any codebase, zerocopy's UI tests only cover the error conditions that it occurred to us to test. As a result, some error conditions go untested, and bugs that more thorough testing could have caught slip through, such as the one fixed in https://github.com/google/zerocopy/pull/672.

To remedy this, we would like to augment our small set of fine-grained, hand-written UI tests with a large, dynamically-generated set of coarse-grained UI tests.

Design

We would like to write fuzz tests using the cargo-fuzz testing framework. A zerocopy fuzz test will randomly generate a Rust datatype, derive zerocopy traits for that datatype, and then use miri to run methods from those traits. The test passes if this process either produces a compile error or runs successfully under miri. It fails if miri detects unsoundness.

A sample cargo-fuzz fuzzing harness might look like this:

#![no_main]
#[macro_use] extern crate libfuzzer_sys;
extern crate arbitrary_typedef;

use arbitrary_typedef::AdtDef;

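fuzz_target!(|adt_def: AdtDef| {
    // `{adt_def}` and `{name}` are interpolated by `format!`; literal braces in
    // the generated program are escaped as `{{` and `}}`.
    // `AdtDef::name` is a hypothetical accessor for the generated type's name;
    // without a concrete type, `new_zeroed` could not be resolved.
    assert!(compile_error_or_miri_success(format!(r#"
        use zerocopy::FromZeroes;

        #[derive(FromZeroes)]
        {adt_def}

        fn main() {{
            let value = <{name} as FromZeroes>::new_zeroed();
        }}
    "#, name = adt_def.name())));
});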

For this, we need to define:

arbitrary_typedef::AdtDef, a type that abstractly describes a data type definition and implements Arbitrary, allowing instances of AdtDef to be generated automatically.
compile_error_or_miri_success, a function that compiles its argument and returns true if compilation fails; otherwise, it runs the compiled program under miri and returns true if miri reports no errors, and false if it does.
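The pass/fail rule for the second item can be sketched as a small decision function. This is only an illustration of the logic: `Outcome`, `classify`, and `compile_error_or_miri_success_with` are hypothetical names, and the two closures are placeholders for actually invoking the compiler and miri, which this sketch does not attempt.

```rust
/// Outcome of compiling, and possibly running, one generated program.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    /// The compiler rejected the program (e.g., the derive emitted an error).
    CompileError,
    /// The program compiled and miri ran it without reporting a problem.
    MiriSuccess,
    /// The program compiled, but miri reported a problem.
    MiriFailure,
}

/// Classifies one program. `compiles` and `miri_passes` are stand-ins for
/// the real toolchain invocations. Miri is consulted only if compilation
/// succeeded.
fn classify(
    source: &str,
    compiles: impl Fn(&str) -> bool,
    miri_passes: impl Fn(&str) -> bool,
) -> Outcome {
    if !compiles(source) {
        Outcome::CompileError
    } else if miri_passes(source) {
        Outcome::MiriSuccess
    } else {
        Outcome::MiriFailure
    }
}

/// Returns `true` unless the program compiled and then failed under miri.
fn compile_error_or_miri_success_with(
    source: &str,
    compiles: impl Fn(&str) -> bool,
    miri_passes: impl Fn(&str) -> bool,
) -> bool {
    classify(source, compiles, miri_passes) != Outcome::MiriFailure
}
```

Keeping the three outcomes distinct, rather than collapsing them into a boolean immediately, would also let a harness report how many generated programs were rejected at compile time versus actually exercised under miri.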

The first item is the primary research challenge: How do we randomly generate interesting (compositions of) Rust datatypes?
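As a starting point for discussion, here is a deliberately tiny sketch of what an `AdtDef` could look like. Everything in it is an assumption: the field layout, the `REPRS` and `FIELD_TYPES` tables, the fixed type name `Generated`, and the `from_bytes` constructor, which stands in for an `Arbitrary` implementation so that the example needs only the standard library. It handles only structs with named fields drawn from a fixed list of types.

```rust
use std::fmt;

/// Hypothetical, minimal model of a struct definition.
#[derive(Debug, Clone, PartialEq)]
struct AdtDef {
    repr: Option<&'static str>,
    name: String,
    fields: Vec<&'static str>,
}

// Candidate `#[repr(...)]` arguments; `None` means no repr attribute.
const REPRS: [Option<&str>; 4] = [None, Some("C"), Some("transparent"), Some("packed")];
// Candidate field types, including some (`bool`, `char`) that are not
// valid for every bit pattern.
const FIELD_TYPES: [&str; 5] = ["u8", "u32", "bool", "[u8; 4]", "char"];

impl AdtDef {
    /// Stand-in for `Arbitrary`: the first byte picks the repr, and each of
    /// up to eight following bytes picks one field type.
    fn from_bytes(bytes: &[u8]) -> AdtDef {
        let mut bytes = bytes.iter().copied();
        let repr = REPRS[bytes.next().unwrap_or(0) as usize % REPRS.len()];
        let fields = bytes
            .take(8)
            .map(|b| FIELD_TYPES[b as usize % FIELD_TYPES.len()])
            .collect();
        AdtDef { repr, name: "Generated".to_string(), fields }
    }
}

/// Renders the definition as Rust source, so it can be spliced into a
/// generated program with `format!`.
impl fmt::Display for AdtDef {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if let Some(repr) = self.repr {
            writeln!(f, "#[repr({repr})]")?;
        }
        writeln!(f, "struct {} {{", self.name)?;
        for (i, ty) in self.fields.iter().enumerate() {
            writeln!(f, "    f{i}: {ty},")?;
        }
        write!(f, "}}")
    }
}
```

For example, `AdtDef::from_bytes(&[1, 0, 1])` renders as a `#[repr(C)]` struct with fields `f0: u8` and `f1: u32`. Some outputs of this generator are not valid Rust (for instance, `repr(transparent)` with several non-zero-sized fields), which is acceptable under the pass criterion above because a compile error counts as a pass. A realistic generator would also need enums, unions, generics, and nesting, which is where the research question lies.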

Related Work

PLT Redex's generate-term randomly generates a programming language 'term' of a given size.

jswrenn commented 9 months ago

Assigning this to @maemre!

maemre commented 9 months ago

It took me a while to collect different approaches we might take. I'm sorry about the delay.

I think using Arbitrary (anything QuickCheck-like) is a good start. I have a few questions to define the scope:

Are there any context-dependent features that FromZeroes needs in order to work? To elaborate, are there any constraints that must be maintained among different parts of the ADT that a context-free grammar wouldn't capture (e.g., lifetime bounds shared by two types)?
Are we looking to exercise particular types more than others? If so, we can use something like a weighted tree.
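To illustrate the weighting idea, here is a minimal sketch of the choice made at a single node of such a tree; a weighted tree would apply the same choice recursively at each level of the generated definition. The function name `weighted_pick` and the example weights are assumptions for illustration, and `roll` stands for any source of randomness, such as bytes supplied by the fuzzer.

```rust
/// Picks an index into `weights`, favoring entries with larger weights.
/// Returns `None` if every weight is zero or the slice is empty.
/// Assumes the weights are small enough that their sum fits in a `u32`.
fn weighted_pick(weights: &[u32], roll: u32) -> Option<usize> {
    let total: u32 = weights.iter().sum();
    if total == 0 {
        return None;
    }
    // Reduce the roll into `0..total`, then walk the weights until the
    // bucket containing it is found.
    let mut remaining = roll % total;
    for (index, &weight) in weights.iter().enumerate() {
        if remaining < weight {
            return Some(index);
        }
        remaining -= weight;
    }
    None // Not reached when `total > 0`.
}
```

With hypothetical weights `[6, 3, 1]` for structs, enums, and unions, rolls 0 through 5 select index 0, rolls 6 through 8 select index 1, and roll 9 selects index 2, so structs would be generated roughly six times as often as unions.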

Beyond Arbitrary, there are also these pieces of relevant work:

MiMIs implement an efficient way to generate structured data that has complex invariants. The authors use it to generate ASTs too. I can look into porting it to Rust if need be.
Another relevant tool is AFLSmart, an extension of AFL that always produces mutants that are valid according to a grammar.

These test generation tools would be useful for generating code with additional constraints. For example, we can try generating data structure definitions that should always compile, so that we can also catch bugs where the derive macro produces code that doesn't compile.

joshlf commented 9 months ago

I updated the issue text to mention this, but I'll put it here too for more visibility: fuzzing could have prevented the bug that is fixed in https://github.com/google/zerocopy/pull/672.

joshlf commented 4 months ago

Credit to @glpesk for this idea

We could seed the fuzzer using types scraped from existing codebases, such as those which are public on GitHub.
