rust-fuzz / arbitrary

Generating structured data from arbitrary, unstructured input.
https://docs.rs/arbitrary/
Apache License 2.0
733 stars 75 forks source link

Add a `check` submodule that provides a mini property-based testing framework #191

Open fitzgen opened 3 months ago

fitzgen commented 3 months ago

This module should be gated behind a check cargo feature.

It should provide a really simple property-based testing framework for combining Arbitrary-based generators with your oracles/property functions that can be used as smoke tests in local development and on CI, where it is often not always feasible to run the full cargo fuzz targets for a while due to various constraints (build time, length of time required to run, etc...). It would not be intended to replace long-running fuzzers leveraging coverage-guided feedback in the background, it would be a compliment to them for the local quick-test and CI uses previously described.

This is something I've wanted for a while, because I'm tired of doing it crappily by hand, but have obviously never gotten around to.

But, at some point, @matklad made the arbtest crate which is exactly what I'd been imagining, it's fantastic. Thank you very much for creating this, @matklad!

So my purposes for opening this issue are to gauge two things:

  1. Other arbitrary maintainers: how do you feel about having such a feature/module in this crate?
  2. @matklad: how do you feel about merging arbtest into this crate, as arbitrary::check?

The primary advantages I see of moving arbtest into arbitrary::check, instead of maintaining the current state with separate crates are:

So, what do y'all think?

matklad commented 3 months ago

Yes please! I’d be delighted to not have to maintain arbtest, especially given that porting it to Zig for TigerBeetle is on my todo list!

I guess my only concern here is that there’s ecosystem pressure to add macros as an entry point to property tests, and to also pass T: Arbitrary rather than just unstructured directly (eg, I think cargo-fuzz works that way).

The macro-less API of arbtest that takes a raw unstructured is very intentional. This is required if you do distributed-systems-like testing where you don’t generate input up-front, but rather, in a loop, react to the current state of the system.

So, I would strongly prefer if this peculiarity of the API is preserved (but even if it isn`t no big deal, I can write some code of mine to that effect).

matklad commented 3 months ago

One other thing here: the shrinking in arbtest is criminally dumb (the input stream of bytes is just discarded, there’s no attempt to tweak it locally), so it might be good to spend some effort investigating whether it can be made 10x more efficient with little effort.

https://github.com/jlink/shrinking-challenge is one set of benchmarks there (hat tip to @stevana for that!)

fitzgen commented 3 months ago

I’d be delighted to not have to maintain arbtest

You'd be very welcome to continue helping out with arbitrary::check when you have cycles, if we go down this route :)

The macro-less API of arbtest that takes a raw unstructured is very intentional.

I think maintaining a macro-less API is very doable.

We should be able to do something like

pub fn check_raw(property: impl Fn(&mut Unstructured) -> Result<()>) {
    // ...
}

pub fn check<A>(property: impl Fn(A) -> Result<()>)
where
    A: Arbitrary,
{
    check_raw(|u| property(u.arbitrary_take_rest()?));
}

Although, at the risk of opening the bike shedding, I think we want to use a Check builder where you can do things like configure the length of time to run checks or a minimum number of iterations, and then with Check::{run,run_raw} methods to actually run the check once it is done being configured.

the shrinking in arbtest is criminally dumb ... so it might be good to spend some effort investigating whether it can be made 10x more efficient with little effort

Would be neat if someone could investigate this, but I personally don't have the cycles for it.