rust-fuzz / arbitrary

Generating structured data from arbitrary, unstructured input.
https://docs.rs/arbitrary/
Apache License 2.0

workaround to uses of arbitrary_loop causing non-determinism #199

Open 0xalpharush opened 2 months ago

0xalpharush commented 2 months ago

I am implementing Arbitrary by hand and using it with cargo-fuzz. Ideally, I'd like to let the fuzzer control the length, but I wanted to experiment with different upper bounds before picking one in the harness. However, I noticed that I could not reuse the corpus, because changing the upper bound introduced non-determinism, and I'd like to be able to simply extend the existing corpus instead of starting from scratch. Is there an alternative pattern that makes the loop bound configurable without affecting how the rest of the input is interpreted?

For example, I have something in my harness very similar to https://github.com/bytecodealliance/wasm-tools/blob/main/crates/wasm-smith/src/core.rs#L1070, where I am changing u.arbitrary_loop(Some(1), Some(100), ...) to u.arbitrary_loop(Some(1), Some(1000), ...).

fitzgen commented 2 months ago

Unfortunately, this is a fairly fundamental limitation of arbitrary's approach. You could do something like always consume 100 bytes of input, create a new Unstructured from those 100 bytes, and run your arbitrary loop with that sub-Unstructured, but it is pretty clunky: it may waste bytes, or 100 bytes may not be enough, etc.
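
To make the trade-off concrete, here is a minimal std-only sketch of the fixed-budget idea. It does not use the arbitrary crate: Reader, count, naive, budgeted and LOOP_BUDGET are hypothetical names invented for illustration, and the rule that a wider range reads more bytes is an assumption about how a bounded count might be decoded, not a statement of the crate's exact behavior. With the real crate, the same shape would presumably be built from Unstructured::bytes and Unstructured::new.

```rust
/// Hypothetical stand-in for a fuzzer byte stream (NOT the `arbitrary` API).
struct Reader<'a> {
    data: &'a [u8],
}

impl<'a> Reader<'a> {
    fn new(data: &'a [u8]) -> Self {
        Reader { data }
    }

    /// Take up to `n` bytes from the front (fewer if the input is short).
    fn take(&mut self, n: usize) -> &'a [u8] {
        let n = n.min(self.data.len());
        let (head, tail) = self.data.split_at(n);
        self.data = tail;
        head
    }

    /// Read one byte, yielding 0 once the input is exhausted.
    fn byte(&mut self) -> u8 {
        self.take(1).first().copied().unwrap_or(0)
    }

    /// Pick a count in `min..=max`. Assumption: a range wider than 256
    /// values reads two bytes instead of one (span must fit in a u16).
    fn count(&mut self, min: u32, max: u32) -> u32 {
        let span = max - min;
        debug_assert!(span <= u16::MAX as u32);
        let raw = if span <= u8::MAX as u32 {
            self.byte() as u32
        } else {
            ((self.byte() as u32) << 8) | self.byte() as u32
        };
        min + raw % (span + 1)
    }
}

/// Hypothetical fixed window reserved for the loop.
const LOOP_BUDGET: usize = 8;

/// Naive layout: the loop reads straight from the main stream, so the
/// bound decides where the rest of the input starts.
fn naive(input: &[u8], max: u32) -> (Vec<u8>, u8) {
    let mut r = Reader::new(input);
    let n = r.count(1, max);
    let items: Vec<u8> = (0..n).map(|_| r.byte()).collect();
    let tail = r.byte(); // "the rest of the input"
    (items, tail)
}

/// Budgeted layout: the loop only sees a fixed-size window, so the tail
/// always starts at offset LOOP_BUDGET regardless of the bound.
fn budgeted(input: &[u8], max: u32) -> (Vec<u8>, u8) {
    let mut r = Reader::new(input);
    let mut sub = Reader::new(r.take(LOOP_BUDGET));
    let n = sub.count(1, max);
    let mut items = Vec::new();
    for _ in 0..n {
        if sub.data.is_empty() {
            break; // budget exhausted: the "maybe 100 bytes isn't enough" case
        }
        items.push(sub.byte());
    }
    let tail = r.byte();
    (items, tail)
}
```

In this model, raising the bound from 100 to 1000 changes the tail value in naive but not in budgeted. The items produced by the loop itself can still differ between bounds, and unused bytes in the window are wasted, which matches the caveats above.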

0xalpharush commented 2 months ago

Thanks, I will have to rethink my approach and consider how to seed the fuzzer more effectively.