Open rsheeter opened 4 months ago
I think this is more of an upstream libFuzzer request, because without that I don't think we can really do anything better than the splitting approach you mentioned.
FWIW, I have wanted similar things in the past for Wasmtime's fuzzing, e.g. one stream of data to use to configure various engine options on and off, and a second data stream to interpret as binary Wasm modules directly. Can of course do the splitting, or treat the first N bytes as the config and the rest as Wasm, but that means that we can't just drop a Wasm binary into the corpus for seeding purposes.
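For concreteness, here is a minimal sketch of the "first N bytes as config, the rest as Wasm" split described above. The `EngineConfig` struct, its two flags, `CONFIG_LEN`, and `split_head` are hypothetical names made up for illustration; they are not Wasmtime's actual fuzzing code.

```rust
/// Hypothetical engine options decoded from the config prefix.
/// (Illustrative only; not Wasmtime's real configuration type.)
#[derive(Debug, PartialEq)]
pub struct EngineConfig {
    pub enable_simd: bool,
    pub enable_threads: bool,
}

/// Number of leading bytes reserved for configuration (an arbitrary choice).
pub const CONFIG_LEN: usize = 1;

/// Splits raw fuzzer input into (config, remaining bytes).
/// Returns `None` when the input is too short to hold the config prefix.
pub fn split_head(data: &[u8]) -> Option<(EngineConfig, &[u8])> {
    if data.len() < CONFIG_LEN {
        return None;
    }
    let (head, rest) = data.split_at(CONFIG_LEN);
    let config = EngineConfig {
        // Each low bit of the first byte toggles one option.
        enable_simd: head[0] & 0b01 != 0,
        enable_threads: head[0] & 0b10 != 0,
    };
    Some((config, rest))
}
```

The seeding problem shows up immediately: if a plain Wasm binary is dropped into the corpus, its first byte (the start of the `\0asm` magic) is consumed as config and the remainder no longer begins with a valid header. Every seed has to be prefixed with config bytes first.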
In trying to file an upstream issue I found that https://llvm.org/docs/LibFuzzer.html says that the authors have stopped working on libFuzzer and moved on to Centipede. That seems to make my odds of libFuzzer changes low as it's hard to construe this as a bug. Centipede says it's now https://github.com/google/fuzztest. That repo looks active. Is Rust fuzzing atop FuzzTest potentially a thing?
You can link to anything that exposes the (fairly simple) libfuzzer API: https://github.com/rust-fuzz/libfuzzer#linking-to-a-local-libfuzzer
There are a couple of fuzzing engines that do this. I don't know anything about FuzzTest in particular.
In general, we can't move away from libFuzzer to something that is more actively developed for the cargo fuzz ecosystem (at least by default, since there are escape hatches, like the link above) until the story for $ALTERNATIVE and OSS-Fuzz (for example) is as good as it is for libFuzzer today.
And FWIW, while there are certainly some features I'd love to see in libFuzzer (like this issue), it does its job very well.
To fuzz font processing, such as loading glyph outlines, we would like to have two inputs:
data: &[u8], mutated from a corpus entry
If I simply carve an Arbitrary off the incoming data, say taking the head and considering the tail to be a font binary, then the tail becomes very unlikely to be a valid font. Full disclosure: I initially did exactly this; coverage of the target code remained very low.
Thinking "aloud" I suppose I could glue extra bytes onto corpus entries to use to populate my Arbitrary?
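A rough sketch of what that gluing might look like, assuming a fixed-size trailer at the end of each corpus entry. `TRAILER_LEN`, `split_tail`, and `with_trailer` are hypothetical names, and the 4-byte size is arbitrary; the trailer bytes would then be fed to whatever builds the Arbitrary value.

```rust
/// Number of trailing bytes reserved for the structured input
/// (an arbitrary size chosen for this sketch).
pub const TRAILER_LEN: usize = 4;

/// Splits a corpus entry into (font bytes, optional trailer).
/// Inputs shorter than the trailer are treated as all font, no trailer.
pub fn split_tail(data: &[u8]) -> (&[u8], Option<&[u8]>) {
    if data.len() < TRAILER_LEN {
        return (data, None);
    }
    let (font, trailer) = data.split_at(data.len() - TRAILER_LEN);
    (font, Some(trailer))
}

/// Builds a seed corpus entry by gluing trailer bytes onto a font binary.
pub fn with_trailer(font: &[u8], trailer: [u8; TRAILER_LEN]) -> Vec<u8> {
    let mut out = Vec::with_capacity(font.len() + TRAILER_LEN);
    out.extend_from_slice(font);
    out.extend_from_slice(&trailer);
    out
}
```

With this layout a seed built by `with_trailer` round-trips to the original font bytes, so the font half of the input starts out valid. An unmodified font file dropped into the corpus would still lose its last `TRAILER_LEN` bytes to the trailer, so seeds need to go through the gluing step.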