google / proto-lens

API for protocol buffers using modern Haskell language and library patterns.
https://google.github.io/proto-lens
BSD 3-Clause "New" or "Revised" License

Differential fuzzing for proto-lens #280

Open blackgnezdo opened 5 years ago

blackgnezdo commented 5 years ago

As we improve the speed of proto-lens, we are paying a growing complexity tax. We should start seriously thinking about a test suite that is more adversarial (and more real-world-like).

One idea from @kcc was to create a differential fuzzing test. The idea here is to have two parallel implementations, e.g.

haskellReparse :: ByteString -> Maybe ByteString  -- implemented in Haskell
cxxReparse     :: ByteString -> Maybe ByteString  -- implemented in C++

Both functions would parse the given byte string (or fail) and then serialize the result back. The fuzzer driver would then invoke an assertion function that, e.g., uses a proto diff to confirm the results match. The magic comes from the fuzzer driving corpus generation by collecting coverage. Haskell doesn't normally have coverage (at least I've never heard of the GHC LLVM backend supporting coverage generation), but the C++ branch is coverage-enabled, and the fuzzer framework will exploit that for corpus generation.
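To make the shape of such a driver concrete, here is a minimal sketch of a libFuzzer-style target, under several assumptions. `LLVMFuzzerTestOneInput` is libFuzzer's standard entry point; everything else (`ReparseResult`, `HaskellReparse`, `CxxReparse`, `ResultsMatch`) is a hypothetical name invented for illustration. The two reparse functions are placeholders: a real harness would call into Haskell through the FFI for one and use the generated C++ protobuf classes for the other. The comparison here is plain byte equality, which is stricter than the proto diff suggested above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <string>

// Result of one parse-then-serialize round trip (hypothetical type).
// ok == false models the "Nothing" case: the input failed to parse.
struct ReparseResult {
  bool ok;
  std::string bytes;
};

// Placeholder rule, NOT real protobuf semantics: reject inputs whose last
// byte has the high bit set (loosely like a truncated varint), otherwise
// echo the input back unchanged.
static ReparseResult PlaceholderReparse(const std::string& input) {
  if (!input.empty() && (static_cast<unsigned char>(input.back()) & 0x80)) {
    return ReparseResult{false, std::string()};
  }
  return ReparseResult{true, input};
}

// Assumption: a real version would call the Haskell implementation via FFI.
static ReparseResult HaskellReparse(const std::string& input) {
  return PlaceholderReparse(input);
}

// Assumption: a real version would use generated C++ protobuf code.
static ReparseResult CxxReparse(const std::string& input) {
  return PlaceholderReparse(input);
}

// Both implementations must either fail together or succeed with equal
// bytes. Byte equality is a simplification; a semantic proto diff would
// tolerate differences such as field ordering.
static bool ResultsMatch(const ReparseResult& a, const ReparseResult& b) {
  if (a.ok != b.ok) return false;
  return !a.ok || a.bytes == b.bytes;
}

// libFuzzer calls this once per generated input. Aborting on a mismatch
// makes the fuzzer report the input as a crash and save it.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data,
                                      std::size_t size) {
  const std::string input(reinterpret_cast<const char*>(data), size);
  const ReparseResult haskell = HaskellReparse(input);
  const ReparseResult cxx = CxxReparse(input);
  if (!ResultsMatch(haskell, cxx)) {
    std::fprintf(stderr, "reparse mismatch on %zu-byte input\n", size);
    std::abort();
  }
  return 0;
}
```

With Clang, a target like this is typically built with `-fsanitize=fuzzer` (optionally combined with `address`), which supplies `main` and the coverage instrumentation for the C++ code. Only the C++ side would guide corpus generation; the Haskell side is exercised on whatever inputs that produces.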

An example of such code is this cross-checking test between OpenSSL and libgcrypt.

judah commented 5 years ago

This is an interesting idea. Are you also thinking of having the fuzzer generate the .proto files themselves? (Or, I guess equivalently, the DescriptorProto structure.) If so, are you aware of any existing implementations? I found one, but it's in Python: https://github.com/trailofbits/protofuzz

There's also a few known edge cases where we do differ from the C++ implementation: https://github.com/google/proto-lens#current-differences-from-the-standard This task could be a motivation to finish them off.

blackgnezdo commented 5 years ago

> This is an interesting idea. Are you also thinking of having the fuzzer generate the .proto files themselves? (Or, I guess equivalently, the DescriptorProto structure.) If so, are you aware of any existing implementations? I found one, but it's in Python: https://github.com/trailofbits/protofuzz

That's probably going too far, as it would require compiling the generated C++.

> There's also a few known edge cases where we do differ from the C++ implementation: https://github.com/google/proto-lens#current-differences-from-the-standard This task could be a motivation to finish them off.

Great, we want to eventually be compatible.