near / borsh

Binary Object Representation Serializer for Hashing
https://borsh.io/

Fuzz testing #30

Open MaksymZavershynskyi opened 4 years ago

MaksymZavershynskyi commented 4 years ago

We need to write the following fuzz tests for borsh:

A) Generate a random type and create an object of that type filled with random data. Serialize it, deserialize it, and check that the object is the same before and after.

B) Generate a random type and create an object of that type filled with random data. Serialize it, then randomly flip a subset of bits in the serialized bytes. Try deserializing the result and assert that it does not panic: it should either deserialize successfully or return an error.
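To make A) and B) concrete, here is a minimal sketch using only the standard library. It does not use the borsh crate: `Sample`, `to_bytes`, `from_bytes`, and the `XorShift` generator are hypothetical stand-ins, with a hand-written Borsh-style layout (little-endian integers, `u32` length prefix) in place of the derived serializers.

```rust
// Hypothetical toy type standing in for a randomly generated Borsh type.
#[derive(Debug, PartialEq)]
struct Sample {
    id: u32,
    payload: Vec<u8>,
}

// Small deterministic PRNG (xorshift64); the state must be nonzero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

// Reads a little-endian u32 at `at`, returning an error instead of panicking.
fn read_u32(buf: &[u8], at: usize) -> Result<u32, String> {
    let end = at.checked_add(4).ok_or("offset overflow")?;
    let b = buf.get(at..end).ok_or("unexpected end of input")?;
    Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

impl Sample {
    // Borsh-style layout: u32 id, then a u32 length prefix followed by the bytes.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = self.id.to_le_bytes().to_vec();
        out.extend_from_slice(&(self.payload.len() as u32).to_le_bytes());
        out.extend_from_slice(&self.payload);
        out
    }

    fn from_bytes(buf: &[u8]) -> Result<Sample, String> {
        let id = read_u32(buf, 0)?;
        let len = read_u32(buf, 4)? as usize;
        let end = 8usize.checked_add(len).ok_or("length overflow")?;
        let payload = buf
            .get(8..end)
            .ok_or("payload shorter than its length prefix")?
            .to_vec();
        if end != buf.len() {
            return Err("trailing bytes after payload".to_string());
        }
        Ok(Sample { id, payload })
    }
}

fn random_sample(rng: &mut XorShift) -> Sample {
    let len = (rng.next_u64() % 16) as usize;
    Sample {
        id: rng.next_u64() as u32,
        payload: (0..len).map(|_| rng.next_u64() as u8).collect(),
    }
}

// Test A: serialize, deserialize, and compare with the original.
fn roundtrip_holds(rng: &mut XorShift) -> bool {
    let original = random_sample(rng);
    Sample::from_bytes(&original.to_bytes()) == Ok(original)
}

// Test B: flip 1 to 8 random bits; deserialization may fail but must not panic.
fn survives_bit_flips(rng: &mut XorShift) -> bool {
    let mut bytes = random_sample(rng).to_bytes();
    let flips = 1 + rng.next_u64() % 8;
    for _ in 0..flips {
        // `bytes` always holds at least the 8 header bytes, so the modulus is nonzero.
        let bit = (rng.next_u64() as usize) % (bytes.len() * 8);
        bytes[bit / 8] ^= 1 << (bit % 8);
    }
    std::panic::catch_unwind(|| {
        let _ = Sample::from_bytes(&bytes);
    })
    .is_ok()
}
```

Calling `roundtrip_holds` and `survives_bit_flips` in a loop with a fixed seed, for example `XorShift(42)`, gives a reproducible run. With real borsh types the two hand-written methods would be replaced by the derived serializer and deserializer.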

The two difficult things to implement would be:

As an option, I suggest we do both using procedural macros. We can have a function-like procedural macro (https://doc.rust-lang.org/reference/procedural-macros.html#function-like-procedural-macros) random_type!(Name, X, Y, seed) that generates a token stream declaring some type Name, where X is the max depth (e.g. if we have nested structures) and Y is the max width of each node (e.g. max number of fields in a struct or max number of variants in an enum).

Each type would also be decorated with #[derive(RandomInit)], which implements the trait

trait RandomInit {
    fn random_init() -> Self;
}

for the type, just like we do with serializers. We would then implement RandomInit by hand for basic types and collections, again as we do with serializers.
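As a rough sketch of what those hand-written implementations could look like, the block below implements RandomInit for a few integer types, bool, Option, and Vec. Everything beyond the trait itself is an assumption: the thread-local xorshift state (used here because random_init takes no RNG argument), the length cap of 8, and the `Example` struct, whose impl is written by hand to suggest what #[derive(RandomInit)] might expand to.

```rust
use std::cell::Cell;

thread_local! {
    // Per-thread xorshift64 state; the fixed nonzero seed is an arbitrary choice.
    static STATE: Cell<u64> = Cell::new(0x2545_F491_4F6C_DD1D);
}

// Advances the thread-local generator and returns the next value.
fn next_u64() -> u64 {
    STATE.with(|s| {
        let mut x = s.get();
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        s.set(x);
        x
    })
}

trait RandomInit {
    fn random_init() -> Self;
}

// Integers: truncate a random u64 to the target width.
macro_rules! impl_random_init_for_ints {
    ($($t:ty),*) => {
        $(impl RandomInit for $t {
            fn random_init() -> Self {
                next_u64() as $t
            }
        })*
    };
}
impl_random_init_for_ints!(u8, u16, u32, u64, i8, i16, i32, i64);

impl RandomInit for bool {
    fn random_init() -> Self {
        (next_u64() >> 63) == 1
    }
}

impl<T: RandomInit> RandomInit for Option<T> {
    fn random_init() -> Self {
        if bool::random_init() {
            Some(T::random_init())
        } else {
            None
        }
    }
}

impl<T: RandomInit> RandomInit for Vec<T> {
    fn random_init() -> Self {
        // Cap the length (assumed limit) so nested collections stay small.
        let len = (next_u64() % 8) as usize;
        (0..len).map(|_| T::random_init()).collect()
    }
}

// Hypothetical struct; a derive macro would generate an impl of this shape.
#[derive(Debug, PartialEq)]
struct Example {
    a: u32,
    b: Vec<Option<u8>>,
}

impl RandomInit for Example {
    fn random_init() -> Self {
        Example {
            a: RandomInit::random_init(),
            b: RandomInit::random_init(),
        }
    }
}
```

Passing an explicit RNG into random_init would make seeding per test easier, at the cost of changing the trait signature proposed above.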

Then our test would be something like:

random_type!(T0, 1, 1, 42);
...
random_type!(T42, 10, 12, 42);

#[test]
fn test0() {
    for _ in 0..100 {
        // Round trip: serialize, deserialize, and compare with the original.
        let t0 = T0::random_init();
        let out_t0 = T0::try_from_slice(&t0.try_to_vec().unwrap()).unwrap();
        assert_eq!(t0, out_t0);
    }
}

Note: we should also look at the fuzzing tools that Sigma Prime wrote for borsh; we might not need to write them ourselves.

MaksymZavershynskyi commented 4 years ago

I looked at what Sigma Prime did with load testing. They created 31 relatively simple Rust types (30 of them have only one field/variant and are not nested). Then they try to deserialize these types from a randomly generated array of bytes. I think this is a good starting point for fuzz testing, but we need to go further:

Also, we need to serialize objects initialized with different data, see above.

ilblackdragon commented 4 years ago

It would also be good to have at least some tests that serialize with JS and deserialize with Rust, and the other way around. We will also get more languages here (we already have Python; Go and C# are probably going to be next), so it would be good to have some generic way of testing any set of serializers/deserializers.

lexfrl commented 4 years ago

Generating random type;

We can generate an input for a random nested type, but to generate a proper deserializer we would need to compile it (generate a program; macros will not be enough).

The main goal of fuzzing is to test that your program does not crash on unexpected input. I'd focus on that, and then we could expand it if needed.
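One way to picture the "generate a program" route is a small generator that emits Rust source for a random type, which a separate step would then compile. The sketch below is only an illustration under assumptions: `random_type_source`, `gen_struct`, the `XorShift` generator, and the list of primitive field types are all hypothetical, it reuses the depth/width/seed parameters from random_type!(Name, X, Y, seed) above, and it emits structs only (no enums or collections).

```rust
// Small deterministic PRNG (xorshift64); the state must be nonzero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }

    // Value in 0..n; callers must pass n > 0.
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

// Assumed set of leaf field types for the sketch.
const PRIMITIVES: [&str; 5] = ["u8", "u32", "u64", "bool", "String"];

// Appends the declaration of struct `name` to `out`, preceded by any nested
// structs it refers to. `depth` bounds nesting, `width` bounds fields per struct.
fn gen_struct(name: &str, depth: u32, width: u32, rng: &mut XorShift, out: &mut String) {
    let n_fields = 1 + rng.below(width.max(1) as u64);
    let mut fields = Vec::new();
    for i in 0..n_fields {
        // Nest rarely (about half a nested field per struct on average)
        // so the generated source stays small even for large depth and width.
        let nest = depth > 1 && rng.below(2 * n_fields) == 0;
        let ty = if nest {
            let child = format!("{}F{}", name, i);
            gen_struct(&child, depth - 1, width, rng, out);
            child
        } else {
            PRIMITIVES[rng.below(PRIMITIVES.len() as u64) as usize].to_string()
        };
        fields.push(format!("    f{}: {},\n", i, ty));
    }
    out.push_str("#[derive(BorshSerialize, BorshDeserialize, PartialEq, Debug)]\n");
    out.push_str(&format!("struct {} {{\n{}}}\n\n", name, fields.concat()));
}

// Returns Rust source text declaring a random type called `name`.
fn random_type_source(name: &str, max_depth: u32, max_width: u32, seed: u64) -> String {
    let mut rng = XorShift(seed.max(1)); // avoid the all-zero xorshift state
    let mut out = String::new();
    gen_struct(name, max_depth, max_width, &mut rng, &mut out);
    out
}
```

Because the output depends only on the arguments, the same seed always reproduces the same type, which helps when a failing case needs to be regenerated.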

MaksymZavershynskyi commented 4 years ago

We can generate an input for a random nested type, but to generate a proper deserializer we would need to compile it (generate a program; macros will not be enough).

Yes, I suggest we have a bash script that runs two binaries: