stedolan / crowbar

Property fuzzing for OCaml
MIT License
180 stars 31 forks source link

Join #36

Closed gasche closed 6 years ago

gasche commented 6 years ago

I would like to test a library that provides type-directed combinators ('a encoding) for serialization and deserialization. The natural way to crowbar this is to build a generator of pairs at the type ('a gen * 'a encoding), and then test that any generated value will pass serialization->deserialization unchanged.

Dealing with such nested generators requires monadic structure, either with a join : 'a gen gen -> 'a gen function or a more traditional bind, which are currently absent from the Crowbar interface. This PR adds those two functions, plus an example demonstrating this use-case (testing type-directed combinators for serialization) in a toy case.

Remarks:

dinosaure commented 6 years ago

This PR could be very interesting for encore btw.

stedolan commented 6 years ago

Yes, I (finally!) agree this is useful. Crowbar had bind for a while, then it got removed when we didn't see a use for it, but this is a compelling example.

I agree that bind is more natural than join here. I'm not convinced there's much point to join.

I do think it's important to discourage use of bind when map suffices. There's an alternative fuzzing engine in the hype branch (named after Hypothesis, from which it borrows some ideas), which makes heavy use of the static structure of generators, allowing it to mutate test cases by e.g. swapping subcases that are built with the same generator. This doesn't work with bind, because of the lack of static structure, so usual monadic code like a >>= fun a -> b >>= fun b -> ... will perform much worse than map [a; b] (fun a b -> ...).

The technique I tried to discourage use was choosing a longer name:

val depending_on : 'a gen -> ('a -> 'b gen) -> 'b gen

but some explanatory comments would be useful.

gasche commented 6 years ago

Regarding the name, what about dynamic_bind?

I can try to write some documentation to explain clearly why this operator should be discouraged.

I will also review my implementation to see if having a Bind combinator (instead of defining bind from a primitive Join) makes things better (eg. in terms of having a nice printing behavior by default) or, on the contrary, more complex (the latter would be my guess). There is also the question of whether such a Bind should get an n-ary interface like Map, but I would rather bet that the simple unary interface suffices for an uncommon operator.

gasche commented 6 years ago

@stedolan I updated the PR. I used dynamic_bind, and wrote an overly-verbose documentation comment to discourage its use. Feel free to reword it (after a merge, or by writing a suggested rewording here). In the end I did use a Bind constructor, as the choice of derived printer is actually more natural for Bind than for Join -- but the printing does not change.

I also restructured the example a bit, by adding a printer for generated type representations and a printer for generated values, as I found these two features very useful in my actual use-case, and I would like to encourage users to take inspiration.

Before, tweaking the PR to break the serializer or deserializer would print the following test failure (for example):

When given the input:

    (#3 #3 #1 ?) => [[true; false; false]]

the test failed:

    different

now it prints

When given the input:

    (List(List(Bool))) => [[true; false; false]]

the test failed:

    [[true; false; false]] != [[false; false; true]]
stedolan commented 6 years ago

Thanks! (Apologies for the delay in merging)