Join - Githubissues

gasche commented 6 years ago

I would like to test a library that provides type-directed combinators ('a encoding) for serialization and deserialization. The natural way to crowbar this is to build a generator of pairs at the type ('a gen * 'a encoding), and then test that any generated value will pass serialization->deserialization unchanged.

Dealing with such nested generators requires monadic structure, either with a join : 'a gen gen -> 'a gen function or a more traditional bind, which are currently absent from the Crowbar interface. This PR adds those two functions, plus an example demonstrating this use-case (testing type-directed combinators for serialization) in a toy case.

Remarks:

I understand that people tend to over-use the monadic structure when it is available, combining bind and return when they should have used map instead (which gives more freedom to the implementor by making it possible to statically explore the generator's shape). I could try to write a comment to encourage people not to use these new functions unless they are dealing with nested generators, or I could remove bind and expose only join, which is more explicit. (I started with only join, but it turns out that my example is nicer with bind.)
I was a bit confused by the size parameter in the generate function: a comment explaining what it is doing at the top would have been nice (right now there is a comment in the middle of the function that looks (understandably) confused itself). I just use size - 1 in my two recursive calls, which ensures termination but may not be the right thing either. (Maybe (size / 2) in both cases?) Explaining reasons for choosing something could be nice. (For example, if there are several recursive calls, can we recommend that the sum of their parameters equal the initial size value?).
I was also a bit confused by what the printer is supposed to do, so I improvised the idea that it was just for debugging, and chose (%a) => %a (first the generated generator, then the generated value).
I adapted the (also lacking a specified purpose) minimize_depths function by considering that Join should be filtered out whenever Map is filtered out. One could even consider adding a fourth filtering category where Map is allowed, but Join is not.

dinosaure commented 6 years ago

This PR could be very interesting for encore btw.

stedolan commented 6 years ago

Yes, I (finally!) agree this is useful. Crowbar had bind for a while, then it got removed when we didn't see a use for it, but this is a compelling example.

I agree that bind is more natural than join here. I'm not convinced there's much point to join.

I do think it's important to discourage use of bind when map suffices. There's an alternative fuzzing engine in the hype branch (named after Hypothesis, from which it borrows some ideas), which makes heavy use of the static structure of generators, allowing it to mutate test cases by e.g. swapping subcases that are built with the same generator. This doesn't work with bind, because of the lack of static structure, so usual monadic code like a >>= fun a -> b >>= fun b -> ... will perform much worse than map [a; b] (fun a b -> ...).

The technique I tried to discourage use was choosing a longer name:

val depending_on : 'a gen -> ('a -> 'b gen) -> 'b gen

but some explanatory comments would be useful.

gasche commented 6 years ago

Regarding the name, what about dynamic_bind?

I can try to write some documentation to explain clearly why this operator should be discouraged.

I will also review my implementation to see if having a Bind combinator (instead of defining bind from a primitive Join) makes things better (eg. in terms of having a nice printing behavior by default) or, on the contrary, more complex (the latter would be my guess). There is also the question of whether such a Bind should get an n-ary interface like Map, but I would rather bet that the simple unary interface suffices for an uncommon operator.

gasche commented 6 years ago

@stedolan I updated the PR. I used dynamic_bind, and wrote an overly-verbose documentation comment to discourage its use. Feel free to reword it (after a merge, or by writing a suggested rewording here). In the end I did use a Bind constructor, as the choice of derived printer is actually more natural for Bind than for Join -- but the printing does not change.

I also restructured the example a bit, by adding a printer for generated type representations and a printer for generated values, as I found these two features very useful in my actual use-case, and I would like to encourage users to take inspiration.

Before, tweaking the PR to break the serializer or deserializer would print the following test failure (for example):

When given the input:

    (#3 #3 #1 ?) => [[true; false; false]]

the test failed:

    different

now it prints

When given the input:

    (List(List(Bool))) => [[true; false; false]]

the test failed:

    [[true; false; false]] != [[false; false; true]]

stedolan commented 6 years ago

Thanks! (Apologies for the delay in merging)

stedolan / crowbar

Join #36