Closed yallop closed 7 years ago
So, Crowbar already provides a monadic interface in more-or-less this style, plus some other junk. I'll try to explain what the other junk is for, but I'm open to alternative designs.
The (>>=)
monadic operation can be separated into (arguably) more primitive map
and join
operations, so we can also write the basic monadic interface like this:
type +'a t
val return : 'a -> 'a t
val map : ('a -> 'b) -> 'a t -> 'b t
val join : 'a t t -> 'a t
By far the most common application is to define a generator as a function of the result of a few others, for instance, defining a generator of {x : int; y : int; z : int}
in terms of one for int. This can be done using >>=
:
let xypair = int >>= (fun x -> int >>= (fun y -> int >>= (fun z -> {x; y; z})))
but I think that the cleanest option (and the one that requires the least nesting) is to generalise map
to multiple arguments:
let xypair = map (fun x y z -> {x ; y ; z}) [int; int; int]
(Getting this to work involves mild GADT trickery abusing the ::
constructor, but I think it's quite clear in use even if the internals are a bit arcane. In Crowbar, I swap the arguments, which makes things more readable by putting the generators int; int; int
close to the binders x y z
).
To the basic monadic interface, we can add a bunch of derived combinators, e.g:
val option : 'a t -> 'a option t
val list : 'a t -> 'a list t
val choose : 'a t list -> 'a t
I strive to avoid combinators like <*>
and <|>
which over-emphasise arity 2 - you end up spending far too much time disentangling commas in a, (b, (c, d))
. Instead, I prefer combinators like choose
and the multiple-argument map
which work just as cleanly for arity 3, 4, 5 as for 2.
At this stage, there's a simple monadic core (based on join and multi-argument map, from which >>=
is a derived and not-too-often-used combinator), on top of which option
, list
, and the like are easy defined.
But in order to write generators for interesting structures, we need to handle recursion. OCaml refuses to compile the following:
let rec expr = choose [
map [int] (fun i -> `Int i);
map [expr; expr] (fun e1 e2 -> `Plus e1 e2)
]
because the right-hand side could do anything. The cheap trick that Crowbar uses currently is to lift the builtin forms for constructing generators to constructors, since OCaml happily accepts:
let rec expr = Choose [
Map ([int], fun i -> `Int i);
Map ([expr; expr], fun e1 e2 -> `Plus e1 e2)
]
This particular example could be expressed cleanly enough with fixed-point combinator, something like:
val fix : ('a t -> 'a t) -> 'a t
However, real-world examples are not so simple, and involve complex mutual recursion. While mutual recursion can be expressed using a single fixpoint combinator, uses of that fixpoint combinator are not pretty. By contrast, this partial generator for OCaml parse trees looks very like the type definitions that it's generating.
Still, it's definitely unsatisfying that Crowbar gives e.g. List
a higher status than anything user-definable (You can get most of the same effects with Join
and a custom generator, but not quite all). So, I'm definitely interested in alternative designs, but I consider it a requirement that it be possible to express complicated mutual recursions with code that's no uglier than at the moment (that is, it's not enough to have a fixpoint combinator which in principle expresses everything - that expression must also be readable).
I have a feeling that this can probably be done with creative abuse of Lazy
, since that's the other means (as well as constructors) of sneaking code into let rec
, but I haven't worked out the details yet.
FWIW react's signal fix point combinator signature allows to express complex mutual recursion, you just have to let the user escape arbitrary info from the fix point function. However it is horrible to write and use. I have been thinking about maybe replacing this with a delay : 'a S.t lazy -> 'a S.t
combinator.
For fix points, I found the (somehow verbose) mu*
functions in Depyt easy to use in general: https://samoht.github.io/depyt/doc/Depyt.html#VALmu, e.g:
type r = { foo: int; bar: string list; z: z option }
and z = { x: int; r: r list }
(* Build the representation of [r] knowing [z]'s. *)
let mkr z =
record "r" (fun foo bar z -> { foo; bar; z })
|+ field "foo" int (fun t -> t.foo)
|+ field "bar" (list string) (fun t -> t.bar)
|+ field "z" (option z) (fun t -> t.z)
|> sealr
(* And the representation of [z] knowing [r]'s. *)
let mkz r =
record "z" (fun x r -> { x; r })
|+ field "x" int (fun t -> t.x)
|+ field "r" (list r) (fun t -> t.r)
|> sealr
(* Tie the loop. *)
let r, z = mu2 (fun r z -> mkr z, mkz y)
let xypair = int >>= (fun x -> int >>= (fun y -> int >>= (fun z -> {x; y; z})))
It's possible to omit the parentheses here, which makes the code a bit more readable -- rather like do
notation:
let xypair = int >>= fun x ->
int >>= fun y ->
int >>= fun z ->
return {x; y; z}
but I think that the cleanest option (and the one that requires the least nesting) is to generalise map to multiple arguments:
let xypair = map (fun x y z -> {x ; y ; z}) [int; int; int]
Another possibility, which I personally find quite readable, is to use applicative style (defineable in terms of the monadic combinators), like this:
let xypair = pure (fun x y z -> {x ; y ; z}) <$> int <$> int <$> int
where
let pure = return
let (<$>) f x = f >>= fun g -> x >>= fun y -> return (g y)
I think there are good arguments for all three notations. And (ignoring the issues with let rec
for the moment), I'd think it would be possible to define all three on top of a monadic core, although I haven't actually checked for Map
.
@samoht
For fix points, I found the (somehow verbose) mu* functions in Depyt easy to use in general
I tend to agree with @dbuenzli that these are "horrible to write and use", at least at scale. Depyt provides only mu
and mu2
(for which the syntax noise isn't awful), but the example of a generator of valid OCaml parsetrees has twelve mutually recursive generators. This is quite easy to read with let rec
, and mirrors the definitions in parsetree.mli
, but would be awful with mu12
.
@yallop
Yes, a simple monadic core would be good, but I don't think that return
+ >>=
is any simpler a core than pure
+ map
+ join
. (I recall a long-running argument in Haskell-land about this :) ). Multi-argument Map
is exactly Applicative
, Map
and (pure
, <$>
) can express each other, and a monad is a functor with pure
and join
.
A bigger issue is that Crowbar is currently defined by a big fixed syntax tree, when as you point out an abstract type and some primitives would be better and easier to extend in user code. What's blocking this at the moment is a good way of defining complicated mutually recursive data types. I don't think this problem becomes any easier if we replace map
/ join
with return
/>>=
.
OK, handling recursive generators using lazy
was easier than I imagined. Opinions on the API in #5?
Fixed by #5
I'm very much looking forward to using this library across quite a few projects! The first one I have in mind is profuse, where the central data structure has some interesting invariants, fairly straightforward to express, but tricky to check. Crowbar seems like an ideal tool for such cases. I'm also looking forward to using it in a serialisation/rpc library under development at the moment.
In trying to understand the design of crowbar, I've made a few notes about things that it might be useful to express slightly differently. These are currently just suggestions, sometimes quite rough, but if you like the sound of any of them, I'm happy to turn them into pull requests. This first issue is mostly about implementation, but I think it spills over into the interface a bit, too.
datatype-centric vs compositional design
The central
gen
type is currently defined as a datatype, with constructors for lists, options, non-empty lists, etc. There are sometimes reasons to be cautious about defining things this way, so I'd like to suggest exploring an alternative approach.Since there's essentially only one operation defined over the
gen
type, it should be straightforward to define it in a more compositional/extensible style. (This is really just the old distinction between oo-style, which makes it easy to add new types and hard to add new operations, and typed-fp-style, which makes it hard to add new types and easy to add new operations.)In other words, rather than a definition of this form
along with a function like this, where generation is defined by case analysis:
I think it'd be worth considering a definition more like this:
along with a separate function for each constructor
There are three main benefits to doing things this way.
First, it's easy to add new constructors, either inside or outside the library, since
list
andoption
are just functions with no special status rather than constructors of a distinguished types.Second, it keeps things compositional. With a datatype definition it's always tempting to do case analysis on subvalues, which tends to lead to uneven behaviour. (Indeed, the current definition of the
generate
function does inspect subvalues in at least theChoose
case.) With the separate-function design it's essentially impossible to analyse subvalues, so the semantics is almost certain to end up cleaner / more "denotational".Third, it keeps the core small. With the separate-function design the core can be tiny (see below), which makes it easier to make interesting changes to it, and easier to ensure that the core is correct. It's also a useful test of the design: if functions like
list
andoption
can easily be built on top of the core then it's likely that almost any other user-defined function can be built that way, too; on the other hand, if you start with a larger core, with everything you need provided as a built in then there's nothing to ensure that the langauge will be easy to extend outside the library, and it's more likely that you'll have to extend the core with additional operations later.design sketch
Here's a sketch of such a design. (It's entirely possible that I've misunderstood some details -- if so, I'm keen to understand which -- but I expect the general idea can be made to work.)
The central idea is generating data from binary input, so it's natural to express the core as a monad with a single operation that reads and returns one bit, like this:
Then everything else can be built on top of the monad operations plus
bit
, and so lives outside of the core. (I think it's possible to implement this interface fairly efficiently, but it's possible that operations that read multiple bits at a go might also be useful.)The
stream
here is intended to be functional, where reading an element doesn't change the state of the stream, rather than imperative/OCaml-style, whereStream.next
returns a new element each time. Here's a suitable definition in the even style:Then
run
can be invoked with some source of bits -- either a PRNG for quickcheck-style checking, or a bit vector that is iteratively perturbed by the afl framework for fuzzing. (Again, since the source of bits is abstracted, it's straightforward to add alternative sources later, even outside the library.)The combination of the monad operations and
bit
should be sufficiently expressive to define more building blocks, e.g.