stedolan / malfunction

Malfunctional Programming
Other
338 stars 19 forks source link

Lazy recursive values #39

Closed mattam82 closed 4 months ago

mattam82 commented 8 months ago

The current check implemented to verify that recursive bindings are correct is, I think, too simplistic. It currently only accepts immediate functions or lazy, already at parsing time: https://github.com/stedolan/malfunction/blob/4227093923bfa917565c5194c9ebd586415bdadf/src/malfunction_parser.ml#L110

According to https://v2.ocaml.org/manual/letrecvalues.html, many more programs can be accepted on the right hand-side of a let rec. This causes a problem in particular for our extraction from Coq, where a typical pattern of definition can be:

CoInductive stream := Cons { head : nat; tail : stream }.

CoFixpoint ones : stream := 
  let x := bla in {| head := x; tail := ones |}.

This gets translated to (in pseudo malfunction syntax): let rec ones = let x = bla in lazy (Cons x ones)

and is rejected. As Coq supports a kind of "positive" view of coinductives, and only optionally, by user choice, co-pattern matching, some cofixpoint definitions in the wild might actually compute a few things before calling the constructor. Of course any cofixpoint with at least one argument will be fine as it starts with a lambda.

But currently malfunction will also reject let rec ones_list = 1 :: ones_list which is accepted by OCaml. Is there any rationale for this? As for our application in Coq, this does not matter as one cannot produce such cyclic values from Coq: every let rec has at least one argument, except for the images of cofixpoints, but then the body's head must be a constructor.

stedolan commented 7 months ago

When Malfunction was originally designed, the exact set of what OCaml accepted in let rec was underspecified, and the implementation was unsound in various ways. Later, many of the issues were repaired with a elaborate specification of guardedness (see A Practical Mode System for Recursive Definitions, POPL 2021), but various bugs remained in the compilation scheme, causing miscompilation and segfaults even on definitions that pass the guardedness check. I believe these have now finally been fixed, but the first version of OCaml with all the fixes is the unreleased 5.2.0.

So, Malfunction went with a narrow spec here, limited to a class of definitions that's (a) easy to specify and (b) never miscompiled by recent versions of OCaml. We could extend this to a larger class in principle, but only if there's a simple specification of what that class is, and we're confident everything in that class is compiled correctly. In practice, I suspect it may be easier to do this manually, by implementing the patching necessary to construct cyclic values using explicit mutation.

stedolan commented 7 months ago

Incidentally, in your example (and for cofixpoints in general), why should code before the constructor be computed before the lazy value is forced? That is, why not compile the cofixpoint like:

let rec ones = lazy (let x = bla in Cons x ones)

instead of

let rec ones = let x = bla in lazy (Cons x ones)

The former fits the pattern that Malfunction can already compile.

In fact, it seems that Coq's extraction sometimes breaks due to the recursive values guardedness check (although possibly this is fixed, I tested with Coq 8.11 since that's what's on this machine):

CoFixpoint ones : stream := 
  let y := (1, ones) in
  let (p, q) := y in
  {| head := p; tail := q |}.

Extraction ones.

yields:

let rec ones =
  let y = Pair ((S O), ones) in let Pair (p, q) = y in lazy (Cons (p, q))

which fails to compile in OCaml with:

Error: This kind of expression is not allowed as right-hand side of `let rec'
stedolan commented 7 months ago

There's definitely something wrong with cofixpoint extraction, still in 8.19. Here's a term that's definitionally equal to 1, but whose extraction to OCaml diverges, eventually failing with a stack overflow:

CoInductive stream := Cons { head : nat; tail : stream }.

CoFixpoint repeat n :=
  let tail := repeat n in
  {| head := n; tail := tail |}.

Definition one := head (repeat 1).

repeat is extracted to let repeat n = let tail = repeat n in lazy (Cons (n, tail)), which passes OCaml's guardedness check (since it's a function) but diverges since the recursive call is outside of the lazy.

mattam82 commented 7 months ago

Oh great, a new bug in extraction :) Thanks for investigating!

mattam82 commented 7 months ago

This is another reason one should rather use co-pattern-matching notation to avoid this (e.g. every cofixpoint body would syntactically start with a constructor application). Coq's productivity checker allows to let-reduce the tail let-binding here, and rather check the productivity of the simplified term. This is fine for reduction inside Coq, but there's a mismatch with OCaml's evaluation.

mattam82 commented 7 months ago

We could and should indeed put the lazy in front of the cofixpoint body

mattam82 commented 7 months ago

When Malfunction was originally designed, the exact set of what OCaml accepted in let rec was underspecified, and the implementation was unsound in various ways. Later, many of the issues were repaired with a elaborate specification of guardedness (see A Practical Mode System for Recursive Definitions, POPL 2021), but various bugs remained in the compilation scheme, causing miscompilation and segfaults even on definitions that pass the guardedness check. I believe these have now finally been fixed, but the first version of OCaml with all the fixes is the unreleased 5.2.0.

So, Malfunction went with a narrow spec here, limited to a class of definitions that's (a) easy to specify and (b) never miscompiled by recent versions of OCaml. We could extend this to a larger class in principle, but only if there's a simple specification of what that class is, and we're confident everything in that class is compiled correctly. In practice, I suspect it may be easier to do this manually, by implementing the patching necessary to construct cyclic values using explicit mutation.

Thanks for the reference and explanations! I guess in our application we won't need this either, as we will fix our translation to put a lazy at the head of every fixpoint body coming from a cofixpoint.