leanprover / lean3

Lean Theorem Prover
http://leanprover.github.io/
Apache License 2.0
2.15k stars 217 forks source link

coinductive types #1441

Closed avigad closed 7 years ago

avigad commented 7 years ago

The question often arises as to whether Lean will have coinductive types. Nobody involved in the project is planning to implement them in the foreseeable future, but there is agreement that it would be a nice feature to have. This issue includes some notes and suggestions that might be helpful to anyone interested in implementing them.

"Adding coinductive types to Lean" can mean one of two things: extending Lean's kernel with new constructions and operations or building coinductive types on top of existing primitives. Lean has adopted the design decision of keeping the kernel as small and simple as possible; for example, it implements a minimal form of inductive data type with primitive recursors (à la Dybjer). Other forms of inductive types and recursion, such as nested and mutual inductive types, arbitrary structural recursion, well-founded recursion, etc. have to be "compiled down" to those. We would therefore expect coinductive types to be defined in Lean, without changes to the kernel.

Isabelle has implemented a powerful coinductive datatype package along those lines: https://people.mpi-inf.mpg.de/~jblanche/co-data-impl.pdf. There are a few things that make the project more complicated in Lean:

The first issue is important because we would like to generate bytecode from such definitions or compile them to native code. Compilers can recognize coinductive definitions and handle them in special ways, but constructive definitions provide a computational semantics, thereby specifying intended behavior.

Ken Sakayori (@sakas--), an undergraduate at the University of Tokyo at the time, visited Carnegie Mellon for three weeks in the summer of 2015. Together we worked out a proposal for implementing coinductive types in Lean, and Ken carried out a proof-of-concept example by hand. Some notes I wrote up afterwards and Ken's formalization (both with snippets of Lean 2 code) are found below.

Implementing this in Lean 3 would require a lot of work. Someone would have to:

The first is not so hard. It may soon even be possible to write the parser extension in Lean itself. But the rest are difficult. Building expressions of dependent type theory automatically is fiddly and subtle, and there will likely be many unforseen corner cases. (It should be possible to carry out all or at least substantial parts of the constructions in Lean itself, with the metaprogramming facilities and API.) Users will expect the same gadgets that are being developed for inductive types (like mutual-nested-dependent-inductive-coinductive constructions and recursion). And all the work will be for nothing unless someone is able to keep the package going and respond to issues, questions, bug reports, and so on.

That said, I will now reproduce the notes on the construction that Ken and I proposed, and add some comments from Leo at the end. Ken has made his Lean 2 code available here: https://gist.github.com/sakas--/2758e8ebfc91db45526c. The notes below also use Lean 2 syntax.

To illustrate the idea, let us consider an inductive type with three constructors:

inductive two_tree (A : Type) : Type :=
| nil {}      : two_tree A
| cons        : A → two_tree A → two_tree A
| cons₂       : A → two_tree → two_tree A →two_tree A

In other words, a two_tree is either a leaf, a labeled node with one child, or a labeled node with two children. We chose this example because it is just complicated enough to stand for a general case.

The coinductive version, a lazy two tree would be similar, except that some branches might be infinite. The idea is to model this as a sequence of finite approximations. For every n, we have a depth n approximation, in which any branch that has not terminated yet is marked continue.

To do this, we only have to add an index to the definition above, and a constructor for continue:

inductive l_two_tree_approx (A : Type) : ℕ → Type :=
| continue {} : l_two_tree_approx A 0 
| nil {}      : Π {n : ℕ}, l_two_tree_approx A (n + 1)
| cons        : Π {n : ℕ}, A → l_two_tree_approx A n → l_two_tree_approx A (n + 1)
| cons₂       : Π {n : ℕ}, A → l_two_tree_approx A n → l_two_tree_approx A n →
                           l_two_tree_approx A (n + 1)

We want to say that a lazy two tree is a sequence of approximations, one for each n, such that the nth approximation agrees with the (n+1)st. Here is what it means for consecutive approximations to agree:

definition agrees : Π {n : ℕ}, l_two_tree_approx A n → l_two_tree_approx A (n + 1) → Prop
| 0 continue          _                  := true
| _ nil               nil                := true
| _ nil               _                  := false
| _ (cons a t₁)       (cons a t₂)        := agrees t₁ t₂
| _ (cons a t₁)       _                  := false
| _ (cons₂ a t₁₁ t₁₂)  (cons₂ a t₂₁ t₂₂) := agrees t₁₁ t₂₁ ∧ agrees t₁₂ t₂₂
| _ (cons₂ a₁ t₁ t₁') _                  := false

We can then define a lazy two tree to be a sequence of approximations, such that each agrees with the next:

definition l_two_tree (A : Type) : Type := 
{s : Π (n : ℕ), l_two_tree_approx A n | ∀ n, l_two_tree_approx.agrees (s n) (s (n + 1))}

That is really the core idea. The definition of agrees makes sure there is progress, i.e. if (agrees t1 t2) holds, then t2 refines any continue left in t1. We can then go on to define predicates is_nil is_cons is_cons₂ and functions head tail head₂ tail₂₁ tail₂₂. Ken worked this all out in the gist. It is tedious, but it is entirely formulaic: you define the operations on approximations, you show that they preserve agreement, and then you list the definition to sequences.

What about corecursion? The idea is that given f : 1 + A * X + A * X * X we should be able to define a function corec f satisfying

  corec f x := 
    nil                                if f x is is in the first component
    cons a (corec f x')                if f x returns (a, x') in the second component
    cons₂ a (corec f x₁) (corec f x₂)  if f x returns (a, x₁, x₂) in the third component 

We can write this in a nice way be defining

inductive ltwo_tree_shape (X A : Type) := 
| inl {} : ltwo_tree_shape X A
| inm {} : A → X → ltwo_tree_shape X A
| inr {} : A → X → X → ltwo_tree_shape X A

local notation `shape` := ltwo_tree_shape

and then defining

definition corec (f : X → shape X A) (x : X) : l_two_tree A := ...

Ken did all this, and got it to work. He proved, for example,

theorem head_corec {f : X → shape X A} {x : X} (H : is_cons (corec f x)) : head H = pr₁₁ (is_inm_of_is_cons_corec H)

This is a definitional reduction rule in Coq; we only have it propositionally. The other reduction rules should be similar (though tedious) to prove.

I think this solution is very general. Here are two things I am not yet sure of:

  1. Can we do mutual coinduction / induction, as in Isabelle?
  2. Can corec simulate all of the guarded recursions that are allowed in Coq, or, at least, can the method be extended to handle them? I think the answer to both questions is "yes".

For this example, Ken had to write almost 600 lines of Lean code (and it is not quite finished). But it is mostly boilerplate, and will hopefully be automatable. One nice thing is that it does not seem to require a lot of new syntax. Instead of

inductive two_tree (A : Type) : Type :=
| nil {}      : two_tree A
| cons        : A → two_tree A → two_tree A
| cons₂       : A → two_tree → two_tree A →two_tree A

we want to write

coinductive l_two_tree (A : Type) : Type :=
| nil {}      : l_two_tree A
| cons        : A → l_two_tree A → l_two_tree A
| cons₂       : A → l_two_tree → l_two_tree A →l_two_tree A

and have the Lean generate all the stuff for us.

There is a small wrinkle, in that we have to generate constants is_nil is_cons is_cons₂ head tail head₂ tail₂₁ tail₂₂. The first three can be taken from the names of the constructors, but users need some way of specifying the others. Isabelle has you do it something like this:

coinductive l_two_tree (A : Type) : Type :=
| nil {} : l_two_tree A
| cons  (head : A) (tail : l_two_tree A) : l_two_tree A
| cons₂ (head₂ : A) (tail₂₁ : l_two_tree) (tail₂₂ : l_two_tree A) : l_two_tree A

Lean 3 now supports such syntax for inductive types.

Another issue is to come up with nicer syntax for writing corecursive definitions. A full-blown effort will require extending the function definition system. But even a no-frills solution, writing

  f = corec (λ x, ...)

where ... is any expression of type shape X A is not too bad. It is intuitive: shape X A just specifies which constructor you want to use, in our case, nil, cons, or cons₂. It seems no more difficult than using the rec primitives for inductive types in Lean.

After reading these notes, Leo offered the following suggestions.

It is really useful to devise schematic proofs (aka templates) that work for any coinductive declaration. To design this kind of template, it is useful to try many examples by hand. At least, this is the approach I used when I implemented features such as the brec_on definition.

Here are some suggestions:

  1. Try encoding a coinductive indexed family. Indexed families usually create technical problems. I think it will be helpful to understand how they impact the proof templates.
  2. Coinductive predicate. I usually have to write custom code for inductive predicates.
  3. Reflexive coinductive types. We say an inductive type T is reflexive if it contains at least one constructor that takes as an argument a function returning T. Example:

    coinductive inftree (A : Type) : Type :=
    | leaf : A → inftree A
    | node : (nat → inftree A) → inftree A

    It is also useful to try to simplify the proofs and get them close to a general template as possible. Here is an example where I tried to simplify some of the definitions/theorems in Ken’s file.

    https://gist.github.com/leodemoura/fa65c20b50a391456d32

johoelzl commented 7 years ago

Stopgap measure for coinductive predicates

I just added a small theory algebra/lattice/fixed_points to library_dev. This could provide a stopgap measure to define coinductive predicates. Again this would copy what Isabelle does.

It works as follows:

coinductive_predicate P : a -> Prop
| q_intro : forall a, q a -> P (t a)
| r_intro : forall a, r a -> P a -> P (s a)

would define a function P' as P' F a = (exists a', a = t a' /\ q a') \/ (exists a', a = s a' /\ r a' /\ F a')

The proof that P' is monotone is straight forward and easy to automate. Then we define P = gfp P' and with the lemma gfp_unfold we derive P = P' P and with this the required introduction rules. gfp_induct provides us the induction rule.

For predicates we do not need a corecursor. And as we are in Prop definitional equality is not a problem. Up to now the fixed point theory is constructive, I didn't try yet a concrete example but I guess the entire construction can be done constructively.

I have already a simple coinduction tactic (but nothing comparable to the equations compiler).

leodemoura commented 7 years ago

Adding coinductive datatypes to Lean is a major and complex project. It is unrealistic to expect an external contributor will be able to do it. Just to put things in perspective, the inductive compiler implements two transformations: it eliminates mutual and nested inductives. Its abstract description is much simpler than the one described here. Now, check the actual implementation: https://github.com/leanprover/lean/tree/master/src/library/inductive_compiler When @dselsam and I discussed it on the white board, everything looked simple, but @dselsam had to solve many unforeseen issues in the actual implementation.

avigad commented 7 years ago

Understood. Feel free to close the issue. My main goal was to make the notes publicly available in the issues, because I was asked to do so.

leodemoura commented 7 years ago

@avigad Github has a wiki feature. We can put this kind of note there.

avigad commented 7 years ago

I just added it to the wiki. Feel free to edit it. I will close the issue here.