leanprover / lean4

Lean 4 programming language and theorem prover
https://lean-lang.org
Apache License 2.0
4.55k stars 404 forks source link

RFC: well-founded recursion: automatic `.attach` insertion #5471

Open nomeata opened 1 week ago

nomeata commented 1 week ago

Problem description

Users are often confused and annoyed that code like this does not work:

inductive Tree where | node : List Tree → Tree
def Tree.rev : Tree → Tree
  | node ts => .node (ts.reverse.map (fun t => t.rev))

Instead, one has to write

inductive Tree where | node : List Tree → Tree
def Tree.rev : Tree → Tree
  | node ts => .node (ts.attach.reverse.map (fun ⟨t, _⟩ => t.rev))

which

Goal

It would be good to address these two issues: Automatically introduce .attach etc. where needed, and then hide it again in the equational lemmas (and in the induction principle).

Precedence

Something similar is already done by Lean: Any if … then … else … written by the user is turned into an if h : … then … else …, so that the condition is available to decreasing_by tactic.

Solution sketch

My conjecture is that we can achieve something like this by way of rewriting, which allows us to lean on the simp machinery (possibly extended by simpprocs where needed, e.g. higher-order matching). This has the benefit of being extensible – clearly lean shouldn’t have hardcoded knowledge about how to thread .attach’ed information through List.map and Array.foldl.

There are three phases

  1. Marking the parameters
  2. Propagating parameter information as wide as possible
  3. Removing left-over markers
  4. Cleaning up (in proof goals, in equational theorems, in the functional induction principle).

Marking the parameters

We’d define a helper identity function

def isParam {α : Sort u} : α → α

and when preprocessing the PreDefinition, replace all occurrences of a parameter x with isParam x.

Propagating the information

This is the interesting phase, built on these rewrites:

  1. Propagate through match:

    If we find

    match isParam x with | .con y z => … y … z …`

    replace with

    match x with | .con y z => … isParam y … isParam z …`

    so that we propagate the information about “this is interesting”

    (And to achieve confluence with the next rule, match xs.attach.unattach with … should behave the same as match isParam xs with.)

  2. Introduce attach:

    If we have

    isParam x

    and x is of an attachable type (List, Array, Option, hopefully extensible), replace it with x.attach.unattach (where List.unattach is map Subtype.fst).

  3. Propagate .attach:

    An extensible set of rewrite rules propagates .attach information, by matching on .unattach. For example simple things like

    xs.unattach.reverse     = xs.reverse.unattach

    but (more interestingly) higher-order rules like

    xs.unattach.map f       = xs.map (fun <x,h> => f (isParam x))
    xs.unattach.filter f    = (xs.filter (fun <x,h> => f (isParam x))).unattach
    xs.unattach.foldl f acc = xs.unattach.foldl (fun a <x,h> => f a (isParam x)) acc 

    and where if f x is a manifest redex, it is reduced (so that the proof h actually becomes visible).

    Notice that where possible, the .unattach is preserved – maybe the code is a chain of (xs.reverse.filter (…)).map f… and the recursive call is in the parameter to map; we want to propagate the information there.

    Also notice that we mark the arguments in the higher-order function argument with isParam, this way nested Lists are correctly attach-annotated.

Removing left-over markers

The last Predefinition preprocessing step is simply isParam x = x and xs.attach.unattach = xs, as the markers have done their job.

Cleanup

The three steps above hopefully put the right attach-like information into the context, and will allow Lean to define the function with well-founded recursion. But the user shouldn’t see all that, so we need to clean up. Essentially, this amounts to reversing the rewrite rules above:

xs.attach.unattach                      = xs
xs.reverse.unattach                     = xs.unattach.reverse
xs.map (fun <x,h> => f x)               = xs.unattach.map (fun x => f x)
(xs.filter (fun <x,h> => f x)).unattach = xs.unattach.filter (fun x => f x)

The higher-order rewrite rules among these are not good simp lemmas, the simplifier currently does not understand “apply this rule if the argument to map does not use the second component”. So either simp needs to be strengthened to understand at such higher-order patterns, or a cusotm simpproc needs to handle these.

Using this cleanup stage on the termination proof goals, before proving the equational lemmas and on the type of the functional induction theorem should, at least in most cases, hide all this from the user.

Variations

Open questions

This adds a yet another fair amount of complexity to the compilation of recursive functions. How reliable will it be? If it only works in simple common cases, but increases the mental load for users who have to do things beyond that, is it worth it?

Maybe it suffices to let the user introduce .attach where needed, but clean up the equational lemmas to avoid mentioning it where possible?

Or even more simple: If the default simp could clean up the usual idioms (by improving it's support for this kind of higher-order patterns), we might improve the user experience without introducing much additional complexity.

Community Feedback

Impact

Add :+1: to issues you consider important. If others benefit from the changes in this proposal being added, please ask them to add :+1: to it.

kim-em commented 6 days ago

5477 hopefully fulfils the cleanup requirements.

kim-em commented 3 days ago

but @nomeata's solution in https://github.com/leanprover/lean4/pull/5479 looks even better!

nomeata commented 2 days ago

There is a problem with this plan: The rewrites done by the processing do not preserve definitional equality. This means that often the rewriting will fail. For example rewriting

(xs.map (fun x => f x))[3]'h

to

(xs.attach.map (fun <x,_> => f x))[3]'h

is likely bad, because the proof in h may still mention xs.map …. Maybe if the translation also translate the proof h it will work, but we at least need to be prepared to fail gracefully if the types are too dependent.

eric-wieser commented 18 hours ago

How bad do things become if map is burninated in favor of using pmap everywhere? Maybe the lambda taken by pmap could take it's h argument implicitly, so that pmap fun x => x^2 doesn't have to be written pmap fun x _ => x^2?

nomeata commented 18 hours ago

You mean as part of the translation, or completely everywhere in Lean?

eric-wieser commented 18 hours ago

The latter (letting what is now pmap be renamed to map once map is deleted)

nomeata commented 18 hours ago

I expect that that’s going to be quite horrible. For example it will make it hard for simp to rewrite l in List.map l f, because it may make f no longer typecorrect without casts.

eric-wieser commented 3 hours ago

On the other hand, it would be great to solve that problem anyway for the current List.pmap. My guess is that the way to handle that is to introduce the casts as List.map l' fun x hx => f x (h.mpr hx) or similar, and let the cast be automatically eliminated in the non-dependent case; but I've not thought about this in any detail!