This is an excellent summary, and extremely helpful.
It seems to me that singleton elimination lies at the root of the problem. The idea behind Prop is to separate data we care about from things that are computationally irrelevant, so we can ignore the irrelevant parts when computing. But when terms contain eq.rec or acc.rec, things that should be computationally irrelevant become relevant, because the computation can't continue without computing them.
Is that a reasonable description of the issue? It still makes me uncomfortable to invest a lot of effort to compute things that are computationally irrelevant. My gut feeling is that if there is any value to working in a world with Prop, it is that it helps us to be clear about what we want to compute and what we don't.
Maybe an alternative strategy is to give up on fast computation in Lean for terms that use eq.rec. We can still use dependent type theory to reason about non-dependent algorithms, and even reason about algorithms that use eq.rec. But to run the latter efficiently in Lean, we would have to reflect / translate / compile them down to algorithms that do not "use" eq.rec in the computation.
Is that a reasonable description of the issue?
Yes, I fully agree.
Maybe an alternative strategy is to give up on fast computation in Lean for terms that use eq.rec
I think both eq.rec and acc.rec are extremely useful.
We would have to give up fast computation for any term using "casts" and/or "well-founded recursion" :-(
I know. Casts come up, for example, in the dependent coding of vec A n. But if we encode vec A n as lists with length n, it seems to me that we can design our algorithms in ways that avoid casts.
Similarly, there are ways of defining well-founded recursions that don't require singleton elimination for acc.rec. In the past, we defined div and gcd in that way.
Maybe these are less slick from a functional programming perspective, but to me, it seems a lot more natural than computing with elements of Prop.
But if we encode vec A n as lists with length n, it seems to me that we can design our algorithms in ways that avoid casts.
We would still need casts. BTW, I tried to write vec A n using the subtype approach.
It is not as nice (https://github.com/leanprover/lean/blob/master/library/data/vec.lean).
We can try to make it nicer with better automation.
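The definition there is roughly a subtype of lists constrained by their length, something like:
-- sketch of the subtype encoding; the actual library file has more boilerplate
definition vec (A : Type) (n : nat) := subtype (λ l : list A, list.length l = n)
Every operation on vec then has to carry the length proof around, which is where the extra noise comes from.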
Similarly, there are ways defining well-founded recursions that don't require singleton elimination for acc.rec. In the past, we defined div and gcd in that way.
They are less powerful and less convenient to use.
I don't see how we could define find_x (https://github.com/leanprover/lean/blob/master/library/data/nat/choose.lean) without using acc.rec.
In the approach we used in the past for div and gcd, we had to provide a measure. That is, how many "recursive" calls we are allowed to make. It is like estimating the amount of "fuel" we need to make sure we can compute the result.
For div and gcd, it is easy to compute how much "fuel" we need. This is not the case for find_x.
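To make the "fuel" idea concrete, here is a sketch (not the exact old definitions; div_core is just an illustrative name) of how div can be written by structural recursion on an explicit fuel argument:
open nat
-- the first argument bounds the number of recursive calls;
-- the recursion is structural on it, so no acc.rec is needed (sketch only)
definition div_core : nat → nat → nat → nat
| 0        x y := 0
| (succ f) x y := if y = 0 ∨ x < y then 0 else succ (div_core f (x - y) y)
-- the dividend itself is always enough fuel
definition div (x y : nat) : nat := div_core x x y
For find_x there is no obvious fuel bound we can compute up front.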
Maybe these are less slick from a functional programming perspective, but to me, it seems a lot more natural than computing with elements of Prop.
This is why I was excited about OTT. In OTT, we don't have to compute with elements of Prop. coe computes independently of the structure of the proof term. The trick for acc.rec is similar: acc.rec would also compute independently of the structure of proof terms.
These points are all well taken. But let me just say that find_x is exactly the sort of thing I would prefer to avoid. I don't want to compute a natural number by proving it exists and then searching for it. That seems like a bad way to write programs. The natural way to compute a natural number is to describe an algorithm to compute it.
find_x is just an example. It shows we can write a recursive definition when we have a proposition that guarantees that it will terminate. The same approach can be used to define other functions.
BTW, if we have the acc.rec trick described above, we would also be able to use even classical reasoning to justify that a recursive function terminates; acc.rec would not get stuck.
Here is extra motivation for supporting OTT in Lean. Consider the type class described at #650
inductive finite_set [class] (xs : set T) :=
| mk : ∀ (fxs : finset T), to_set fxs = xs → finite_set xs
A set T is used as a parameter for the type class.
Now, given h : s2 = s1, fxs1 : finite_set s1, a term such as
(eq.rec_on h fxs1) : finite_set s2
will appear in formalizations using this type class.
This term is performing a "cast" using an equality between sets.
Since most equalities between sets are derived using funext (and consequently quot.sound), the term above will get stuck during evaluation.
The term above may occur in a term such as
@card A s2 (eq.rec_on h fxs1)
This is not an unreasonable term: it computes the size of s2 using fxs1 : finite_set s1 and a proof h that s2 = s1. However, even if this term is closed, the evaluation will get stuck reducing eq.rec_on h fxs1 at
eq.rec fxs1 (quot.sound ...)
Additional remark: just checking if all parameters of an inductive datatype declaration have been used is not sufficient to prevent the inconsistency derived at https://gist.github.com/leodemoura/fadcc0b6a916e76510de
We can still derive false by "exploiting" the fact that Prop is impredicative:
open eq
inductive I (F : Type₁ → Prop) : Type₁ :=
| mk : (∀ (A : Type₁) (p : F A), p = p) → I F
-- The declaration above is accepted because
-- (∀ (A : Type₁) (p : F A), p = p) is a proposition,
-- and "lives" in Prop (i.e., Type.{0})
axiom InjI : ∀ {x y}, I x = I y → x = y
definition P (x : Type₁) : Prop := ∃ a, I a = x ∧ (a x → false)
definition p := I P
lemma false_of_Pp (H : P p) : false :=
obtain a Ha0 Ha1, from H,
subst (InjI Ha0) Ha1 H
lemma contradiction : false :=
have Pp : P p, from exists.intro P (and.intro rfl false_of_Pp),
false_of_Pp Pp
@codyroux suggested the following syntactic restriction: all parameter and index types appear negatively in the type of one constructor (i.e., positively in the type of one of the arguments). This restriction covers list A, vector A, tree A, etc. However, we would not cover a functor-like type
inductive F (A B : Type) :=
| mk : (A -> B) -> F A B
@codyroux suggested the following syntactic restriction: all parameter and index types appear negatively in the type of one constructor (i.e., positively in the type of one of the arguments).
I might misunderstand something, but how does that help? We can easily modify your contradiction proof such that the parameter F appears negatively in the constructors (since the constructor is not used at all in the proof). For example:
inductive I (F : Type₁ → Prop) : Type₁ :=
| mk : F unit → I F
@fpvandoorn You are right, it doesn't help.
I was thinking about the following alternative.
Instead of relying on inductive datatype injectivity, we derive the equations we need from the constructors.
For example, given I F_1 = I F_2 and your declaration, we can deduce that F_1 unit = F_2 unit.
It seems more compatible with a set-theoretic interpretation where types are interpreted by sets, and the type I is being generated by F unit instead of F.
Moreover, the equality F_1 unit = F_2 unit is the one we need to implement the coercion reduction rule for I.mk.
I think we just need to be careful with empty types. For example, given
inductive I (A : Type) : Type :=
| mk : A -> false -> I A
and I A_1 = I A_2, there is no reason to believe that A_1 = A_2.
I think we can avoid this issue by guarding the generated equalities with witnesses for the argument types. Here is another example: suppose we have
inductive I (a : A) :=
| mk : Pi (b : B a) (c : C a b), I a
Then, we get the following axioms
I.mk.eq.1 {a_1 : A} {a_2 : A} (b : B a_1) (c : C a_1 b) (e : I a_1 = I a_2)
: B a_1 = B a_2
I.mk.eq.2 {a_1 : A} {a_2 : A} (b : B a_1) (c : C a_1 b) (e : I a_1 = I a_2)
: C a_1 b = C a_2 (coe (I.mk.eq.1 b c e) b)
With them, we can generate the following coercion-reduction rule
-- Given {a_1 : A} {a_2 : A} (b : B a_1) (c : C a_1 b) (e : I a_1 = I a_2)
coe e (@mk a_1 b c) ==> @mk a_2 (coe (I.mk.eq.1 b c e) b) (coe (I.mk.eq.2 b c e) c)
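For instance, instantiating this schema for list and its cons constructor (so the parameter a is the element type A, B a := a, and C a b := list a), we would get something like:
-- sketch: the instance of the schema above for list.cons
list.cons.eq.1 {A_1 A_2 : Type} (b : A_1) (c : list A_1) (e : list A_1 = list A_2)
  : A_1 = A_2
list.cons.eq.2 {A_1 A_2 : Type} (b : A_1) (c : list A_1) (e : list A_1 = list A_2)
  : list A_1 = list A_2
-- Given {A_1 A_2 : Type} (b : A_1) (c : list A_1) (e : list A_1 = list A_2)
coe e (@cons A_1 b c) ==> @cons A_2 (coe (list.cons.eq.1 b c e) b) (coe (list.cons.eq.2 b c e) c)
(For cons, the second equality is just e itself, since C a b := list a does not depend on b.)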
@avigad find_x is sometimes seen as a form of Markov's principle. One then thinks of Prop as the double-negated propositions. Of course, find_x can be defined (in HoTT) by using the axiom of description.
I am not sure whether this solves anything, but it may give a slightly different perspective.
@leodemoura Thanks for the clear description of the problem. Off hand, I don't see a solution, but there is a bit of development in cubical that may help. Once I understand it, I'll let you know.
@spitters I know find_x is a form of Markov's principle. My point is that Markov's principle does not make for good programs, and I would not mind giving it up.
@leodemoura's point is well taken, though, that it is generally inconvenient to have to provide an "external" measure.
As for the injectivity paradoxes, it seems to me that they are all variants of the proof that, given any set S, there is no injection from the power set of S to S.
I think it is possible to make sense of the idea of interpreting types as "labelled sets", so that inductive types are labelled by the sets that parametrize them. The paradox shows that if the types of the parameters are all in Type.{n}, we have to put the inductive type in Type.{n}, to make sure we have enough labels. For example, if one of the parameters is F : Type.{1} -> Prop, we have to put the inductive type in Type.{2}, because Type.{1} -> Prop is in Type.{2}.
I think one can identify special cases where we do not have to go up a type. For example, for parameters A : Type.{1}, or A : B -> Type.{1} with B : Type.{1}, there should be enough labels in Type.{1}. That will let us keep list A in Type.{n} when A is in Type.{n}.
In short, for types without indices, I think we can avoid the injectivity paradoxes just by increasing the type level of some inductive types.
Do you think this is good enough?
Do you think this is good enough?
It depends on the special cases we can identify.
We don't want to bump the universe level of list, sigma, vector, sum, tree, prod, ...
As you wrote above, we want list A and list (list A) to both be in Type.{n} when A is in Type.{n}.
Similarly, we want nat × nat × nat × nat to be in Type.{1} instead of Type.{4}.
For example, the following definition would not work if we bump universe levels.
open nat
definition tuple (A : Type) : nat → Type
| 0 := unit
| (n+1) := A × tuple n
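To spell out why (with the usual rules, under which prod preserves the universe level of its arguments):
-- tuple nat 3 unfolds to nat × (nat × (nat × unit)), which stays in Type.{1};
-- if prod bumped the level, the universe of tuple nat n would grow with n,
-- so tuple could not be given the single type nat → Type
check tuple nat 3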
The trick of replacing H in acc.rec F H with
acc.intro x (fun (y : A) (H2 : R y x), acc.inv H H2)
also addresses issue #722. However, as described above, the convertibility checker may loop.
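Spelled out, the replacement is what lets the usual ι-reduction for acc.rec fire (a sketch):
-- proof irrelevance lets us replace H by an acc.intro term
acc.rec F H  ==>  acc.rec F (acc.intro x (fun (y : A) (H2 : R y x), acc.inv H H2))
-- and then the recursor can actually step
             ==>  F x (fun y H2, acc.inv H H2) (fun y H2, acc.rec F (acc.inv H H2))
Each acc.rec F (acc.inv H H2) can be unfolded again in the same way, which is also why the convertibility checker may fail to terminate on open terms.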
In the comments above, we saw that the following term gets stuck:
@card A s2 (eq.rec_on h fxs1)
We can use the following trick to avoid that. First, we define the following auxiliary "cast"-like function.
definition finite_set_cast {A : Type} {s₁ s₂ : set A}
: s₁ = s₂ → finite_set s₁ → finite_set s₂
| h (finite_set.mk fs eqv) := finite_set.mk fs (eq.rec_on h eqv)
Now, consider the following example
definition s₁ : set nat := λ a, a ∈ [1, 2, 3]
definition s₂ : set nat := λ a, a ∈ [3, 2, 1]
lemma s₁_eq_s₂ : s₁ = s₂ := ...
definition fs₁ : finite_set s₁ := ...
eval @card nat s₂ (eq.rec_on s₁_eq_s₂ fs₁)
eval @card nat s₂ (finite_set_cast s₁_eq_s₂ fs₁)
The first eval gets stuck as discussed in the previous comments. However, the one using the finite_set_cast custom "cast" function doesn't.
This is a nice idea: you change the proof of correctness, but you leave the data alone.
The trick seems general. For example, you can use it with the class differentiable f x discussed in this thread:
https://groups.google.com/forum/#!topic/lean-discuss/yiIU2cG93v0
Given H : x = y and even H' : forall x, f x = g x, we can cast an element of differentiable f x to an element of differentiable g y without blocking computation.
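In the same shape as finite_set_cast, the pattern is roughly this (a generic sketch; S stands for any such "data plus proof" class, not the actual definition of differentiable from that thread):
-- S bundles data d with a proof relating d to the index i
inductive S (I D : Type) (P : D → I → Prop) (i : I) : Type :=
| mk : Π (d : D), P d i → S I D P i
-- the cast rebuilds the constructor, keeping d and transporting only the proof
definition S_cast {I D : Type} {P : D → I → Prop} {i j : I}
  : i = j → S I D P i → S I D P j
| h (S.mk d p) := S.mk d (eq.rec_on h p)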
I'm closing this one. It has been inactive for several months. My current position is: OTT would marginally improve the user experience.
After we added quotient types, we lost a nice property: every closed term evaluates to a value. We have "stuck" terms such as eq.rec H1 (quot.sound H2) that do not evaluate to a value. Here is a real example demonstrating this issue: https://gist.github.com/leodemoura/fadcc0b6a916e76510de
Another issue is how to justify propositional erasure. For example, suppose we want to implement an efficient evaluator for closed terms that erases propositions. This evaluator would manage to reduce the term
@eq.rec (finset nat) s₁ (λ x, num) 0 s₂ dummy_eq
(in the previous link) to 0. Now, assume we want to use the fast evaluator to perform proofs by reflection. It feels weird that the fast evaluator would be able to "compute" more than the standard one. Moreover, the fast evaluator allows us to type check terms that do not type check with the default evaluator.
Note that equality is not the only proposition that can produce "stuck" terms. In general, any proposition that can eliminate into Type can produce "stuck" terms (i.e., the ones that support singleton elimination). Equality is one of them: eq.rec (Lean's J rule) allows us to eliminate into Type. Another example is the accessibility predicate acc used to define well-founded recursion (https://github.com/leanprover/lean/blob/master/library/init/wf.lean#L9).
OTT seems a natural choice for addressing (some of) these problems. If we manage to implement it, we would have a solid story for justifying propositional erasure, and we would get a lot of nice properties, such as: a closed term of type nat always evaluates to a value (i.e., 0 or succ^k 0).
As far as I understand, the main points of OTT are: Prop is irrelevant (we already have that in Lean); (heterogeneous) equality is a builtin type; add equality reduction rules; replace the J rule with a coe (coercion operator); and have coe reduction rules that "compute" independently of the proof term for "A = B". I think this last item is the main insight of OTT. In the OTT paper, they provide coe reduction rules for a fixed set of type formers: Sigma, W, Pi, ...
Now, the first problem: how to support user-defined inductive datatypes? More precisely, given a user-defined inductive datatype, how do we automatically generate the coe reduction rules? For simple types such as list A, everything looks natural. We would have the following equation and coe reduction rules:
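Roughly, something like this (a sketch; ==> denotes reduction):
-- the equation rule: list is injective as a type former
list A = list B ==> A = B
-- coe is pushed through the constructors
coe (list A) (list B) H nil         ==> nil
coe (list A) (list B) H (cons a l)  ==> cons (coe A B H a) (coe (list A) (list B) H l)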
Note that the first reduction rule is saying list is injective, and this property is used to define the coe reduction rule. That is, the term coe A B H ha is only type correct because list A = list B reduces to A = B. At first sight, it seems natural to assume that inductive types are injective. However, in general, this is unsound. The following link has a counterexample: https://gist.github.com/leodemoura/0c88341bb585bf9a72e6 In this link, we derive false by assuming that an inductive datatype that has an "unused parameter" is injective. This issue has been discussed in the Agda mailing list (https://lists.chalmers.se/pipermail/agda/2010/001530.html). There is a similar thread in the Coq mailing list (http://comments.gmane.org/gmane.science.mathematics.logic.coq.club/4348). So, the issue is: can we have a simple syntactic check for establishing when it is safe to assume that an inductive datatype is injective? The check must cover all useful inductive datatypes. By "useful", I mean it should cover at least the ones used in the Lean standard library. BTW, it seems Agda is now rejecting any inductive datatype declaration that contains unused arguments. However, I did not find any rigorous treatment of this issue (when it is safe to assume inductive datatypes are injective).
Second problem: indexed families. In Lean (like most other systems), we support indexed families. It is not clear how to derive the coe-reduction rules for indexed families such as:
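For instance, something along these lines, where the index is a list and the app constructor appends the indices of its arguments (a sketch, not a declaration from the library):
-- terms indexed by the list of their free variable names (sketch)
inductive Lam : list nat → Type :=
| var : Π (x : nat), Lam [x]
| app : Π (s1 s2 : list nat), Lam s1 → Lam s2 → Lam (s1 ++ s2)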
The problem here is that ++ is not injective. So, I don't see how we can define the coercion reduction rule for app. That is, given an equality (H : Lam (s1 ++ s2) = Lam (s3 ++ s4)), to move the coercion inside of the app constructor we need (H1 : Lam s1 = Lam s3) and (H2 : Lam s2 = Lam s4), but these two equalities do not follow from H :-( So, as far as I understand, this kind of inductive family cannot be supported in OTT.
Another critical issue is how to support the accessibility predicate acc. Its eliminator acc.rec also eliminates into Type. One option is to assume that acc is also a builtin operator, and make sure that acc.rec F H "computes" independently of the structure of (H : acc R x). That is, H may not reduce to acc.intro ... (the constructor of acc). One possible trick is to observe that H : acc R x is definitionally equal (by proof irrelevance) to acc.intro x (fun (y : A) (H2 : R y x), acc.inv H H2). So, whenever the evaluator is stuck at acc.rec F H, we can simply replace H with acc.intro x (fun (y : A) (H2 : R y x), acc.inv H H2), and the computation will continue. We cannot use this trick on open terms because the type checker could loop even when we restrict the trick to the case where F is a lambda (e.g., the type checker could loop in a context containing false assumptions). So, our idea is to restrict the trick above to closed terms. In this way, at least, we can claim that closed terms do not get "stuck". We still need a more careful analysis of this issue.
Finally, we would have to eliminate "singleton elimination" from the Lean kernel.