leanprover / lean3

Lean Theorem Prover
http://leanprover.github.io/

short circuiting type class resolution #666

Closed (leodemoura closed this issue 9 years ago)

leodemoura commented 9 years ago

Consider the following theorem

mul_sub_mul_div_mul_neg : ∀ {A : Type} [s : linear_ordered_field A] {a b c d : A},
  c ≠ 0 → d ≠ 0 → a / c < b / d → (a * d - b * c) / (c * d) < 0

The type looks small, but here is the current version in full detail. It is quite huge because it contains all the type class derivations. We can reduce it a lot if we short-circuit them. Here is the short-circuited version

This issue also affects definitions such as abs. Its elaborated term is also quite huge because it contains the type class derivations.

This issue has a big impact on performance. For example, any piece of automation that needs to use full higher-order unification is affected.

Recall that Georges suggested we add shortcuts to improve performance. It will certainly improve performance. By short-circuiting some derivations in the file ordered_field.lean, I managed to reduce the runtime from 3 to 2 seconds on my machine (using a single core). However, there are drawbacks. For example, we lose flexibility. The type class resolution procedure is quite flexible. The user can easily influence the instances being generated by adding local instances. That is, the instance derived in one context can be different from the instance derived in another context, even if both are in the same environment. A local instance can be used in the middle of a derivation.

Remark: the short-circuiting would only be used for instances generated by the structure command. That is, we would generate the transitive closure of the child-parent structure graph.

Potential problem: we will be "polluting" the environment with many instances that are never used. In the worst case, a child-parent structure graph containing n structures may contain O(n^2) edges.

I also think we need a clean story for users. It should be clear when a short-circuit instance is created. Should we have an option for disabling the short-circuiting?

BTW, here are ideas I tried/considered that did not work:

1- Dynamically generate the transitive closure. This is a mess because we can compile theorems in parallel. I tried to do it today, and gave up.

2- Don't short-circuit, but annotate terms generated by type class resolution. Then, use the annotation for disabling full higher-order unification for these terms. That is, whenever we have a flex-rigid unification constraint ?M t_1 ... t_k := instance which is not a pattern, we pretend it is one. This works for the algebraic hierarchy, but it breaks files in the HoTT library. The issue is that we use type class resolution for many different things, and in some applications we cannot approximate the solution for these flex-rigid constraints.

3- I have also considered a local macro. That is, given a big term T[a_1, ..., a_n] containing the variables a_1, ..., a_n, we would dynamically create a macro M_T such that M_T a_1 ... a_n expands to T[a_1, ..., a_n]. This doesn't work because the big term may contain universe variables, and we don't have any mechanism for abstracting universe variables. That is, only definitions and axioms have universe parameters. So, to be able to implement this approach, we would have to have macro declarations that also take universe terms as arguments.

fpvandoorn commented 9 years ago

The idea here is to automatically generate "shortcut" instances such as linear_ordered_field.to_has_lt when we define the structure linear_ordered_field, right? And make sure that these shortcuts get a higher priority the bigger the shortcut is.
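For illustration, such a hand-written shortcut might look roughly like this, in the flattened-structure style used later in this thread (the accessor linear_ordered_field.lt is assumed to exist as a flattened field; the priority annotation is omitted):

-- hypothetical hand-written shortcut: go from linear_ordered_field directly to has_lt,
-- instead of chaining through the intermediate to_* instances; it would also need a
-- priority higher than the step-by-step chain, as suggested above
definition linear_ordered_field.to_has_lt [instance] (A : Type) [s : linear_ordered_field A] : has_lt A :=
has_lt.mk (@linear_ordered_field.lt A s)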

The type class resolution procedure is quite flexible. The user can easily influence the instances being generated by adding local instances. That is, the instance derived in one context can be different from the instance derived in another context, even if both are in the same environment. A local instance can be used in the middle of a derivation.

How do we lose/restrict this flexibility by these shortcut instances?

Remark: the short-circuiting would only be used for instances generated by the structure command. That is, we would generate the transitive closure of the child-parent structure graph. Potential problem: we will be "polluting" the environment with many instances that are never used. In the worst case, a child-parent structure graph containing n structures may contain O(n^2) edges. I also think we need a clean story for users. It should be clear when a short-circuit instance is created. Should we have an option for disabling the short-circuiting?

I think "the transitive closure of the child-parent structure graph" is a clear story to users. I agree that this pollution could be an issue, so an option to turn it off is probably good. However, I think in practice there will be very few instances which are never used, at least in the algebraic hierarchy (which - I think - is currently the only theory where the child-parent structure graph contains more than 4 members).

Another thing to think about is manual instances such as ordered_ring.to_ordered_semiring. Do we want to use such an instance in a shortcut, or is it okay if in these cases there is no direct shortcut?

leodemoura commented 9 years ago

The idea here is to automatically generate "shortcut" instances such as linear_ordered_field.to_has_lt when we define the structure linear_ordered_field, right?

Yes.

How do we lose/restrict this flexibility by these shortcut instances?

Suppose we have the classes A, B, C, D and E, and the following instances:

f_1 : B -> A
f_2 : C -> B
f_3 : D -> C
f_4 : E -> D

Now, if we have an e : E and we need an A, type class resolution generates f_1 (f_2 (f_3 (f_4 e))). Similarly, if we need a B, we get f_2 (f_3 (f_4 e)). Now, suppose we define the instance

g : E -> C

Then, we get f_1 (f_2 (g e)) and f_2 (g e). We are affecting different type class resolution problems with a single new instance. If we create a shortcut from E to A and E to B, the new instance g will not affect the derivation.
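As a concrete sketch, with made-up single-field classes standing in for A, B, C, D and E, written in the structure style used later in this thread:

structure E [class] (T : Type) := (e : T)
structure D [class] (T : Type) := (d : T)
structure C [class] (T : Type) := (c : T)
structure B [class] (T : Type) := (b : T)
structure A [class] (T : Type) := (a : T)

definition f_4 [instance] (T : Type) [s : E T] : D T := D.mk (@E.e T s)
definition f_3 [instance] (T : Type) [s : D T] : C T := C.mk (@D.d T s)
definition f_2 [instance] (T : Type) [s : C T] : B T := B.mk (@C.c T s)
definition f_1 [instance] (T : Type) [s : B T] : A T := A.mk (@B.b T s)

-- adding this instance afterwards changes which terms are found for A and B,
-- unless shortcuts from E to A and from E to B already exist with higher priority
definition g [instance] (T : Type) [s : E T] : C T := C.mk (@E.e T s)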

Another confusing issue is that the instance f_1 (f_2 (f_3 (f_4 e))) is only generated if we have opened the namespaces where f_1, f_2, f_3 and f_4 were marked as instance generators. However, when we create the shortcut, we only need to open the namespace where the shortcut was created. Perhaps, this is a good thing.

However, I think in practice there will be very few instances which are never used, at least in the algebraic hierarchy (which - I think - is currently the only theory where the child-parent structure graph contains more than 4 members).

I agree.

Another thing to think about is manual instances such as ordered_ring.to_ordered_semiring. Do we want to use such an instance in a shortcut, or is it okay if in these cases there is no direct shortcut?

In principle, any instance that is like a coercion can be supported. By "like a coercion", I mean the type of the instance is of the form:

Pi (x_1 : A_1) ... (x_n : A_n) [y : C x_1 ... x_n], D t_1 ... t_m

We can add it as an edge from C to D. At least, this covers instances such as ordered_ring.to_ordered_semiring.
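For instance, ordered_ring.to_ordered_semiring fits this shape with n = m = 1 (the exact binder annotations in the library may differ):

ordered_ring.to_ordered_semiring : Pi (A : Type) [s : ordered_ring A], ordered_semiring A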

However, this raises another issue. What do we do if we already have an edge from C to D? If we replace it, then we should also update the transitive closure. If we update the transitive closure, we need to generate new auxiliary definitions.

BTW, for coercions, we do not create shortcut definitions for derived coercions. Derived coercions are stored as lambda expressions. We also allow multiple edges from C to D in the coercion graph, and the t_i's in D are not necessarily variables. For example, we allow coercions such as

list_to_vec : Pi (A : Type) (l : list A), vector A (length l)

leodemoura commented 9 years ago

Then, we get f_1 (f_2 (g e)) and f_2 (g e). We are affecting different type class resolution problems with a single new instance. If we create a shortcut from E to A and E to B, the new instance g will not affect the derivation.

We can work around this issue by "playing" with priorities.

fpvandoorn commented 9 years ago

I see that these shortcut instances will lead to (syntactically) different instances, but I don't see any issues with that. If, in your example, g e is a reducible definition defined as f_3 (f_4 e), then I think we will hardly notice the difference between f_1 (f_2 (f_3 (f_4 e))) and f_1 (f_2 (g e)). Even the pattern matching of the rewrite tactic unfolds reducible definitions, right?

On second thought, I might see the issue. You probably weren't planning to define g e as f_3 (f_4 e), but to define it directly using the constructor: C.mk _ _ ... _. Because if we defined the shortcuts just as a composition, we wouldn't gain any performance benefit (if we make the definition reducible). In that case, suppose I have an expression a+b in a field, where + means @has_add.add A (field.to_has_add A s), and I want to rewrite it (not necessarily with the rewrite tactic) using add.comm, whose + means @has_add.add A (add_semigroup.to_has_add A ?t). Will those unify? I think they will, but I'm not sure.

However, this raises another issue. What do we do if we already have an edge from C to D?

I'm not sure how this feature will work, for example how a user will specify that ordered_ring.to_ordered_semiring is an instance which should be considered as a constructor for the transitive closure. In this particular case, if we already have an instance which takes an ordered_ring and makes it into an ordered_semiring, it's stupid to also consider the user-defined one for the transitive closure, so we can disallow it.

leodemoura commented 9 years ago

I see that these shortcut instances will lead to (syntactically) different instances, but I don't see any issues with that. If, in your example, g e is a reducible definition defined as f_3 (f_4 e), then I think we will hardly notice the difference between f_1 (f_2 (f_3 (f_4 e))) and f_1 (f_2 (g e)). Even the pattern matching of the rewrite tactic unfolds reducible definitions, right?

In principle, g can do something completely different. This is not a totally unreasonable scenario: for example, a structure may carry both a multiplicative and an additive group, and we may play with the local instances to select which one is generated. This is not a problem for the algebraic hierarchy in the standard library, since we have group and add_group, but in principle this is an issue. My concern is that we need a general solution. Other users may have their own hierarchies.

In that case, suppose I have an expression a+b in a field, where + means @has_add.add A (field.to_has_add A s), and I want to rewrite it (not necessarily with the rewrite tactic) using add.comm, whose + means @has_add.add A (add_semigroup.to_has_add A ?t). Will those unify?

It works if the arguments of C.mk are projections of the input structure. For example, to convert a field into a group, we would generate

definition field_to_group {A : Type} [s : field A] : group A :=
group.mk (@field.mul A s) (@field.mul_assoc A s) (@field.one A s) ...

However, if we use field.rec to make it more efficient, it will not unify. For @has_add.add A (add_semigroup.to_has_add A ?t), we would synthesize ?t using type class resolution. Since we have a field in the context, Lean would convert it into an add_semigroup. Both @has_add.add A (@field.to_has_add A s) and @has_add.add A (add_semigroup.to_has_add A (field.to_add_semigroup A s)) evaluate to @field.add A s.

I'm not sure how this feature will work, for example how a user will specify that ordered_ring.to_ordered_semiring is an instance which should be considered as a constructor for the transitive closure.

Good question. What about a new [transitive-instance] annotation?

In this particular case, if we already have an instance which takes an ordered_ring and makes it into an ordered_semiring, it's stupid to also consider the user-defined one for the transitive closure, so we can disallow it.

I agree that in this case it doesn't make sense, but the type class mechanism is quite general. I'm concerned that we may find scenarios where there isn't an obvious canonical instance. For example, there may be different alternatives, and the user is trying to override the current one. Of course, we may claim this is a bad application of type classes.

Here is another weird scenario:

structure S1 [class] (A B : Type) :=
(a : A) (b : B)

structure S2 [class] (A : Type) extends S1 A A :=
(p : a ≠ b)

structure S3 [class] (A : Type) extends S2 A, S1 A nat renaming a→a' b→b'

There are two different ways to go from S3 to S1. The first one converts S3 A into S1 A A, and the second into S1 A nat. We have to take that into account in the child/parent graph.
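For concreteness, the two shortcut instances the transitive closure would have to generate here would look roughly like this (the names S3.to_S1_left/right are made up, and the constructor applications are only sketched):

definition S3.to_S1_left [instance] (A : Type) [s : S3 A] : S1 A A :=
S1.mk (@S3.a A s) (@S3.b A s)

definition S3.to_S1_right [instance] (A : Type) [s : S3 A] : S1 A nat :=
S1.mk (@S3.a' A s) (@S3.b' A s)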

fpvandoorn commented 9 years ago

You're right. Users can do a lot of things with type classes, and some of those things we might now consider weird, but they may have their uses.

I think to cover all cases, we should create the transitive closure of a multigraph. So if we define a new structure A, we make a multigraph out of all parent-child instances plus all instances with the [transitive-instance] attribute, and then generate an instance for every path in the multigraph from A to any other structure. If two paths from A to another class B are definitionally equal, then we should only keep one of them, but if there are two different paths, we should keep both.

In our current algebraic hierarchy this should mean that there is a unique instance from any class with more structure to any class with less structure.

leodemoura commented 9 years ago

I think to cover all cases, we should create the transitive closure of a multigraph. So if we define a new structure A, we make a multigraph out of all parent-child instances plus all instances with the [transitive-instance] attribute, and then generate an instance for every path in the multigraph from A to any other structure. If two paths from A to another class B are definitionally equal, then we should only keep one of them, but if there are two different paths, we should keep both.

I agree.

BTW, we are already using multigraphs for coercions.

leodemoura commented 9 years ago

I'm experimenting with this one, but I don't have any good news. I'm also having second thoughts about this feature. First, the basic idea (just add the "transitive instances") doesn't work. Here is the problem. Suppose we have algebra and nat open. The notation +, <, * ... is overloaded. Moreover, in the standard library, by default, nat is not an instance of any algebraic structure. The idea is that given a b : nat, we want a + b to be nat.add a b. However, this decision backfires with the transitive closure idea. Lean will first try has_add.add nat ?s a b, and try to synthesize ?s : has_add nat. It will fail, but it will take a very long time to fail. After we compute the transitive closure, there is a quadratic number of new instances, and an exponential number of paths to try :-( I think the following workarounds are bad ideas:

Moreover, these workarounds are just workarounds. We would still have an exponential number of paths to try during type class resolution. When the type class resolution has a solution, there is no problem, since we usually find it very quickly. However, whenever we make a mistake, type class resolution would consume a lot of resources, because Lean would try all paths and there are an exponential number of them.

In the commits above, I tried a different trick. The transitive instances are kept in a separate set, and they are only used if they solve the synthesis problem in one step. This is a big hack, and will probably confuse users, but it solved the performance problem. However, the performance gain is not uniform. It depends on many factors. For now, this hack is in a different branch.

fpvandoorn commented 9 years ago

That is some bad news. I agree that both workarounds are not good.

What do you mean exactly by "and they are only used if they solve the synthesis problem in one step"? After that step, all remaining type class arguments should be present in the local context? Must they also be the first step of the synthesis problem?

I don't think it will confuse users too much. Usually the term found by type class resolution is hidden, and usually it doesn't matter which instance is found, because they are all definitionally equal.

Would an alternative solution be to cache the type class resolution problems which fail (in the given context)? That should be another way to take less than exponentially many steps, without requiring a [trans-instance] attribute, right? But I think that still requires a quadratic number of steps, while your solution with [trans-instance] takes only linearly many.

leodemoura commented 9 years ago

What do you mean exactly by "and they are only used if they solve the synthesis problem in one step"? After that step, all remaining type class arguments should be present in the local context? Must they also be the first step of the synthesis problem?

I was not clear. Let me try to explain the hack.

1- Separate the class instances into two sets: the explicit ones, and the ones that are created when we compute the transitive closure of the multigraph https://github.com/leodemoura/lean/blob/8137b5e78492cab6fe6f3ff89cedb1355a79cdde/src/library/class.cpp#L40-L41

2- When Lean is synthesizing a term by type class resolution, it creates many subproblems. We can associate a "depth" with each subproblem. Suppose we are trying to synthesize has_lt nat. This is depth 0. Now, suppose Lean tries semigroup.to_has_lt (A : Type) [s : semigroup A] : has_lt A. It will create the subproblem semigroup nat at depth 1, and so on.

3- The first hack is to consider the derived instances only at depth 0. https://github.com/leodemoura/lean/blob/8137b5e78492cab6fe6f3ff89cedb1355a79cdde/src/library/class_instance_synth.cpp#L331

4- The second hack is to consider only local instances for solving subproblems created by a derived instance. The idea is: given A : Type, if we are trying to synthesize has_lt A, then field.to_has_lt will succeed only when there is a local instance [s : field A]. https://github.com/leodemoura/lean/blob/8137b5e78492cab6fe6f3ff89cedb1355a79cdde/src/library/class_instance_synth.cpp#L300-L303
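As a minimal sketch of point 4 (using linear_ordered_field, which does carry an order, instead of field; the import and open lines are assumptions about the library setup):

import standard
open algebra

-- with the hack in point 4, a derived (transitive-closure) instance from
-- linear_ordered_field down to has_lt is allowed to fire here, because the
-- subproblem it creates is discharged by the local instance s
example (A : Type) [s : linear_ordered_field A] (a b : A) : Prop := a < b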

I'm investigating other solutions.

leodemoura commented 9 years ago

Would an alternative solution be to cache the type class resolution problems which fail (in the given context)?

It is quite tricky to do any caching in the type class resolution procedure. It is too dynamic. For example, it depends on the context. Moreover, as the search progresses the context changes.

leodemoura commented 9 years ago

That should be another way to take less than exponentially many steps, without requiring a [trans-instance] attribute, right?

Yes, this is correct. The main problem is how to do the caching. In the past, I have tried to cache the results produced by type class resolution to avoid redundant work, but it was quite tricky.

Sometimes, I think we need a less general procedure for the algebraic hierarchy. Today, I'm too frustrated to try anything, but tomorrow I will try to add projections to the kernel. It is an orthogonal feature, but it will improve the performance in the algebraic hierarchy.

avigad commented 9 years ago

For what it is worth, the hack seems perfectly reasonable to me. It seems to cover the common use case where we really want shortcuts, and I don't see how it might confuse users.