`with` in class instances can cause surprising slowdowns

kbuzzard commented 1 year ago

Prerequisites

[x] Put an X between the brackets on this line if you have done all of the following:
- Checked that your issue isn't already filed.
- Reduced the issue to a self-contained, reproducible test case.

Description

The following different ways to make instances of a class result in different terms:

class A where
  foo : String
  bar : String

class B extends A where
  extra : String

instance f : A := ⟨"hi", "there"⟩

instance one : B :=
  { f with
    foo := f.foo
    extra := "bro" }

instance two : B :=
  { toA := f
    extra := "bro" }

instance three : B :=
  { f with
    extra := "bro" }

instance four : B where
  extra := "bro"

set_option pp.all true in
#print one
-- def one : B :=
-- let src : A := f;
-- @B.mk (A.mk (@A.foo f) (@A.bar src)) "bro"

set_option pp.all true in
#print two
-- def two : B :=
-- @B.mk f "bro"

set_option pp.all true in
#print three
-- def three : B :=
-- let src : A := f;
-- @B.mk (A.mk (@A.foo src) (@A.bar src)) "bro"

set_option pp.all true in
#print four
-- def four : B :=
-- @B.mk f "bro"

In Mathlib's algebra hierarchy, these let src terms (the terms produced by with) seem to greatly increase the amount of time taken by typeclass inference to prove that various terms are defeq. Here are two case studies from mathlib.

1) In mathlib PR https://github.com/leanprover-community/mathlib4/pull/6398 I remove one with from a definition and according to the speedcenter, another file now compiles 42% faster. In the Zulip post by Xavier Roblot which prompted the PR, the change makes the first example compile an order of magnitude faster. This is just one change to one with.

2) In Mathlib PR https://github.com/leanprover-community/mathlib4/pull/6319 , Matt Ballard removes all the withs in just one file in mathlib, and the benchmark results are extraordinary: http://speedcenter.informatik.kit.edu/mathlib4/compare/c7852218-4442-4a5f-8923-2062c59fbe45/to/ad01e2d7-e057-449b-b3f2-2769bf28d4e0 . A sea of green.

If something could be changed in core which would save us the bother of removing many many occurrences of with in mathlib, that would be helpful.

Versions

Lean (version 4.0.0-nightly-2023-08-03, commit 125a0ba798e7, Release)

Additional Information

Here is a comparison of the instance traces with trace.profiler true and pp.proofs.withType false in Xavier Roblot's Zulip example (case study 1 above). The diff in mathlib is this:

-  { (inferInstance : Group αˣ) with
-    mul_comm := fun _ _ => ext <| mul_comm _ _ }
+  { mul_comm := fun _ _ => ext <| mul_comm _ _ }

Before the change things looked like this:

                    [Meta.isDefEq.assign] [0.392895s] ✅ ?m.2672 := torsion K
                      [Meta.isDefEq.assign.checkTypes] [0.392861s] ✅ (?m.2672 : Subgroup
                            { x // x ∈ 𝓞 K }ˣ) := (torsion K : Subgroup { x // x ∈ 𝓞 K }ˣ)
                        [Meta.isDefEq] [0.392851s] ✅ Subgroup { x // x ∈ 𝓞 K }ˣ =?= Subgroup { x // x ∈ 𝓞 K }ˣ
                          [Meta.synthInstance] [0.041633s] ✅ CommGroup { x // x ∈ 𝓞 K }ˣ
                            [Meta.synthInstance] [0.032970s] ✅ apply @Units.instCommGroupUnitsToMonoid to CommGroup
                                  { x // x ∈ 𝓞 K }ˣ
                              [Meta.synthInstance.tryResolve] [0.032916s] ✅ CommGroup
                                    { x // x ∈ 𝓞 K }ˣ ≟ CommGroup { x // x ∈ 𝓞 K }ˣ
                                [Meta.isDefEq] [0.025502s] ✅ CommGroup { x // x ∈ 𝓞 K }ˣ =?= CommGroup ?m.2681ˣ
                                  [Meta.isDefEq] [0.025467s] ✅ { x // x ∈ 𝓞 K }ˣ =?= ?m.2681ˣ
                                    [Meta.synthInstance] [0.016779s] ✅ CommMonoid { x // x ∈ 𝓞 K }
                          [Meta.isDefEq] [0.351141s] ✅ CommGroup.toGroup =?= Units.instGroupUnits
                            [Meta.isDefEq] [0.351049s] ✅ CommGroup.toGroup =?= let src := inferInstance;
                                Group.mk _
                              [Meta.isDefEq] [0.351043s] ✅ CommGroup.toGroup =?= Group.mk _
                                [Meta.isDefEq] [0.350975s] ✅ Units.instCommGroupUnitsToMonoid.1 =?= Group.mk _
                                  [Meta.isDefEq] [0.350947s] ✅ Group.mk _ =?= Group.mk _
                                    [Meta.isDefEq] [0.343292s] ✅ Group.toDivInvMonoid =?= DivInvMonoid.mk zpowRec
                                      [Meta.isDefEq] [0.343225s] ✅ inferInstance.1 =?= DivInvMonoid.mk zpowRec
                                        [Meta.isDefEq] [0.343197s] ✅ DivInvMonoid.mk zpowRec =?= DivInvMonoid.mk zpowRec
                                          [Meta.isDefEq] [0.075866s] ✅ zpowRec =?= zpowRec
                                            [Meta.isDefEq] [0.075857s] ✅ zpowRec =?= zpowRec
                                            ...
                                          [Meta.isDefEq] [0.106764s] ✅ Monoid.mk _ _ npowRec =?= Monoid.mk _ _ npowRec
                                            [Meta.isDefEq] [0.053281s] ✅ npowRec =?= npowRec
                                            ...
                                            [Meta.isDefEq] [0.037335s] ✅ Semigroup.mk _ =?= Semigroup.mk _
                                            ...
                                            [Meta.isDefEq] [0.016048s] ✅ { one := 1 } =?= { one := 1 }
                                          [Meta.isDefEq] [0.022721s] ✅ { inv := Inv.inv } =?= { inv := Inv.inv }
                                            [Meta.isDefEq] [0.015109s] ✅ Inv.inv =?= Inv.inv
                                            ...
                                          [Meta.isDefEq] [0.137621s] ✅ {
                                                div := DivInvMonoid.div' } =?= { div := DivInvMonoid.div' }
                                            [Meta.isDefEq] [0.137499s] ✅ DivInvMonoid.div' =?= DivInvMonoid.div'
                                            ...

with ... meaning that I removed a bunch of other lines all of which were of the form X =?= X where X and X look syntactically equal.

Lean wants to assign ?m.2672 := torsion K (torsion K is a group) and then has to check that two group structures coincide. Units.instCommGroupUnitsToMonoid is the instance which has been dewithed. Before the change, Lean goes into interminable checks that various "obviously defeq" fields from Group are defeq, each costing up to 0.1 seconds. After the change the same trace (in full) looks like this:

                    [Meta.isDefEq.assign] [0.051093s] ✅ ?m.1366 := torsion K
                      [Meta.isDefEq.assign.checkTypes] [0.051055s] ✅ (?m.1366 : Subgroup
                            { x // x ∈ 𝓞 K }ˣ) := (torsion K : Subgroup { x // x ∈ 𝓞 K }ˣ)
                        [Meta.isDefEq] [0.051042s] ✅ Subgroup { x // x ∈ 𝓞 K }ˣ =?= Subgroup { x // x ∈ 𝓞 K }ˣ
                          [Meta.synthInstance] [0.043180s] ✅ CommGroup { x // x ∈ 𝓞 K }ˣ
                            [Meta.synthInstance] [0.034260s] ✅ apply @Units.instCommGroupUnitsToMonoid to CommGroup
                                  { x // x ∈ 𝓞 K }ˣ
                              [Meta.synthInstance.tryResolve] [0.034205s] ✅ CommGroup
                                    { x // x ∈ 𝓞 K }ˣ ≟ CommGroup { x // x ∈ 𝓞 K }ˣ
                                [Meta.isDefEq] [0.026653s] ✅ CommGroup { x // x ∈ 𝓞 K }ˣ =?= CommGroup ?m.1375ˣ
                                  [Meta.isDefEq] [0.026616s] ✅ { x // x ∈ 𝓞 K }ˣ =?= ?m.1375ˣ
                                    [Meta.synthInstance] [0.017559s] ✅ CommMonoid { x // x ∈ 𝓞 K }
(end)

and in particular the process has gone down from 0.4 seconds to 0.05 seconds, and the second part of the story in the original trace, beginning

                          [Meta.isDefEq] [0.351141s] ✅ CommGroup.toGroup =?= Units.instGroupUnits
                            [Meta.isDefEq] [0.351049s] ✅ CommGroup.toGroup =?= let src := inferInstance;
                                Group.mk _
...

has completely vanished.

Before the change, typeclass inference is taking 0.35 seconds to prove Units.instCommGroupUnitsToMonoid.1 =?= Group.mk _ , and this calculation is occurring around 20 times during elaboration of the term, which explains why it was taking 7 seconds to elaborate.

kim-em commented 1 year ago

Note there is useful discussion of this issue on zulip also, in particular Matthew and Eric's suggestions.

kbuzzard commented 1 year ago

It has been suggested that I try to minimise the problem. My immediate reaction to this is "this would be a perfect job for a machine" (I believe Coq has an MWE-maker). But in practice I am going on holiday in a few days and will not have any time to think about this for the next two weeks. I made a preliminary start on the manual approach in this repo, which collects together a bunch of minimised hierarchies which I found lying around my computer from previous minimisation projects. Given the nature of this issue, I suspect that it would be possible to minimise. Sometimes things in mathlib are slow because there are hundreds of instances which are confusing typeclass inference or whatever -- but here I suspect that a few withs will be enough to cause slowdowns even in core.

mattrobball commented 1 year ago

I think the problem is the following. Take

structure A where
  x : Bool

structure B extends A where

structure C extends A where

def a : A := ⟨true⟩

def c : C := {toA := ⟨true⟩}

def b : B := {a with}

def b' : B := {c with}

#print b
-- extra eta
-- let src := a;
-- { toA := { x := src.x } }

#print b'
-- no extra eta
-- let src := c;
-- { toA := src.toA }

Note that b has its toA field eta expanded while b' does not. The function

def mkProjStx? (s : Syntax) (structName : Name) (fieldName : Name) : TermElabM (Option Syntax) := do
  if (findField? (← getEnv) structName fieldName).isNone then
    return none
  return some <| mkNode ``Parser.Term.proj #[s, mkAtomFrom s ".", mkIdentFrom s fieldName]

is used to construct the syntax for the fields from the user provided structure instances in addMissingFields. However, it will not apply when the field in question is a direct projection to a parent. This results in an extra round of eta expansion to get to a projection.

A consequence of this is extra work during unification since it will unfold the eta expanded terms that result. On Zulip I gave a toy example of this behavior based on actual examples given by @eric-wieser and @AntoineChambert-Loir where these speed penalties pile up to start causing some of the timeouts we see throughout mathlib.

In my fork, I modified addMissingFields to allow for filling in the direct parent projections and benchmarked the results in leanprover-community/mathlib4#6539. There one has

structure A where
  x : Bool

structure B extends A where

def a : A := ⟨true⟩

def b : B := {a with}

#print b
-- let src := a;
-- { toA := src }

Let me know if PR is appropriate.

eric-wieser commented 7 months ago

@mattrobball, what's your opinion on this issue in light of #2451? My impression was that the problem still exists (as getting the wrong order in with can be harmful), but not as badly as before.

mattrobball commented 7 months ago

Oh I thought this was closed already. The issue was addressed. The ordering is a new problem.

leanprover / lean4