modifying TC synth order can give 2% speedup of mathlib

The mathlib PR https://github.com/leanprover-community/mathlib4/pull/16646 gives a 2% speedup of mathlib, coming from a 6% speedup of TC synth. This is achieved by changing the TC synth order of certain instances, using mechanisms that were likely not intended as user-facing API, and deviating from the documented behaviour of TC synth. (The PR was merged into mathlib, but later reverted, in order to allow for more research on this topic.)

The following MWE displays behaviour that we believe is analogous to what happens with mathlib's algebraic hierarchy. (Code snippet adapted from https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/Ways.20to.20speed.20up.20Mathlib/near/469291144)

import Lean.Elab.Command
import Lean.Elab.MatchExpr
import Lean.Meta.Instances

-- # Example hierarchy

class A (n : Nat) where

instance [A n] : A n.succ where

class B [A 20050] where

class C [A 20000] extends B where

instance : A 20050 where

class D where

instance inst1 : D where
instance inst2 [B] : D where

-- # TC synth times out by default

/--
error: (deterministic) timeout at `typeclass`, maximum number of heartbeats (20000) has been reached
Use `set_option synthInstance.maxHeartbeats <num>` to set the limit.
Additional diagnostic information may be available using the `set_option diagnostics true` command.
-/
#guard_msgs in
#synth D

The beginning of the trace looks like

[Meta.synthInstance] 💥️ D ▼
  [] new goal D ▼
  [instances] #[inst1, @inst2] 
  [] ✅️ apply @inst2 to D ▼
  [tryResolve] ✅️ D ≟ D 
  [] new goal B ▼
  [instances] #[@C.toB] 
  [] ✅️ apply @C.toB to B ▼
  [tryResolve] ✅️ B ≟ B 
  [] new goal A 20000 ▼
  [instances] #[@instASucc] 
  [] ✅️ apply @instASucc to A 20000 ▼
  [tryResolve] ✅️ A 20000 ≟ A (Nat.succ 19999) 
  [] new goal A 19999 ▼
  [instances] #[@instASucc] 
  [] ✅️ apply @instASucc to A 19999 ▼
  [tryResolve] ✅️ A 19999 ≟ A (Nat.succ 19998) 
  [] new goal A 19998 ▶
  [] ✅️ apply @instASucc to A 19998 ▶
  [] ✅️ apply @instASucc to A 19997 ▶
<... snip>

When Lean applies C.toB it first attempts to find an instance for A 20000, and times out while doing so. So it never gets to look for an instance of C, which does not exist.

-- # Meta code to specify synth order

namespace Lean.Meta

/-- Specify the synthesization order for instances. -/
def Instances.setSynthOrder (declName : Name) (synthOrder : Array Nat) : CoreM Unit := do
  let some entry := Meta.instanceExtension.getState (← getEnv) |>.instanceNames.find? declName |
    throwError "'{declName}' does not have [instance] attribute"
  instanceExtension.add { entry with synthOrder } entry.attrKind

end Lean.Meta

namespace Lean.Elab.Command

open Meta

/-- Specify the synthesization order for instances. -/
elab "set_synth_order " name:ident synthOrder:term : command => do
  let q := Term.MatchExpr.toDoubleQuotedName name
  elabCommand (← `(command| run_meta Instances.setSynthOrder $q $synthOrder))

set_synth_order instAddNat #[]

end Lean.Elab.Command

-- # TC synth succeeds with a new order

set_synth_order C.toB #[1, 0]

/-- info: instD -/
#guard_msgs in
#synth D

Now the trace looks like

[Meta.synthInstance] ✅️ D ▼
  [] new goal D ▼
  [instances] #[inst1, @inst2] 
  [] ✅️ apply @inst2 to D ▼
  [tryResolve] ✅️ D ≟ D 
  [] new goal B ▼
  [instances] #[@C.toB] 
  [] ✅️ apply @C.toB to B ▼
  [tryResolve] ✅️ B ≟ B 
  [] no instances for C ▼
  [instances] #[] 
  [] ✅️ apply inst1 to D ▼
  [tryResolve] ✅️ D ≟ D 
  [answer] ✅️ D 
  [] result inst1

where Lean immediately bails on C.toB and continues with inst1.

leanprover / lean4

modifying TC synth order can give 2% speedup of mathlib #5333