Elaboration of explicit term proof significantly slower than equivalent tactic proof

somombo commented 2 weeks ago

Prerequisites

[x] Put an X between the brackets on this line if you have done all of the following:
- Check that your issue is not already filed.
- Reduce the issue to a minimal, self-contained, reproducible test case. Avoid dependencies to mathlib4 or std4.

Description

I have an example of a term based proof (in a recursive context) being about 200 times slower to elaborate than by simply adding "by exact ..." (or "by apply ...") to the front of the same term (see mwe below).

Context

See Swap-Perm project which formally defines permutations of arrays and proves related theorems.

I am trying to prove a basic theorem about an extension to the definition of Array.swap that allows for making several swaps all at once. My extension is called Array.swaps and accepts a list of pairs of indices instead of just a single pair of indices.

My first attempt at the theorem in question (Array.swaps_cancel below) was a entirely a term based recursive proof. The offending line of code was inductive hypothesis (See https://github.com/somombo/swaps-perm/blob/b4a05c1775ad1d32eadc07b635c10a25323a1815/SwapsPerm.lean#L100-L101)

Steps to Reproduce

Consider the following mwe:

def Array.swaps (a : Array α) : List (Fin a.size × Fin a.size) → Array α
  | [] => a
  | (i, j) :: ijs =>
    have : (a.swap i j).size = a.size := a.size_swap _ _

    swaps (a.swap i j) (ijs.map (fun p => ⟨⟨p.1.1, this.symm ▸ p.1.2⟩, ⟨p.2.1, this.symm ▸ p.2.2⟩⟩))
termination_by l => l.length

set_option trace.profiler true in
theorem Array.swaps_cancel (a : Array α) (l : List (Fin a.size × Fin a.size)) : a.swaps (l ++ l.reverse) = a :=
  match l with
  | [] => sorry
  | c :: cs =>

    have h : a.size = (a.swaps [c]).size := sorry

    have ih1 : ((a.swaps [c]).swaps ((h ▸ cs) ++ (h ▸ cs).reverse)) = (a.swaps [c]) := swaps_cancel (a.swaps [c]) (h ▸ cs) -- FIXME: takes approx 5s
    -- have ih2 : swaps (swaps a [c]) (h ▸ cs ++ List.reverse (h ▸ cs)) = swaps a [c] := swaps_cancel (a.swaps [c]) (h ▸ cs) -- FIXME: takes approx 1.7s
    -- have ih3 : ((a.swaps [c]).swaps ((h ▸ cs) ++ (h ▸ cs).reverse)) = (a.swaps [c])  := by exact swaps_cancel (a.swaps [c]) (h ▸ cs) -- LGTM: takes approx 0.025s
    -- have ih4  := swaps_cancel (a.swaps [c]) (h ▸ cs) -- LGTM: takes approx 0.018s

    sorry
termination_by l.length
decreasing_by sorry

Take note of inductive hypotheses ih1, ih2, and ih4 which are all proven using an exact term and ih3 which is proven by tactic.

Uncomment whichever inductive hypothesis you would like to test the performance of, and comment out the rest. Upon the completion of elaboration, trace.profiler will show how many seconds it took to elaborate the version of the theorem with whichever inductive hypothesis is under consideration.

Expected behavior: I'd expect inductive hypothesis ih3, to either elaborate (almost) as quickly as ih2 and ih1 or be slower than them. I'd additionally expect all proofs to take no more than say 0.5-1sec to elaborate (seeing as they are relatively straight forward). Lastly, I am surprised to see that leaving out the type (theorem statement) entirely while using an exact term proof as in ih4 resulted in the fastest elaboration of all. The type it infers in that case, is precisely the one I explicitly provided in ih2.

Actual behavior: On my machine (v4.7.0) the various versions take the following amounts of time to elaborate

ih1 takes approx 5s
ih2 takes approx 1.7s
ih3 takes approx 0.025s
ih4 takes approx 0.018s

Versions

mainly 4.7.0 .. (on 4.8.0-rc1, all results above were consistently about 1 or 2seconds slower) Windows 10 Pro Version 22H2 Build 19045.4291 Also, WSL Ubuntu 20.04.6 LTS (results were similar to above)

Additional Information

n/a

Impact

Add :+1: to issues you consider important. If others are impacted by this issue, please ask them to add :+1: to it.

nomeata commented 2 weeks ago

Thanks for the excellent bug report, including even checking that it’s still the case on 4.8.0-rc1.

nomeata commented 2 weeks ago

Initial investigation:

Unsurprisingly

attribute [irreducible] Array.swaps

will make all of them very fast, so #3995 or maybe #4002 might fix this.

Indeed

attribute [irreducible] WellFounded.fix

makes it much faster, but still slower than with Array.swaps irreducible. This is an indication that #3995 may be preferable to #4002.

The new diagnostics confirm this; the first line produces

[reduction] unfolded declarations (max: 44101, num: 30): ▼
  List.rec ↦ 44101
  Acc.rec ↦ 29400
  Nat.rec ↦ 29400
  PSigma.rec ↦ 19742
  List.length ↦ 19601
  set ↦ 18900
  WellFounded.apply ↦ 14952
  WellFounded.rec ↦ 14952
  Eq.rec ↦ 14700
  List.map ↦ 14700
  sizeOf ↦ 9942
  InvImage.accessible ↦ 9940
  InvImage.accAux ↦ 9940
  rfl ↦ 9802
  InvImage.wf ↦ 4958
  invImage.proof_1 ↦ 4958
  swaps._unary.proof_1 ↦ 4956
  swaps ↦ 4914
  WellFounded.fix ↦ 4914
  WellFounded.fixF ↦ 4914
  swaps._unary ↦ 4914
  Add.add ↦ 4900
  swap ↦ 4900
  HAdd.hAdd ↦ 4900
  Nat.add ↦ 4900
  Nat.eq_or_lt_of_le ↦ 4900
  Or.rec ↦ 4900
  Prod.rec ↦ 4900
  Append.append ↦ 350
  HAppend.hAppend ↦ 350

So it’s clear that something is starting to unfold Array.swaps here, and that this isn’t good. But I wonder if we can find out which call to defeq is doing that, and whether we can avoid them (as they might be expensive independent of wfrec).

nomeata commented 2 weeks ago

The problem is certainly within the second occurrence of the subst notation, because

    have ih1 : ((a.swaps [c]).swaps ((h ▸ cs) ++ (h.rec (motive := fun n _ => List (Fin n × Fin n)) cs).reverse)) = (a.swaps [c]) := swaps_cancel (a.swaps [c]) (h ▸ cs) -- LGTM: takes approx 0.018s

is also fast and we see in the example above that using by exact (which hides the expected type, or at least changes the elaboration order) has an impact.

I stared a bit at traces and added a few trace lines to elabSubst, but without anything conclusive.

nomeata commented 1 week ago

With

set_option trace.Meta.isDefEq.onFailure true in

you get this (large) output: https://gist.github.com/nomeata/b637a6acd24449be9d6e92f0c7455bfd and there is certainly a lot of repetition. Maybe somehow caching doesn’t work as it should?

leanprover / lean4