Closed somombo closed 1 week ago
Thanks for the excellent bug report, including even checking that it’s still the case on 4.8.0-rc1.
Initial investigation:
Unsurprisingly
attribute [irreducible] Array.swaps
will make all of them very fast, so #3995 or maybe #4002 might fix this.
Indeed
attribute [irreducible] WellFounded.fix
makes it much faster, but still slower than with Array.swaps
irreducible. This is an indication that #3995 may be preferable to #4002.
The new diagnostics confirm this; the first line produces
[reduction] unfolded declarations (max: 44101, num: 30): ▼
List.rec ↦ 44101
Acc.rec ↦ 29400
Nat.rec ↦ 29400
PSigma.rec ↦ 19742
List.length ↦ 19601
set ↦ 18900
WellFounded.apply ↦ 14952
WellFounded.rec ↦ 14952
Eq.rec ↦ 14700
List.map ↦ 14700
sizeOf ↦ 9942
InvImage.accessible ↦ 9940
InvImage.accAux ↦ 9940
rfl ↦ 9802
InvImage.wf ↦ 4958
invImage.proof_1 ↦ 4958
swaps._unary.proof_1 ↦ 4956
swaps ↦ 4914
WellFounded.fix ↦ 4914
WellFounded.fixF ↦ 4914
swaps._unary ↦ 4914
Add.add ↦ 4900
swap ↦ 4900
HAdd.hAdd ↦ 4900
Nat.add ↦ 4900
Nat.eq_or_lt_of_le ↦ 4900
Or.rec ↦ 4900
Prod.rec ↦ 4900
Append.append ↦ 350
HAppend.hAppend ↦ 350
So it’s clear that something is starting to unfold Array.swaps
here, and that this isn’t good. But I wonder if we can find out which call to defeq is doing that, and whether we can avoid them (as they might be expensive independent of wfrec).
The problem is certainly within the second occurrence of the subst notation, because
have ih1 : ((a.swaps [c]).swaps ((h ▸ cs) ++ (h.rec (motive := fun n _ => List (Fin n × Fin n)) cs).reverse)) = (a.swaps [c]) := swaps_cancel (a.swaps [c]) (h ▸ cs) -- LGTM: takes approx 0.018s
is also fast and we see in the example above that using by exact
(which hides the expected type, or at least changes the elaboration order) has an impact.
I stared a bit at traces and added a few trace
lines to elabSubst
, but without anything conclusive.
With
set_option trace.Meta.isDefEq.onFailure true in
you get this (large) output: https://gist.github.com/nomeata/b637a6acd24449be9d6e92f0c7455bfd and there is certainly a lot of repetition. Maybe somehow caching doesn’t work as it should?
Prerequisites
Description
I have an example of a term based proof (in a recursive context) being about 200 times slower to elaborate than by simply adding "
by exact
..." (or "by apply ...") to the front of the same term (see mwe below).Context
See Swap-Perm project which formally defines permutations of arrays and proves related theorems.
I am trying to prove a basic theorem about an extension to the definition of
Array.swap
that allows for making several swaps all at once. My extension is calledArray.swaps
and accepts a list of pairs of indices instead of just a single pair of indices.My first attempt at the theorem in question (
Array.swaps_cancel
below) was a entirely a term based recursive proof. The offending line of code was inductive hypothesis (See https://github.com/somombo/swaps-perm/blob/b4a05c1775ad1d32eadc07b635c10a25323a1815/SwapsPerm.lean#L100-L101)Steps to Reproduce
Consider the following mwe:
Take note of inductive hypotheses
ih1
,ih2
, andih4
which are all proven using an exact term andih3
which is proven by tactic.Uncomment whichever inductive hypothesis you would like to test the performance of, and comment out the rest. Upon the completion of elaboration,
trace.profiler
will show how many seconds it took to elaborate the version of the theorem with whichever inductive hypothesis is under consideration.Expected behavior: I'd expect inductive hypothesis
ih3
, to either elaborate (almost) as quickly asih2
andih1
or be slower than them. I'd additionally expect all proofs to take no more than say 0.5-1sec to elaborate (seeing as they are relatively straight forward). Lastly, I am surprised to see that leaving out the type (theorem statement) entirely while using an exact term proof as inih4
resulted in the fastest elaboration of all. The type it infers in that case, is precisely the one I explicitly provided inih2
.Actual behavior: On my machine (v4.7.0) the various versions take the following amounts of time to elaborate
ih1
takes approx 5sih2
takes approx 1.7sih3
takes approx 0.025sih4
takes approx 0.018sVersions
mainly 4.7.0 .. (on 4.8.0-rc1, all results above were consistently about 1 or 2seconds slower) Windows 10 Pro Version 22H2 Build 19045.4291 Also, WSL Ubuntu 20.04.6 LTS (results were similar to above)
Additional Information
n/a
Impact
Add :+1: to issues you consider important. If others are impacted by this issue, please ask them to add :+1: to it.