RFC: simp to detect, report and ignore inherently looping rewrite rules

Motivation

The simplifier is not very helpful when given a non-terminating simp set:

variable (P : Nat → Prop)
/--
error: tactic 'simp' failed, nested error:
maximum recursion depth has been reached
use `set_option maxRecDepth <num>` to increase limit
use `set_option diagnostics true` to get diagnostic information
-/
#guard_msgs in
example (a b c : Nat) : P (a + b + c) := by simp [Nat.add_assoc, (Nat.add_assoc _ _ _).symm]

inductive Tree (α : Type) where | node : α → List (Tree α) → Tree α
def Tree.children : Tree α → List (Tree α) | .node _ ts => ts
def Tree.size (t : Tree α) := 1 + Nat.sum (t.children.attach.map (fun ⟨c,_⟩  => c.size))
decreasing_by simp_wf; cases t; simp_all [Tree.children]; decreasing_trivial

/--
error: tactic 'simp' failed, nested error:
maximum recursion depth has been reached
use `set_option maxRecDepth <num>` to increase limit
use `set_option diagnostics true` to get diagnostic information
-/
#guard_msgs in
example (t : Tree α) : 0 < t.size := by simp [Tree.size]

The simplifier only says that it ran too far, but gives very little help for debugging this. The newish set_option diagnostics true can help, but still needs careful investigation – if lemma foo is the bad looping one, but in each step some innocent other lemmas are used a few times, foo will appear rather low on the list.

I assume that users face this not uncommonly in the two situations outlined above:

adding a lemma to the simp call that goes against the current simp set (e.g. Nat.add_assoc)
using simp [f] with a recursive function f whose equational lemma is inherently looping.

Goal

Instead of the unspecific error message above, I’d like simp to

Recognize such “inherently looping” rewrite rules before applying them.
If recognized as such, ignore the rule, but still make progress elsewhere.
Print a helpful warning that some rules were ignored.
In particular: indicate exactly the set of looping lemmas, so that users see right away where they have to look. Maybe suggest using rw instead for better control.

Of course not all kinds of simp loops can be prevented, but many common and “obvious” ones can.
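
To illustrate the rw suggestion from the list above, here is a small sketch of how a single rewrite direction behaves today. The hypothesis h and the goal shape are chosen only for illustration; they are not taken from the examples in the Motivation section.

```lean
variable (P : Nat → Prop)

-- `rw` applies the rule once, in the stated direction, so no loop can arise.
example (a b c : Nat) (h : P (a + (b + c))) : P (a + b + c) := by
  rw [Nat.add_assoc]
  exact h

-- `simp` with only one direction of associativity also terminates;
-- the loop in the Motivation section comes from passing both directions at once.
example (a b c : Nat) (h : P (a + (b + c))) : P (a + b + c) := by
  simp only [Nat.add_assoc]
  exact h
```

A warning that names the looping lemmas could point users towards one of these two forms.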

Proposal

My idea for how to address this is based on the observation that a good™ simp lemma likely has its RHS in simp-normal form, because what’s the point otherwise. But in the two examples above, the bad™ rules do not have their RHS in simp-normal form!

So the idea is that simp would, before applying a lemma ∀ x, lhs[x] = rhs[x] to an occurrence lhs[e], first simplify rhs[x] in the abstract, i.e. before instantiating x, to rhs'[x], and then continue with rhs'[e].

If simplifying the abstract rhs[x] already fails (by hitting the recursion limit), we conclude that this is not a good rule for simp to apply, report it, and ignore it from now on.

Even better: we don’t even have to hit the recursion limit. As we simplify the rule’s RHS with other rules, we do the same thing: simplify those rules’ RHSs first. If we keep track of the stack of rules whose RHS we are currently simplifying, we can abort as soon as we reach a rule that is already on the stack, since we have just run into a cycle, and we can report the set of looping rules very precisely.
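
The stack-based check can be sketched on a toy model. This is not how simp is implemented, and none of the names below (ToyRule, rhsUses, findLoop, toyRules) exist in Lean; they are made up for illustration. The model assumes that each rule already knows which other rules would fire while its abstract RHS is simplified, whereas the real check would discover this by running simp on rhs[x].

```lean
/-- Toy stand-in for a rewrite rule (hypothetical, not a Lean `Meta` type).
`rhsUses` lists the rules that would fire while simplifying this rule's abstract RHS. -/
structure ToyRule where
  name : String
  rhsUses : List String

/-- Depth-first search along "simplifying the RHS of `name` uses rule ...".
`stack` holds the rules whose RHS is currently being simplified, innermost first.
Returns the rules forming a cycle, if one is found. -/
partial def findLoop (rules : List ToyRule) (stack : List String) (name : String) :
    Option (List String) :=
  if stack.contains name then
    -- `name` is already being simplified further up the stack:
    -- it and everything pushed after it form the loop.
    some (name :: (stack.takeWhile (· != name)).reverse)
  else
    match rules.find? (·.name == name) with
    | none => none  -- unknown rule: nothing to follow
    | some r => r.rhsUses.findSome? (findLoop rules (name :: stack))

def toyRules : List ToyRule := [
  -- The two directions of associativity undo each other.
  { name := "Nat.add_assoc", rhsUses := ["Nat.add_assoc.symm"] },
  { name := "Nat.add_assoc.symm", rhsUses := ["Nat.add_assoc"] },
  -- The unfolding equation of `Tree.size` mentions `Tree.size` again on its RHS.
  { name := "Tree.size", rhsUses := ["Tree.size"] },
  -- A harmless rule whose RHS is already in normal form.
  { name := "Nat.add_zero", rhsUses := [] } ]

#eval findLoop toyRules [] "Nat.add_assoc"
#eval findLoop toyRules [] "Tree.size"
#eval findLoop toyRules [] "Nat.add_zero"
```

In this model the first call should report the two associativity rules as a cycle, the second should report Tree.size on its own, and the third should find nothing. The search stops at the first repeated rule, so the recursion limit is never reached, and the stack contents are exactly the set of lemmas to show in the warning.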

Notes

This subsumes the (simpler) check “does this rule apply in its own RHS”, which would catch the above Tree.size issue, but wouldn’t be good enough in the presence of mutual recursion.
I’d apply this check lazily, i.e. only when just about to apply a simp rule, to avoid unnecessary work and to treat the simp arguments similarly to the theorems added with the simp attribute. So simp [looping_theorem] would not complain until looping_theorem would actually be applied. One could also consider checking all the explicit simp arguments.
Sometimes such a bad rule could still be usefully applied to concrete terms.

Consider the rewrite system if c then a else b = if foo c then a else b, foo true = false and if false then a else b = b. The first rule is looping and would be reported, even though (with bottom-up simp application) it can actually resolve if true then a else b.

I’m not sure how often that appears in practice, and whether it even should be supported.
If we do want this, then some override syntax (simp [!foo] or so) that disables the check might be useful.
It may be helpful to run this check also eagerly upon attribute [simp] foo, with regard to the current default simp set, to give an early warning.

But this check should not replace the check in the actual simp run, as then we likely have a different simp set.
If we have rules whose RHS is expensive to simplify (large, or very far from a normal form), then it may be useful to cache that, so that the rule can be applied many times without paying that cost again. In some obscure cases this might even speed up the simplifier compared to now.

(Or maybe the existing simp cache will reliably remember rhs[x] → rhs'[x] and provide this caching just fine?)
This is not expected to catch all kinds of simp loops, but the remaining simp loops are then due to concrete terms, not due to rewrite rule sets that are inherently looping.
If the user writes simp [foo], then this is (almost) like simp [foo.eq_1, foo.eq_2, …]. Not sure what to say here when foo.eq_1 is good and foo.eq_2 loops; this may need some careful error message wording.

Community Feedback

(None yet)

Impact

Add :+1: to issues you consider important. If others benefit from the changes in this proposal being added, please ask them to add :+1: to it.
