Timeout with large arithmetic expression involving numerals

TwoFX commented 3 months ago

Prerequisites

Please put an X between the brackets as you perform the following steps:

[x] Check that your issue is not already filed: https://github.com/leanprover/lean4/issues
[x] Reduce the issue to a minimal, self-contained, reproducible test case. Avoid dependencies to Mathlib or Batteries.
[x] Test your test case against the latest nightly release, for example on https://live.lean-lang.org/#project=lean-nightly (You can also use the settings there to switch to “Lean nightly”)

Description

The following code timeouts with the default maxHeartbeats setting. It sometimes fails at whnf and sometimes at isDefEq, depending on the specific heartbeat setting used.

theorem foo (x y z p q : Int) : False :=
  have : (1 * x ^ 1 + z ^ 1 * p) *
        (1 / 1 * p ^ 1 * x ^ 1 + 1 * q * p ^ 1 * x * z + 1 * q ^ 1 * p ^ 1 * x ^ 1 +
                            1 * q ^ 1 * p ^ 1 * x * z -
                          1 * q * p ^ 1 * y ^ 1 +
                        1 * q ^ 1 * p ^ 1 * x ^ 1 +
                      1 * q ^ 1 * p ^ 1 * x * z -
                    1 * q ^ 1 * p ^ 1 * y ^ 1 +
                  1 * q ^ 1 * p ^ 1 * x ^ 1 +
                1 * q ^ 1 * p ^ 1 * x * z -
              1 * q ^ 1 * p ^ 1 * y ^ 1 +
            1 * q ^ 1 * x ^ 1 -
          1 * q ^ 1 * p * y ^ 1) +
      z * (1 * y) *
        (-(1 / 1 * p ^ 1 * x * y) + 1 * q * p ^ 1 * z * y - 1 * q ^ 1 * p ^ 1 * x * y +
                  1 * q ^ 1 * p ^ 1 * z * y -
                1 * q ^ 1 * p ^ 1 * x * y +
              1 * q ^ 1 * p ^ 1 * z * y -
            1 * q ^ 1 * p ^ 1 * x * y +
          1 / 1 * q ^ 1 * p ^ 1 * z * y) +
    (-y ^ 1 + p * x * (1 * z) + q * (1 * z ^ 1)) *
      (-(1 / 1 * p ^ 1 * x * z) - 1 * q * p ^ 1 * x ^ 1 - 1 * q ^ 1 * p ^ 1 * x * z -
                1 * q ^ 1 * p ^ 1 * x ^ 1 -
              1 * q ^ 1 * p ^ 1 * x * z -
            1 * q ^ 1 * p ^ 1 * x ^ 1 -
          1 * q ^ 1 * p ^ 1 * x * z -
        1 * q ^ 1 * p * x ^ 1) =
  1 *
        (1 / 1 * p ^ 1 * x ^ 1 + 1 * q * p ^ 1 * x * z + 1 * q ^ 1 * p ^ 1 * x ^ 1 +
                            1 * q ^ 1 * p ^ 1 * x * z -
                          1 * q * p ^ 1 * y ^ 1 +
                        1 * q ^ 1 * p ^ 1 * x ^ 1 +
                      1 * q ^ 1 * p ^ 1 * x * z -
                    1 * q ^ 1 * p ^ 1 * y ^ 1 +
                  1 * q ^ 1 * p ^ 1 * x ^ 1 +
                1 * q ^ 1 * p ^ 1 * x * z -
              1 * q ^ 1 * p ^ 1 * y ^ 1 +
            1 * q ^ 1 * x ^ 1 -
          1 * q ^ 1 * p * y ^ 1) +
      1 *
        (-(1 / 1 * p ^ 1 * x * y) + 1 * q * p ^ 1 * z * y - 1 * q ^ 1 * p ^ 1 * x * y +
                  1 * q ^ 1 * p ^ 1 * z * y -
                1 * q ^ 1 * p ^ 1 * x * y +
              1 * q ^ 1 * p ^ 1 * z * y -
            1 * q ^ 1 * p ^ 1 * x * y +
          1 / 1 * q ^ 1 * p ^ 1 * z * y) +
    1 *
      (-(1 / 1 * p ^ 1 * x * z) - 1 * q * p ^ 1 * x ^ 1 - 1 * q ^ 1 * p ^ 1 * x * z -
                1 * q ^ 1 * p ^ 1 * x ^ 1 -
              1 * q ^ 1 * p ^ 1 * x * z -
            1 * q ^ 1 * p ^ 1 * x ^ 1 -
          1 * q ^ 1 * p ^ 1 * x * z -
        1 * q ^ 1 * p * x ^ 1) := sorry
  sorry

The problem goes away when introducing a new parameter w : Nat and replacing all 1 by w.

set_option diagnostics true shows some interesting type class numbers:

[type_class] used instances (max: 42539, num: 13): ▼
  USize.instOfNat ↦ 42539
  Int.instHPowNat ↦ 42145
  Lean.Omega.IntList.instHMulInt ↦ 21357
  instHMul ↦ 11896
  instPowNat ↦ 8621
  instHSubPos ↦ 5140
  instHAddPosChar ↦ 4084
  Lean.Omega.IntList.instNeg ↦ 845
  instHAdd ↦ 440
  instHSub ↦ 189
  Int.instMul ↦ 90
  Int.instAdd ↦ 31
  Int.instSub ↦ 29

Context

The linear_combination tactic in mathlib is suffering from performance issues, and handling terms that look like the example above seems to be part of the reason why it is so slow.

Steps to Reproduce

Copy the above code into the Lean web editor.

Expected behavior: Elaboration should be very fast

Actual behavior: Timeouts

Versions

4.11.0-nightly-2024-07-27 on live.lean-lang.org

Impact

Add :+1: to issues you consider important. If others are impacted by this issue, please ask them to add :+1: to it.

kokic commented 3 months ago

Yes, a simple way to avoid timeout is to split the expression into multiple functions or definitions.

leodemoura commented 2 months ago

@TwoFX @Kha The PR above has an analysis for this issue. With the PR, the example does not time out anymore, but is still expensive. Simple workaround for much better performance: write x ^ (1 : Nat) instead of x ^ 1. The PR also makes this suggestion.

kmill commented 2 months ago

For a performance measurement, the example spends about 3% of the elaboration time inside the binop elaborator, and then I am guessing the rest of the time it is in the synthetic metavariable synthesis loop.

I thought about using withSynthesize at various points in the binop elaborator to eagerly use default instances, but default instances need to wait for additional type information in reasonable cases. For example, in the following you don't know that 2 : Float until after xs is elaborated, using the default instance coming from the HomogeneousPow Float instance. If we attempted default instances earlier, it would commit to 2 : Nat, leading to failure.

variable (xs : Array Float)
#check Array.map (fun x => x ^ 2) xs

One thing we could do to dramatically reduce the number of instance problems in this example is to re-use the elaborated functions when the types of both operands are known in toExprCore, using a cache keyed by operand types. This should be easy to implement and not change elaboration except for performance. I'll investigate.

hrmacbeth commented 2 months ago

Thanks for investigating this! Just a note that in the terms produced by linear_combination, the variables might come from any type equipped with a CommRing instance, not just the integers.

(Not sure which of the fixes being investigated still work in this setting.)

kmill commented 2 months ago

In Heather's linear_combination examples, slow exponentiation dominates. Over at https://github.com/leanprover-community/mathlib4/pull/15599 I made a version of the tactic that fully elaborates exponents before passing expressions to the elaborator, which made it take 8% the time it used to.

We can't do this in general because we want to support HomogenousPow for Float applications. I think we could measure the performance impact of having binop fully elaborate exponents for mathlib, to see if it's worth thinking about redesigning how NatPow and HomogeneousPow work.

leanprover / lean4