Open alexkeizer opened 2 months ago
It would be very helpful to get an idea what causes instantiateMVars
to become so slow. For another datapoint, consider https://github.com/leanprover/LNSym/pull/219. This PR replaces a use of evalTactic with a direct implementation using Meta API. Compare a profile from before the change with a profile of the same file afterwards.
You'll notice the former takes 8 seconds in total, of which 1.2 is spent in "Elab.step: tacticInit_next_step_" (init_next_step
is the tactic that was being evalTactic
d). In the second profile, initNextStep (the replacement) doesn't even make the profiler's threshold, but we can see that we spend much less time executing the tactic (the parent "Elab.step: tacticSymn" node was about 3.3 seconds before, and is 1.1 seconds in the second profile; a difference of exactly 1.2 seconds).
However, all of that is lost on the metavariable instantiation. This profile was made with a toolchain version that doesn't have a node for instantiateMVar, but we can deduce how much is spent on it from the self time of Elab.command: this was 3.8 seconds in the first profile, and 11.7 seconds in the second.
Any hints as to what could've caused such a dramatic increase in the time taken to instantiate metavars is appreciated!
Unfortunately, the slowdown seems to not be explained just by proof sizes. In fact, I ran some experiments on https://github.com/leanprover/LNSym/pull/215, which implements a change that causes a pretty big slowdown in the time taken for instantiateMVars
, but when I measured the size of the proofs, those generated after the change seemed to be smaller, despite being slower.
With the caveat that the slowdown was measured by symbolically simulating at least 50 steps of SHA512, but I wasn't able to get numbers for proof sizes for more than 7 steps, so it is possible that somehow for the first 7 steps, the changed version generated a smaller proof, but somehow for 50 steps the proof becomes larger.
I’ve identified the performance issue. The tactic produces a long sequence of nested, delayed-assigned metavariables. I confirmed that the sequence length exceeds 4,000.
Here’s a simplified/abstracted version of what’s happening:
It begins with a term like ?m_1 a_1
, where ?m_1
has a delayed assignment of the form:
?m_1 #[x_1] := ... (?m_2 a_2) ...
where ?m_2
also has a delayed assignment of the form:
?m_2 #[x_2] := ... (?m_3 a_3) ...
and this continues until we reach:
?m_4000 #[x_4000] := ... -- no metavariable here
The current implementation constructs the resulting term from the bottom up, starting with ?m_4000
.
Furthermore, at each step, replace_fvars
is used to replace occurrences of x_i
with a_i
after replacing ?m_i
with its assignment. This process gradually builds a larger and larger term, and as a result, replace_fvars
becomes increasingly expensive.
I tried two different approaches, but neither produced significant performance improvements. I am now considering special treatment for proofs. For example, skipping the beta-reduction step here significantly improves runtime (by about 50%) since it reduces the number of terms containing free variables. However, this is not a robust solution.
I also considered creating auxiliary lemmas, but it's messy. We would need all the machinery for MetaM
, and there's no guarantee that it would solve the performance issue after we add all this complexity. For using MetaM
, we need to avoid loose bound variables.
Another option I am considering is using a delayed substitution auxiliary term within instantiate_mvars
. Before returning the result, we would perform a single traversal to eliminate all them. Not sure it will solve all issues because update the metavar context assignment, and these delayed substitution terms should not leak into the assigment.
Thanks for the investigation @leodemoura!
Where exactly do these delayed mvars get created? I originally thought the various Meta
APIs that we use create regular assignments, but from your diagnosis I assume those can introduce delayed assignments also. Is there a way for us to work around the problem by avoiding such delayed assignments altogether?
Prerequisites
Please put an X between the brackets as you perform the following steps:
[X] Check that your issue is not already filed: https://github.com/leanprover/lean4/issues
[ ] Reduce the issue to a minimal, self-contained, reproducible test case. Avoid dependencies to Mathlib or Batteries.
[ ] Test your test case against the latest nightly release, for example on https://live.lean-lang.org/#project=lean-nightly (You can also use the settings there to switch to “Lean nightly”)
Description
In (LNSym)[github.com/leanprover/LNSym] we produce quite large proofs, which seem to be bottlenecked by metavariable instantiation: in our most extreme benchmark, elaborating a theorem takes about 18 seconds in total, of which 10 seconds are spent in
instantiateMVars
.Firstly, this time was not showing up in the
trace.profiler
output, but this is fixed in #5501.The bigger question, is why is metavariable instantiation so slow? Here's a table of some benchmarks we've run, with links to the source code and the resulting profile.
instantiateMVars
To be clear: the
instantiateMVars
runtime reported in the table measures only the calls done in theMutualDef
elaborator.Context
To investigate the issue, we've experimented with a wrapper that remembers the goal metavariable at the start, runs the tactic, and then at the end reassigns the metavariable to it's instantiation (assuming that it's been assigned) --- see here for where the wrapper is used.
Surprisingly this had almost no effect on the time spent on metavariable instantiation in the elaborator code. The benchmarks table above use the commit which has this wrapper. To see the time taken, expand the tactic nodes until you see
Tactic.sym: withInstantiatMainGoal.instantiateMVar
and observe that even in the largest benchmark, instantiating mvars here takes only 5ms, as opposed to the 10 seconds spent in the elaborator.Also, to be clear: these profiles were obtained by running Lean from the terminal, but running them in interactively in Lean gives roughly the same timings: this issue seems orthogonal from what's going on in #5614.
Steps to Reproduce
8384b193b4c727da83b200f5132c7b4620d3b03d
to get the experimental wrappermake profiles
to profile the benchmarksjson
files underdata/profiles
If you're not running a toolchain with #5501 cherry-picked on it, you can deduce how much time is spent on instantiating mvars by looking at the self-time of the
Elab.command
node.Expected behavior: I wouldn't expect the majority of elaboration time to be spent on instantiating mvars.
Actual behavior: The time spent on instantiating mvars seems to scale non-linearly with the size of the proof (or at least, it scales non-linearly with the time spent on evaluating the tactic that produced the proof), even if we force the tactic to fully instantiate it's original main goal.
Versions
nightly-2024-09-10, with #5501 cherry-picked on it (specifically,
opencompl/fix-timeleak-nightly-2024-09-10
)Additional Information
[Additional information, configuration or data that might be necessary to reproduce the issue]
Impact
Add :+1: to issues you consider important. If others are impacted by this issue, please ask them to add :+1: to it.