Open nomeata opened 2 weeks ago
For some cases, the computation can be verified more quickly than it can be performed. For example, in `Nat.decidableExistsLE`, as far as I understand, the kernel currently checks all values until one works; optimally, the value that satisfies the predicate would be found by native code and then handed to the kernel, which only has to check that single value.
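As a small illustration of the idea (the statement and numbers are made up for this sketch): once native code has found a witness, the kernel only needs to check the predicate at that one value, rather than searching for it.

```lean
-- Sketch: instead of letting the kernel search 0, 1, 2, … for a witness,
-- native code finds one and we supply it explicitly. The kernel then only
-- checks the predicate at that single value.
example : ∃ n, n ≤ 100000 ∧ n * n = 9409 :=
  ⟨97, by decide⟩  -- witness found externally; kernel checks one instance
```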
Other examples of this phenomenon (although they require more mathematics) are computing `Nat.gcd a b`, where native code can compute values `x, y, z, w` such that `z * (x * a - y * b) = a` and `w * (x * a - y * b) = b`, and then the gcd is `x * a - y * b`; or factorization, where native code can factor the number and produce primality certificates.
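As a sketch of what such a certificate check could look like (nothing here exists; the names are made up, and `Int` is used to avoid truncated subtraction on `Nat`):

```lean
-- Native code supplies x, y, z, w; the kernel only checks two
-- multiplications and one linear combination instead of running
-- Euclid's algorithm.
def gcdCertOK (a b x y z w : Int) : Bool :=
  let g := x * a - y * b
  z * g == a && w * g == b

-- `g` divides both `a` and `b` (witnessed by `z` and `w`), and is itself
-- a linear combination of them, so it is (up to sign) their gcd.
example : gcdCertOK 12 18 (-1) (-1) 2 3 = true := by decide
```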
There was also some discussion about this topic in https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/Large-ish.20computation.20without.20Lean.2EofReduceBool. Is there some way to profile kernel reduction, to understand where the term could be optimized?
It is absolutely true that you can often split the computation into unverified data generation and verified checking; that is not the topic of this issue, though.

Better profiling for kernel reduction would be great, and I'm missing it too, but it isn't really available right now.
This is a meta-issue that I’ll use to track my progress in creating a toolkit for faster kernel reduction. This is to help me organize the work, but also to provide more visibility to interested members of the community.
Background
Lean has two mechanisms for computations in proofs: `by decide`, which uses `Kernel.whnf`, and `by native_decide`, which uses the Lean compiler and then runs native code. The former tends to be slow; the latter has a significantly larger TCB. I investigated ways to find a middle ground here: something that allows larger computations than what we can handle well in `Kernel.whnf`, but still with a noticeably smaller TCB.

While a faster `Kernel.whnf` is possible by using a different evaluator (e.g. something like Sestoft's abstract machine) and would give a speed-up of 2-3×, this does not quite justify the increase in kernel code. Other ideas, such as certified compilation to some lower-level VM that is interpreted by the kernel, are appealing, but too large a task to tackle right away.

In these investigations I noticed that the code we send to the kernel is not particularly optimized: recursive functions use complex encodings, numerical operations are overloaded, and computations on subtypes like `Fin` perform a lot of wrapping and unwrapping. A quick experiment shows that a 10× speed-up is possible in some cases. However, just applying this trick broadly can lead to an overall slow-down, because for small calculations it is cheaper to simply reduce them than to optimize them first.
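To illustrate the kind of rewriting that gives these speed-ups, here is a hand-written sketch (the names `sumTo` and `sumTo.opt` are made up): a structurally recursive function re-expressed via `Nat.rec` with a large fuel argument, so that it reduces by plain iota reduction instead of going through the equation-compiler encoding.

```lean
-- Naive recursive definition; its kernel-level encoding goes through
-- the equation compiler machinery.
def sumTo : Nat → Nat
  | 0 => 0
  | k + 1 => (k + 1) + sumTo k

-- Hand-optimized variant: `Nat.rec` over a large fuel value. Reduction
-- only consumes as much fuel as the actual recursion depth.
def sumTo.opt (n : Nat) : Nat :=
  Nat.rec (motive := fun _ => Nat → Nat)
    (fun _ => 0)                      -- out of fuel (unreachable for n < fuel)
    (fun _ ih n => match n with
      | 0 => 0
      | k + 1 => (k + 1) + ih k)
    1000000 n

example : sumTo.opt 4 = sumTo 4 := rfl
```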
Goal
Therefore the goal of this project is to create a toolkit that can be used by advanced users to carefully optimize a specific expensive proof, or to implement a particular proof-by-reflection tactic.
Design
The idea is to have a `by rsimp_decide` tactic that uses the simplifier (which is a versatile tool for certified rewrites of terms) with a custom simp set (`rsimp`) to rewrite the expression before the kernel evaluates it.

Component overview
(These are rough specifications, some syntax or interaction will become clearer as I build these things.)
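To make the intended interaction concrete, here is a hypothetical usage sketch (none of this syntax exists yet; it follows the component descriptions below):

```lean
-- Hypothetical workflow; neither the attribute nor the tactic exists yet.
@[rsimp_optimize]   -- would define expensiveCheck.rsimp_opt and add the
                    -- rewrite expensiveCheck = expensiveCheck.rsimp_opt
                    -- to the `rsimp` simp set
def expensiveCheck (n : Nat) : Bool :=
  (List.range n).all (fun i => i * (i + 1) % 2 == 0)

example : expensiveCheck 100000 = true := by
  rsimp_decide   -- rewrites with `rsimp`, then hands the result to the kernel
```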
- [ ] An `rsimp` simp set. Initially without any promise of standard library coverage; instead, early adopters are expected to extend it as needed.
- [ ] A `rsimp_decide` tactic using this simp set.
- [ ] A `rsimp_optimize` attribute or command that takes a definition `f`, takes its RHS, optimizes it, defines `f.rsimp_opt`, and adds `f = f.rsimp_opt` to the `rsimp` simp set.
- [ ] The above, when run on a recursive function (more generally: any theorem that has the shape of an unfolding theorem), to use `Nat.rec` and a huge initial fuel value to create a non-recursive definition that reduces efficiently. This will, in particular, unblock the use of well-founded recursion in such applications.
- [ ] (Maybe) A variant of the above that can take a theorem `@decide (P x) (inst x) = … @decide (P y) (inst y) …` and produce an `rsimp` rule `@decide (P x) (inst x) = Nat.rec …`, to simplify recursive `Decidable` instances.
- [ ] (Maybe) Pursue a `Decidable` refactoring so that `@decide P inst` is a plain function on `Bool` and may reduce more easily (see #2038).

Data structures optimized for in-kernel reduction
- `Nat → α` maps, optimized for querying, to replace static arrays
- Making sure that `Nat` literals as packed arrays or bitvectors work well (i.e. that their representation in C files is efficient, see https://github.com/leanprover/lean4/issues/5771)

Kernel tweaks
Despite the goal of not touching the kernel too much, there might be room for small useful tweaks.

- Reducing `Nat.rec` without consulting the environment gives a worthwhile performance boost, given the above use of `Nat.rec` for recursion.
- A `strictLet : a → (a → b) → b` combinator which, when reduced by the kernel, causes the first argument to be whnf'ed first. Otherwise, left folds will leak memory and stack space. This could also be used to ensure sharing of results.

Evaluation
Once things are in place, it would be good to see that it can be applied successfully to:

- `Nat.sqrt` and `Nat.minFacAux`
- the `bv_decide` machinery, using this toolkit, to see if we can handle smallish SAT proofs without `native_decide` that way
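For a flavor of the target workload (my own example, not from the issue): goals of roughly this shape are currently out of comfortable reach for `decide` and therefore require `native_decide`, with its larger TCB.

```lean
-- Today this needs native_decide; the hope is that a kernel-checked
-- `rsimp_decide` can eventually handle computations of this size.
example : Nat.sqrt 12345678987654321 = 111111111 := by native_decide
```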