leanprover / lean4

Lean 4 programming language and theorem prover
https://lean-lang.org
Apache License 2.0

Optimizing terms for kernel reduction (meta issue) #5806

Open nomeata opened 2 weeks ago

nomeata commented 2 weeks ago

This is a meta-issue that I’ll use to track my progress in creating a toolkit for faster kernel reduction. This is to help me organize the work, but also to provide more visibility to interested members of the community.

Background

Lean has two mechanisms for computations in proofs: by decide, which uses Kernel.whnf, and by native_decide, which uses the Lean compiler and then runs native code. The former tends to be slow; the latter has a significantly larger TCB. I investigated ways to find a middle ground here: something that allows larger computations than Kernel.whnf handles well, but with a noticeably smaller TCB than native_decide.
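As a concrete illustration of the two existing mechanisms (the proposition is an arbitrary small example, not from the issue):

```lean
-- `decide` evaluates the `Decidable` instance inside the kernel via
-- `Kernel.whnf`: slow for large computations, but the trusted code base
-- stays small.
example : 2 ^ 16 % 7 = 2 := by decide

-- `native_decide` compiles the decision procedure and runs it as native
-- code: fast, but the compiler and its runtime become part of the TCB.
example : 2 ^ 16 % 7 = 2 := by native_decide
```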

While a faster Kernel.whnf is possible by using a different evaluator (e.g. something like Sestoft's abstract machine) and would give a speed-up of 2–3×, this does not quite justify the increase in kernel code. Other ideas, such as certified compilation to some lower-level VM that is interpreted by the kernel, are appealing, but too large a task to tackle right away.

In these investigations I noticed that the code we send to the kernel is not particularly optimized (recursive functions use complex encodings, numerical operations are overloaded, and computations on subtypes like Fin perform a lot of wrapping and unwrapping). A quick experiment shows that a 10× speed-up is possible in some cases.
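To make the overhead tangible, here is a hedged sketch of where it comes from (these are standard Lean definitional facts, used only as illustration; the actual optimizations in the toolkit may look quite different):

```lean
-- Overloading: an addition on `Nat` reaches the kernel as the type-class
-- projection `HAdd.hAdd`, which must be unfolded to `Nat.add` at every
-- step of reduction. The two are definitionally equal:
example (a b : Nat) : a + b = Nat.add a b := rfl

-- Wrapping/unwrapping: an addition on `Fin n` additionally wraps the
-- underlying `Nat` result in `Fin.mk` after a `% n`, so each operation
-- on the subtype pays for projections, a modulo, and re-packaging --
-- work that an optimized term could avoid or batch.
```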

However, just applying this trick broadly can lead to an overall slow-down, because for small calculations it is faster to simply reduce the term than to optimize it first.

Goal

Therefore the goal of this project is to create a toolkit that can be used by advanced users to carefully optimize a specific expensive proof, or to implement a particular proof-by-reflection tactic.

Design

The idea is to have a by rsimp_decide tactic that uses the simplifier (a versatile tool for certified rewriting of terms) with a custom simpset (rsimp) to rewrite the expression before handing it to the kernel for evaluation.
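Since the tactic is still being designed, the following is only a hypothetical sketch of the intended interaction; the `rsimp` attribute and the `rsimp_decide` syntax are the proposal of this issue, not shipped features:

```lean
-- Hypothetical: a user marks a rewrite as beneficial for kernel
-- reduction, e.g. replacing a function by a reduction-friendly variant
-- together with a proof that the two agree:
-- @[rsimp] theorem foo_eq_fastFoo : foo = fastFoo := ...

-- Hypothetical: `rsimp_decide` would then rewrite the goal's `Decidable`
-- instance with the `rsimp` simpset (a certified rewrite, via the
-- simplifier) and pass the optimized term to the kernel, as `decide`
-- does today:
-- example : someExpensiveProp := by rsimp_decide
```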

Component overview

(These are rough specifications, some syntax or interaction will become clearer as I build these things.)

Evaluation

Once things are in place, it would be good to see that it can be applied successfully to

Command-Master commented 6 days ago

For some cases, the computation can be verified more quickly than it can be performed. For example, in Nat.decidableExistsLE, from what I understand the kernel currently checks all values until one works, but optimally the value which satisfies the predicate can be found in native code and then given to the kernel. Other examples of this phenomenon (although they require more mathematics) are computing Nat.gcd a b, where native code can compute values x, y, z, w such that z * (x * a - y * b) = a and w * (x * a - y * b) = b, so that the gcd is x * a - y * b; or factorization, where native code can factor the number and produce primality certificates.
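The existential case can be illustrated in plain Lean (the proposition and witness are made up for illustration; the point is that the witness could come from untrusted native code):

```lean
-- Proving this with `decide` would make the kernel try candidate values
-- one by one. If untrusted native code has already found the witness
-- `42`, the kernel only has to check that single instance:
example : ∃ n, n ≤ 100 ∧ 10 * n = 420 :=
  ⟨42, by decide⟩
```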

Command-Master commented 6 days ago

There was also some discussion about this topic in https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/Large-ish.20computation.20without.20Lean.2EofReduceBool . Is there some way to profile kernel reduction, to understand where the term could be optimized?

nomeata commented 5 days ago

Absolutely true that often you can split the computation into unverified data generation and verified checking; that is not the topic of this issue, though.

Better profiling for kernel reduction would be great, and I'm missing it too, but it isn't really available right now.