leanprover / lean4

Lean 4 programming language and theorem prover
https://lean-lang.org
Apache License 2.0
4.63k stars 414 forks source link

RFC: avoid using `Classical.em` to split ifs #2414

Closed fgdorais closed 4 months ago

fgdorais commented 1 year ago

Description

The fact that Lean uses classical axioms on occasion to avoid fiddling with decidabilty instances is well-known and mostly harmless. However, there are some cases where this is not necessary, such as with the split tactic because the decidability instance is provided with the ite/dite expression and could be used to avoid using Classical.em.

Steps to Reproduce

theorem ifthisthenthat (n : Nat) : (if n = 0 then 0 else 0) = 0 := by split <;> rfl

#print axioms ifthisthenthat

Expected behavior: 'ifthisthenthat' depends on no axioms

Actual behavior: 'ifthisthenthat' depends on axioms: [Classical.choice, Quot.sound, propext]

Versions

Lean (version 4.0.0-nightly-2023-07-24, commit 3b6bc4a87df3, Release)

Additional Information

Tentative patch at PR #2412

digama0 commented 1 year ago

People probably already know from my post on #2412, but I very much support fixing this issue. There is a fair amount of lean 3 / mathlib code which worked without classical axioms, and one of my published papers actually includes a theorem (from mathlib) which is claimed to be proved without choice, but this whole development took a major setback in the move to lean 4 because choice is used superfluously in a few critical dependencies.

Std plans to maintain the choice-freedom of most of the decidable parts of the theory, and I deployed some tools to help track regressions, but I quickly discovered that split was producing classical dependencies unnecessarily. I got as far as independently writing the fix from #2412 and confirming that it improved the situation significantly, but I did not attempt to submit the patch at the time. Now @fgdorais has written it independently, and I think he has done a good job of it.

I do not think that tracking classical axioms should be a major focus of lean 4. Most users do not need to care about this. However, it should be a (lightly) supported configuration, and I am willing to pitch in here and there to address issues, especially as they are likely to show up transitively in the std regression tests for this.

leodemoura commented 1 year ago

Here are some questions we expect every RFC should respond from now on.

User Experience: How does this feature improve the user experience?

Beneficiaries: Which Lean users and projects do benefit most from this feature/change?

Community Feedback: Have you sought feedback or insights from other Lean users?

Maintainability: Will this change streamline code maintenance or simplify its structure?

I will include them in the contribution guidelines.

leodemoura commented 1 year ago

one of my published papers actually includes a theorem (from mathlib) which is claimed to be proved without choice, but this whole development took a major setback in the move to lean 4.

I am assuming paper is using Lean 3, and the claim is still true. Why depending on choice is so bad? What the new dependency is preventing you from doing? Could you explain the project? How impactful is it?

fgdorais commented 1 year ago

Dear Leo,

As the author of both the contentious PR and this RFC, I think I should say a few words about my perspective on this.

First and foremost, I sincerely apologize for not following guidelines for the PR. This was a mistake on my part but it was not done with harmful intent. I honestly thought at the time that I was submitting the PR in a black hole, so it didn't make sense to go through formalities. I just wanted to say: "Hey! It's not complicated to fix this!"

Second, I think your RFC questions are very good. I fully intend to get back to these soon.

Third, and most importantly, is that I want to answer your question: "Why depending on choice is so bad?" Let me preface this answer with a disclaimer: I am a logician. I do work in and enjoy foundations of mathematics including constructive logic from time to time. I do have deep knowledge of these areas. That said, I am not a constructivist and I do not think the axiom of choice is bad in any way for mathematical purposes. However, I do understand how classical reasoning can sometimes be harmful for reasoning about programs.

The main issue with split using Classical.em is that it's bad code. It opens a door that could theoretically be exploited by a malicious agent. Remember that Classical.em (Classical.choice, which is used to derive Classical.em, is even worse) can easily be used to prove that something non-computable is actually total. Since Lean relies on such tactics to prove termination there is always a chance for a bad leak to occur. To the best of my knowledge, it is not currently possible to use split to prove something that couldn't be proved without Classical.em, so this is harmless and I am not very concerned about this eventuality. But Lean keeps evolving and this is a potential sinkhole. The best way to avoid this is to avoid using Classical.em in Lean core and Std code so that bad code can only come from other parties.

Lean has a small kernel which is hard to dupe but it's a bad idea to only use that as a defense against malicious agents. I am not malicious enough to bring an example how to abuse split and other tactics but I hope you understand my concern.

Best wishes, François G. Dorais

leodemoura commented 1 year ago

Dear François,

You have to be more concrete. You mention a hypothetical exploit, but then say you are not malicious enough to bring an example.

I do understand how classical reasoning can sometimes be harmful for reasoning about programs.

Where is the evidence/data to support this claim? Isabelle/HOL is based on classical reasoning and has a very successful track record on software verification (e.g., seL4 project.)

Note that even if you are correct, your PR does not fully solve the problem: If the decidability instance has loose bound variables you still fallback to classical. If we merge this PR, I can see users asking in the future: why do split sometimes use Classical.em, and we (the Lean developers) are the ones that have to address this question.

We are always listening to our users. We have talked to many mathematicians and computer scientists, and the vast majority don't care about constructivism. I have asked about high impact projects that are currently blocked unnecessary uses of Classical.em, and never got a convincing answer :)

Best, Leo

fgdorais commented 1 year ago

Dear Leo,

I will not purposefully write code with the intent to break Lean. That's obviously pointless and I don't have that kind of time! (Even if I did, I wouldn't post it here, I would privately email you and ask what to do next.)

Note that I was specifically referring to termination proofs when I said it was bad code. I have no concerns with using classical logic elsewhere. Remember that even the halting problem is provably total using Classical.em!

It's been a while since I actively used Isabelle and HOL (mostly because Lean is so much better!) but I don't recall either one using classical logic for termination proofs. If I recall correctly, that question doesn't actually make sense for one of them and I suspect the other one is like that too but I don't want to make any claim without proof for either one, so please correct me if I'm wrong. In any case, I don't understand why you are referencing these. They're not designed as programming languages in the same way as Lean.

Note that even if you are correct, your PR does not fully solve the problem: If the decidability instance has loose bound variables you still fallback to classical. If we merge this PR, I can see users asking in the future: why do split sometimes use Classical.em, and we (the Lean developers) are the ones that have to address this question.

As you can see from the PR history, this was done after a comment from @digama0 so as to not break current behavior for split. If you think that's wrong, it's easy to fix and revert to the original behavior.

Best wishes, François

leodemoura commented 1 year ago

Dear François,

I will not purposefully write code with the intent to break Lean. That's obviously pointless and I don't have that kind of time! (Even if I did, I wouldn't post it here, I would privately email you and ask what to do next.)

I never claimed you were writing code with the intent to break Lean. Sorry for the misunderstanding. I just pointed out that the Lean developers are the ones that review and maintain external contributions.

Note that I was specifically referring to termination proofs when I said it was bad code.

Same question still applies. Where is the evidence/data to support this claim?

Best, Leo

digama0 commented 1 year ago

Note that even if you are correct, your PR does not fully solve the problem: If the decidability instance has loose bound variables you still fallback to classical. If we merge this PR, I can see users asking in the future: why do split sometimes use Classical.em, and we (the Lean developers) are the ones that have to address this question.

I don't think either I or François have much interest in the case where the decidability instance has bound variables not in the condition itself, as this is very much an edge case which is unlikely to come up in practice. The two reasonable behaviors here are failing and using Classical.em. The latter has the advantage that for people who don't care about avoiding classical logic (which we anticipate to be most users), there is no loss in effectiveness in this case. The former has the advantage that the code is simpler and the algorithm and visible behavior is easier to explain, and also people who want to avoid classical logic in all cases can use split freely without worries. Do you have any particular preferences regarding weighing these two options?

fgdorais commented 1 year ago

Here are my answers to the updated RFC questions itemized by Leo earlier.

It will improve trust. Lean's small kernel is a key feature for trustworthiness. Axioms are necessary (e.g. to support quotient types, another key feature of Lean) but they bypass the kernel. Avoiding axioms when they are not necessary is a good way to ensure trust since axioms are blindly trusted by the kernel.

Although many users would not notice the change (which is good thing) increased trustworthiness would benefit everyone.

Yes. This was discussed here, on Zulip and private channels. The vast majority of community members that expressed opinions are favorable to this change. There are still corner issues but, to the best of my knowledge, these can be resolved at the PR phase.

The tentative PR is relatively simple and designed to avoid disrupting any existing code to the extent possible. It is also designed to facilitate future code to avoid using unnecessary axioms while maintaining the possibility to fall back on classical axioms when necessary. I do not expect any issues beyond normal code maintenance. This can be further analyzed in the PR phase.

leodemoura commented 1 year ago

It will improve trust. Lean's small kernel is a key feature for trustworthiness. Axioms are necessary (e.g. to support quotient types, another key feature of Lean) but they bypass the kernel. Avoiding axioms when they are not necessary is a good way to ensure trust since axioms are blindly trusted by the kernel.

Classical.em is a well studied axiom. There are no trust concerns.

Although many users would not notice the change (which is good thing) increased trustworthiness would benefit everyone.

There are no trust concerns. Moreover, Mathlib uses classical reasoning in many different places anyway.

The vast majority of community members that expressed opinions are favorable to this change.

You need to provide evidence. To be precise, "don't care about this feature" is not support.

Vtec234 commented 1 year ago

You need to provide evidence. To be precise, "don't care about this feature" is not support.

Hi Leo! I would like to express support for adding this feature. I entirely agree that there are no trust concerns regarding Classical.em. My motivation is different, and it is that weaker theories have more models (Neel's remark here). Specifically, a Lean proof that does not depend on Classical.em can be taken to mean other things than just its set-interpretation by virtue of being interpretable in models other than Mario's set-based model. This has practical applications: if a specific model of a subset of Lean's theory is formalized in Lean, such Lean theorems can be viewed as proving something about that model and be made to actually prove that using custom elaborators.

leodemoura commented 1 year ago

@Vtec234 Thanks for sharing your view.

My motivation is different, and it is that weaker theories have more models (Neel's remark here).

While I understand that weaker theories can have more models, my experience has shown that their practical applications have been limited to mostly hypothetical ones. Are there any specific, impactful projects that will benefit directly from this feature? So far, I've come across only hypothetical proposals.

My primary concern is receiving numerous PRs that attempt to avoid classical reasoning in various parts of the system without a compelling rationale. This can easily lead us down a path where contributors may soon try to sidestep constructs like funext in tactics like simp.

You might recall that we experimented with supporting multiple flavors of type theory in the past. Unfortunately, it turned out to be a substantial time sink. Given how valuable time is, it's crucial that we prioritize. We're in a unique position now, with the project receiving ample funding. It's imperative that we allocate these resources judiciously.

Moreover, we are planning to add more automation to Lean in the near future, and we will not waste any resources trying to avoid classical principles.

Vtec234 commented 1 year ago

Are there any specific, impactful projects that will benefit directly from this feature?

I am working on one. But it is very much in progress, and I would entirely understand if the dev team does not want to add features to support a potential future package.

fgdorais commented 1 year ago

Dear Leo,

I do understand how classical reasoning can sometimes be harmful for reasoning about programs.

Where is the evidence/data to support this claim?

This is well-known and I mentioned the canonical example: the solution to the halting problem is provably total using Classical.em but it is provably not computable.

Isabelle/HOL is based on classical reasoning and has a very successful track record on software verification (e.g., seL4 project.)

I was delighted to read about seL4, thank you for mentioning that, it's a really fascinating project!

That said, after reading a bit more, I'm now nearly 100% sure Isabelle (in any flavor) always uses LCF for soundness. So I don't understand what you're trying to say here about classical reasoning.

Best wishes, François

digama0 commented 1 year ago

My primary concern is receiving numerous PRs that attempt to avoid classical reasoning in various parts of the system without a compelling rationale. This can easily lead us down a path where contributors may soon try to sidestep constructs like funext in tactics like simp.

If the proof requires funext, then there isn't any issue with doing so in tactics, it is working as intended. I don't think there is any slippery slope here, there are one or two issues in core tactics which will unlock a huge amount of downstream work. That's all this is. I know you don't consider this work important, but other people do, and I don't see why in an individual RFC like this we need to be considering future RFCs. If a future RFC wants to consider removing funext in simp, let that be considered separately on its own merits.

fgdorais commented 1 year ago

Dear Leo,

You have to be more concrete. You mention a hypothetical exploit, but then say you are not malicious enough to bring an example.

Let me try! There's three scenarios, the "hypothetical exploit" is the third. Retrospectively, I'm realizing the second scenario is the only one I really care about.

Scenario 1. Since Classical.choice, Classical.propDecidable and friends are not computable, it doesn't make sense for a def with executable code to honestly depend on these. If it really did, it wouldn't compile. Yet that happens all the time. This example is due to Jun Yoshida, and it happens even after split is fixed as suggested by this RFC.

def test : List Nat → Nat
| [] => 0
| 0 :: l => test l
| (n+1) :: l => 1 + test (n :: l)

-- 'test' depends on axioms: [Classical.choice, Quot.sound, propext]
#print axioms test

The reason is that this definition uses well-founded recursion, the termination proof depends on theorems from Init.Data.Nat.Linear that use the by_cases tactic. That tactic systematically uses Classical.em but it is not currently part of this RFC to fix this tactic.

I consider this behavior to be harmless. The kernel saw through the spurious dependency, the code got generated and it works exactly as expected.

(Also note that Classical.em is not itself problematic since it lives in Prop. If it showed up as an axiom instead of Classical.choice, I would immediately know it's not relevant.)

Scenario 2. This is a flipped view of part 1. Suppose I write a def but I make a typo, I wanted to use X but I typed Y. They both have the same type, so everything type checks, but Y is noncomputable. At some point, I get the error:

failed to compile definition, consider marking it as 'noncomputable' because it depends on 'Classical.choice', and it does not have executable code

I know my code should compile, so I know I made an error somewhere. I start digging using #print axioms to see where that dependency on Classical.choice comes from. To my dismay, there are many parts that depend on Classical.choice. I will probably eventually find Y but since by_cases, split and simp are common tactics, there's a lot of noise.

At worst, this behavior can be characterized as unhelpful. I think this is sufficient motivation to fix, but I would also consider other options. For example, as a better #print axioms for defs that filters out the noise. That route is arguably better since fixing by_cases, split and simp would definitely cut down the noise, it wouldn't actually resolve the issue of irrelevant dependencies as a whole. That said, I have no idea how to implement such a noise filtering #print axioms.

Scenario 3. Suppose there's a bug in the kernel whereby some noncomputable definition somehow ends up generating bogus code. This is extremely unlikely. I can't imagine how that could happen and I hope we never find out.

Let's say code using such a def lands on my review queue. (Either produced by a malicious agent or as a result of a coding error.) The code compiles fine, it comes with a test suite that (cleverly or mistakingly) avoids the bug. Reading through the code and the tests doesn't reveal any issues, #print axioms shows some weird noncomputable dependencies but I know that happens all the time so I ignore that suspicious data. I rubber stamp the code and move on to the next review item.

This would be terrible! But it is also extremely unlikely. Lean maintainers can't possibly spend much time fixing hypothetical disasters like this. It's much more important to spend time writing quality code which is less likely to have bugs in the first place. I'm sorry I brought that up earlier, I have a bad habit for catastrophizing that I evidently need to keep working on.

Best wishes, François

digama0 commented 1 year ago

Scenario 1. Since Classical.choice, Classical.propDecidable and friends are not computable, it doesn't make sense for a def with executable code to honestly depend on these. If it really did, it wouldn't compile. Yet that happens all the time.

A def can be computable while still depending on classical axioms. The reason is because the axiom can be used inside computationally irrelevant parts of the definition, e.g. proof arguments or type arguments. In particular, any theorem which is proved using choice is computationally irrelevant (that is, "vacuously computable"), and in fact Classical.em is just such a theorem (proved using Classical.choice despite this being noncomputable).

def foo (n : Nat) (_h : n > 0) : Nat := n + 1

def bar : Nat := foo 1 <| by 
  by_cases 2 + 2 = 4 <;> exact Nat.succ_pos _

#eval bar -- 2
#print axioms bar
-- 'bar' depends on axioms: [Classical.choice, Quot.sound, propext]

Scenario 2. This is a flipped view of part 1. Suppose I write a def but I make a typo, I wanted to use X but I typed Y. They both have the same type, so everything type checks, but Y is noncomputable. At some point, I get the error:

failed to compile definition, consider marking it as 'noncomputable' because it depends on 'Classical.choice', and it does not have executable code

I know my code should compile, so I know I made an error somewhere. I start digging using #print axioms to see where that dependency on Classical.choice comes from.

For the same reason as the above, this is the wrong way to diagnose noncomputability issues, because things can be noncomputable without using Classical.choice and vice versa. You should instead look directly at the error message, which points to a noncomputable direct dependency used in the definition in a computationally relevant way. It is difficult to typo this unless you use noncomputable section because each definition individually marks whether it is noncomputable, and the tactics under consideration with spurious usage of classical axioms do so only to prove theorems, so they do not cause any problems for computability.

There is one exception to this, which is that by_cases uses classical logic even if the goal is not a proposition, resulting in a proof approach that fails:

def foo : Nat := by by_cases 2 + 2 = 4 <;> exact 1
-- type mismatch when assigning motive
--   fun t => x✝ = t → Nat
-- has type
--   2 + 2 = 4 ∨ ¬2 + 2 = 4 → Type : Type 1
-- but is expected to have type
--   2 + 2 = 4 ∨ ¬2 + 2 = 4 → Prop : Type

The by_cases patch in std fixes this:

import Std
def foo : Nat := by by_cases 2 + 2 = 4 <;> exact 1 -- ok

#print foo
-- def foo : Nat :=
-- if h : 2 + 2 = 4 then 1 else 1

Scenario 3. Suppose there's a bug in the kernel whereby some noncomputable definition somehow ends up generating bogus code. This is extremely unlikely. I can't imagine how that could happen and I hope we never find out.

I don't really understand this hypothetical. At least right now, noncomputable definitions are exactly those for which the compiler cannot generate a definition for one reason or another. If the compiler is able to generate code, bogus or otherwise, then it will not need to be marked noncomputable, and if you do mark a definition as noncomputable then the compiler doesn't generate code for it at all, bogus or otherwise.

fgdorais commented 1 year ago

Scenario 1. Since Classical.choice, Classical.propDecidable and friends are not computable, it doesn't make sense for a def with executable code to honestly depend on these. If it really did, it wouldn't compile. Yet that happens all the time.

A def can be computable while still depending on classical axioms. The reason is because the axiom can be used inside computationally irrelevant parts of the definition, e.g. proof arguments or type arguments. In particular, any theorem which is proved using choice is computationally irrelevant (that is, "vacuously computable"), and in fact Classical.em is just such a theorem (proved using Classical.choice despite this being noncomputable).

That's pretty much exactly what I'm saying, isn't it? I even made that same remark about Classical.em vs Classical.choice.

Scenario 2. This is a flipped view of part 1. Suppose I write a def but I make a typo, I wanted to use X but I typed Y. They both have the same type, so everything type checks, but Y is noncomputable. At some point, I get the error:

failed to compile definition, consider marking it as 'noncomputable' because it depends on 'Classical.choice', and it does not have executable code

I know my code should compile, so I know I made an error somewhere. I start digging using #print axioms to see where that dependency on Classical.choice comes from.

For the same reason as the above, this is the wrong way to diagnose noncomputability issues, because things can be noncomputable without using Classical.choice and vice versa. You should instead look directly at the error message, which points to a noncomputable direct dependency used in the definition in a computationally relevant way.

It's obviously the wrong way to do this since it doesn't work! The error message for noncomputability is not that specific. The one above was copy pasted from an actual constructed example, there is no location information at all. To be honest, I edited Classical.propDecidable to Classical.choice since there was no reason to bring up Classical.propDecidable. Oh! Is that what you're pointing out? The name info is more precise? I didn't pay attention since I generated the error on purpose.

There is one exception to this, which is that by_cases uses classical logic even if the goal is not a proposition, resulting in a proof approach that fails:

The Std patch to by_cases is good. That's why I didn't consider it important to fix by_cases in this RFC.

Scenario 3. Suppose there's a bug in the kernel whereby some noncomputable definition somehow ends up generating bogus code. This is extremely unlikely. I can't imagine how that could happen and I hope we never find out.

I don't really understand this hypothetical. At least right now, noncomputable definitions are exactly those for which the compiler cannot generate a definition for one reason or another. If the compiler is able to generate code, bogus or otherwise, then it will not need to be marked noncomputable, and if you do mark a definition as noncomputable then the compiler doesn't generate code for it at all, bogus or otherwise.

It's a silly scenario. The disaster is that bogus code ends up being generated in a manner that could be avoided if the #print axioms info was more useful. I like being able to detect bogus code before it goes into production.

digama0 commented 1 year ago

It's a silly scenario. The disaster is that bogus code ends up being generated in a manner that could be avoided if the #print axioms info was more useful. I like being able to detect bogus code before it goes into production.

I'm not seeing how it does that. #print axioms is doing exactly what it is designed to do, and it has very little to do with the code generation process.

I'm not questioning your threat model (you are saying it is unlikely and I agree but let's set that aside), I'm questioning the validity of the example itself (would you in fact get that behavior under that hypothetical) and how #print axioms can do anything about it (what precisely would you expect from #print axioms, if not printing used axioms? And what prevents the exact same situation from arising without axioms?).

fgdorais commented 1 year ago

(Silly fiction deleted. Look at the edit history if you really want to read. Note to self: never post here right before bedtime.)

Regarding the role of axioms. As you pointed out, there are two ways to get a noncomputable definition: explicitly marking it noncomputable even if it is computable, or using a noncomputable definition or an axiom in which case Lean forces you to mark your definition noncomputable. Production code with a noncomputable definition would definitely be a red flag. Axioms are a bit different. They're always noncomputable but they are not marked noncomputable. If an axiom were somehow generating bogus code through a code generation bug, how would I distinguish that from a normal computable definition? #print axioms would still signal something foul since it's not about code generation, but nothing about code generation would reveal something suspicious since everything generates code. I'm probably wrong here but that's my reasoning for Scenario 3.

fgdorais commented 1 year ago

Regarding Scenario 2. I think your right, #print axioms is not the right tool here. That scenario is looking less and less compelling to me since this RFC wouldn't actually fix anything there.

With all three scenarios being non-compelling, I think I'm slowly converging to Leo's view. Not quite though, I still think my PR code is better than the current code since it allows more options without adding much complexity. I don't know how to argue that as a motivation for this RFC.

JamesGallicchio commented 1 year ago

I know I travel in heavily constructivist circles, but when trying to sell Lean to CMU PL people, non-constructivism is brought up relatively frequently. Of course, PL people are not the target audience, and constructivism is not the target logic, but it would be nice to at least accommodate constructivism when it's easy to do so.

My primary concern is receiving numerous PRs that attempt to avoid classical reasoning in various parts of the system without a compelling rationale.

I totally understand this. In particular, having more complex tactic implementations in core means they could be harder to maintain in the future. I'm not very familiar with the code base, but the maintenance burden for this particular RFC seems pretty low, at least based on the implementation in #2412. Maybe that's something to worry about down the road?

JLimperg commented 4 months ago

This issue has been incidentally fixed by #4349: The MWE in the first message is now axiom-free. :tada: