trishullab / PutnamBench

An evaluation benchmark for undergraduate competition math in Lean4, Isabelle, Coq, and natural language.
56 stars 8 forks source link

unsound usage of optParams in Lean problems #166

Closed dwrensha closed 3 months ago

dwrensha commented 3 months ago

putnam_2008_b5 and many other problems specify predicates as default values for optional parameters. It seems that this was done to avoid the need to add extra top-level definitions. Unfortunately, it leads to false theorems, because within the theorem body, we are not allowed to assume that the optional parameter always has its default value.

To show that putnam_2008_b5 is false as currently stated, here is an example of how to use it to prove False:

import Mathlib
open BigOperators

open Filter Topology Set Nat

abbrev putnam_2008_b5_solution : Set (ℝ → ℝ) := {fun (x : ℝ) => x + n | n : ℤ} ∪ {fun (x : ℝ) => -x + n | n : ℤ}

theorem putnam_2008_b5
(fqsat : (ℝ → ℝ) → ℚ → Prop :=
  fun f q => ContDiff ℝ 1 f ∧ (∃ p : ℚ, p = f q ∧ p.den = q.den))
(fsat : (ℝ → ℝ) → Prop := fun f => ∀ q : ℚ, fqsat f q)
: ∀ f : (ℝ → ℝ), fsat f ↔ f ∈ putnam_2008_b5_solution := by
  sorry

theorem explode : False := by
  have h1 :=
    putnam_2008_b5 (fun _ _ ↦ False) (fun _ ↦ False) (fun x ↦ x + 1)
  rw [false_iff] at h1
  contrapose! h1
  simp
  left
  use 1
  norm_cast
GeorgeTsoukalas commented 3 months ago

Thanks for the feedback! Most of the team is travelling (mostly at ICML) currently but will begin resolving this when we return.

GeorgeTsoukalas commented 3 months ago

Now having looked at the issue I can say that we've had a misconception of the semantics of (x : α := a). Thanks for pointing it out! It shouldn't be too difficult to modify this. Should we modify such definitions where applicable to be top-level, like:

def fqsat (f : ℝ → ℝ) (q : ℚ) : Prop := ContDiff ℝ 1 f ∧ ∃ p : ℚ, p = f q ∧ p.den = q.den
theorem putnam_2008_b5'
: ∀ f : (ℝ → ℝ), (∀ q : ℚ, fqsat f q) ↔ f ∈ putnam_2008_b5_solution :=
sorry

Or maybe

theorem putnam_2008_b5''
(fqsat : (ℝ → ℝ) → ℚ → Prop)
(hfqsat : ∀ f q, fqsat f q ↔ ContDiff ℝ 1 f ∧ (∃ p : ℚ, p = f q ∧ p.den = q.den))
: ∀ f : (ℝ → ℝ), (∀ q : ℚ, fqsat f q) ↔ f ∈ putnam_2008_b5_solution :=
sorry

is a better choice? It may be preferable to keep things contained within the theorem statement as existing ITP interfaces may not handle context outside theorem statement off-the-shelf (but this doesn't necessarily have to be catered to).

dwrensha commented 3 months ago

In my opinion, adding top-level definitions is more idiomatic. However, adding extra hypotheses is what AlphaProof does: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/imo-2024-solutions/P6/index.html

theorem imo_2024_p6
    (IsAquaesulian : (ℚ → ℚ) → Prop)
    (IsAquaesulian_def : ∀ f, IsAquaesulian f ↔
      ∀ x y, f (x + f y) = f x + y ∨ f (f x + y) = x + f y) :
    IsLeast {(c : ℤ) | ∀ f, IsAquaesulian f → {(f r + f (-r)) | (r : ℚ)}.Finite ∧
      {(f r + f (-r)) | (r : ℚ)}.ncard ≤ c} 2 := by ...
GeorgeTsoukalas commented 3 months ago

I think we'll opt to follow the AlphaProof convention from that problem for now. It is probably a bit better for existing baselines to work without additional tweaks at the moment, though the formalizations will be less idiomatic. I'm open to changing it in the future though, do let me know if you have any other suggestions for the benchmark. Thanks for pointing this issue out! We'll eventually split the benchmark into (train-valid)/test and would be happy to hear your thoughts on a good way to split it up, especially if you were planning to add proofs to any of these problems to Compfiles.

Will keep the issue open until we have resolved all the affected problems.

GeorgeTsoukalas commented 3 months ago

All modifications should appear in #193