lean-dojo / LeanDojo

Tool for data extraction and interacting with Lean programmatically.
https://leandojo.org
MIT License

Unexpected result from get_traced_tactics() #71

Closed: AG161 closed this issue 10 months ago

AG161 commented 10 months ago

Description

I have forked the example repo from the docs to run some tests: https://github.com/AG161/lean4-example. In https://github.com/AG161/lean4-example/blob/main/Lean4Example.lean I have the following dummy theorem:


theorem test_have (p q : Prop) : ((p ∧ ¬q) ∨ (q ∧ ¬p)) ↔ ((p ∨ q) ∧ ¬(p ∧ q)) := by
  apply Iff.intro
  · intro h₁
    apply And.intro
    · apply Or.elim h₁
      · exact fun paq => Or.inl paq.1
      · exact fun paq => Or.inr paq.1
    · apply Or.elim h₁
      · exact fun pnq => fun paq => pnq.2 paq.2
      · exact fun qnp => fun paq => qnp.2 paq.1
  · intro h₂
    apply Or.elim h₂.1
    · intro hp
      exact Or.inl (And.intro (hp) (fun hq => h₂.2 (And.intro hp hq)))
    · intro hq
      exact Or.inr (And.intro (hq) (fun hp => h₂.2 (And.intro hp hq)))

When I use get_traced_tactics() on this theorem, the first tactic looks correct:


TracedTactic(tactic=apply Iff.intro, state_before=p q : Prop
⊢ p ∧ ¬q ∨ q ∧ ¬p ↔ (p ∨ q) ∧ ¬(p ∧ q), state_after=case mp
p q : Prop
⊢ p ∧ ¬q ∨ q ∧ ¬p → (p ∨ q) ∧ ¬(p ∧ q)

case mpr
p q : Prop
⊢ (p ∨ q) ∧ ¬(p ∧ q) → p ∧ ¬q ∨ q ∧ ¬p)

However, here is the next item in the tactic list:


TracedTactic(tactic=· intro h₁
  apply And.intro
  · apply Or.elim h₁
    · exact fun paq => Or.inl paq.1
    · exact fun paq => Or.inr paq.1
  · apply Or.elim h₁
    · exact fun pnq => fun paq => pnq.2 paq.2
    · exact fun qnp => fun paq => qnp.2 paq.1, state_before=case mp
p q : Prop
⊢ p ∧ ¬q ∨ q ∧ ¬p → (p ∨ q) ∧ ¬(p ∧ q)

case mpr
p q : Prop
⊢ (p ∨ q) ∧ ¬(p ∧ q) → p ∧ ¬q ∨ q ∧ ¬p, state_after=case mpr
p q : Prop
⊢ (p ∨ q) ∧ ¬(p ∧ q) → p ∧ ¬q ∨ q ∧ ¬p)

The results continue to look incorrect from there, and LeanDojo believes there are 25 tactics in the proof. I'm not sure whether this is a result of using the dot (focusing) notation; I haven't used it with LeanDojo before, but I started using it because it is the preferred style in Mathlib.

Detailed Steps to Reproduce the Behavior

Run the following Python script on LeanDojo 1.2.5:

from lean_dojo import *

repo = LeanGitRepo("https://github.com/AG161/lean4-example", "79580afb095491e82beb7dc22cc3b702e3e3b333")
tr = trace(repo)
thm_list = tr.get_traced_theorems()
# Find the traced theorem corresponding to `test_have`.
for item in thm_list:
    if item.theorem.full_name == 'test_have':
        thm = item
        break
tac = thm.get_traced_tactics()
print(tac[1])  # the second traced tactic, shown above

Platform Information

yangky11 commented 10 months ago

This is expected. LeanDojo is not guaranteed to extract a non-overlapping sequence of tactics. When the proof includes things like · and <;>, LeanDojo will also extract blocks of tactics. That said, the 25 tactics in your example should also include all individual tactics?
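Since block extraction is expected behavior, downstream code that wants only atomic tactic steps has to detect the blocks itself. A minimal heuristic sketch (this `is_tactic_block` helper is hypothetical, not part of the LeanDojo API): treat a traced tactic as a block if it spans multiple lines, starts with a focusing bullet, or contains the `<;>` combinator.

```python
def is_tactic_block(tactic: str) -> bool:
    """Heuristic: does this traced 'tactic' string look like a block of
    tactics rather than a single tactic application? Flags multi-line
    entries, entries opening with a focusing bullet, and the <;>
    combinator mentioned above."""
    stripped = tactic.strip()
    return "\n" in stripped or stripped.startswith("·") or "<;>" in stripped

print(is_tactic_block("apply Iff.intro"))                 # → False
print(is_tactic_block("· intro h₁\n  apply And.intro"))   # → True
```

This will not catch every compound tactic (e.g. `repeat` or `all_goals` wrappers on one line), but it covers the two cases named above.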

AG161 commented 10 months ago

Okay, I see; sorry about that. I was assuming get_traced_tactics() would return the sequence of tactics you'd have to apply to prove the theorem. I was thinking from the perspective of training a model like ReProver: take the theorems from a training set in Mathlib, trace them, and form a bunch of state/tactic pairs from the get_traced_tactics() of each theorem. But if that method can return a block of tactics at once, that could be a problem for training, and the dot notation is pretty common in Mathlib. When you made the data for ReProver, did you use a different approach from the one I outlined, did you filter those blocks out, or is it fine to just ignore them?

That said, the 25 tactics in your example should also include all individual tactics?

I think so, yes.

yangky11 commented 10 months ago

It is nontrivial to reconstruct a proof from the tactics returned by get_traced_tactics(). We didn't do any filtering when training ReProver. It is not clear that this causes any problems, since training ReProver only requires a set of (state, tactic) pairs. Those pairs don't have to be non-overlapping.
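The pipeline this implies can be sketched in a few lines. The `TracedTactic` stand-in below is a minimal mock using the field names visible in the printed output earlier in the thread (`tactic`, `state_before`, `state_after`); the `to_training_pairs` helper is hypothetical, not a LeanDojo function.

```python
from dataclasses import dataclass

@dataclass
class TracedTactic:
    """Minimal stand-in for LeanDojo's TracedTactic, with the fields
    shown in the printed output above."""
    tactic: str
    state_before: str
    state_after: str

def to_training_pairs(traced_tactics):
    """Turn traced tactics into (state, tactic) training pairs.
    Overlapping entries (e.g. a block and the tactics inside it) are
    kept as-is, since the pairs need not be non-overlapping."""
    return [(t.state_before, t.tactic) for t in traced_tactics]

demo = [
    TracedTactic("apply Iff.intro", "p q : Prop\n⊢ ...", "case mp ..."),
    TracedTactic("· intro h₁\n  apply And.intro", "case mp ...", "case mpr ..."),
]
pairs = to_training_pairs(demo)
print(len(pairs))  # → 2
```

Each theorem contributes one pair per traced tactic, block or not; the model is simply trained on the resulting set.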

AG161 commented 10 months ago

I agree that the overlapping is not an issue, but I think the large blocks might be. In the example I posted, the correct second tactic is intro h₁ (or any other name), but as is, the LLM might be trained to output that very large block corresponding to tac[1]. That's why I think having one tactic application at a time is probably ideal.

In a similar vein, the before/after states for tactics include the entire goal state, rather than just the subgoal the tactic is being applied to. This is easy to deal with, because the text blocks corresponding to subgoals are separated by blank lines and the tactic is applied to the top one. But I'd guess, based on the literature, that an LLM which saw only the subgoal text would train better than one that also saw the rest of the unrelated goal state, especially if the amount of unrelated text is large. I could be wrong, but I would assume these two changes would help training.
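The workaround described above can be sketched directly: since the goals in a pretty-printed state are separated by blank lines and the traced tactic acts on the first one, splitting on a blank line recovers the relevant subgoal. (This `first_subgoal` helper is my own illustration, and it assumes a single goal never contains an internal blank line.)

```python
def first_subgoal(state: str) -> str:
    """Return the top subgoal of a pretty-printed goal state, relying on
    the blank-line separation between goals seen in the traces above."""
    return state.strip().split("\n\n")[0]

# Goal state copied from the first TracedTactic shown in this issue.
state = (
    "case mp\np q : Prop\n⊢ p ∧ ¬q ∨ q ∧ ¬p → (p ∨ q) ∧ ¬(p ∧ q)\n"
    "\n"
    "case mpr\np q : Prop\n⊢ (p ∨ q) ∧ ¬(p ∧ q) → p ∧ ¬q ∨ q ∧ ¬p"
)
print(first_subgoal(state).splitlines()[0])  # → case mp
```

Applied to each `state_before`, this would strip the unrelated `case mpr` text from the training input.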

In any case, I'm happy to close the issue because get_traced_tactics() is working as intended. Thank you by the way for helping me understand the library, I appreciate it.