namin / llm-verified-with-monte-carlo-tree-search

LLM verified with Monte Carlo Tree Search
https://arxiv.org/abs/2402.08147
MIT License
240 stars 27 forks source link

Bug in Lean checker #13

Closed namin closed 9 months ago

namin commented 10 months ago

The Lean checker fails on the message below. @johnjyang can take a look, please?

msg = '''### Spec: In Lean4, write a factorial function and prove that the factorial is always strictly positive.
### Lean4:
```lean
import Mathlib

def factorial (n: Nat) : Nat :=
  if n = 0 then
    1
  else
    n * (factorial n-1)

theorem factorialPositive (n : Nat) : factorial n > 0 :=
  Nat.zeroLtOne.lt_of_lt_add $ by
    induction n with
    | zero => simp [factorial]
    | succ n h => simp [factorial, h]

#eval factorial 5

In this Lean4 code, we have defined a factorial function that computes the factorial of a given natural number n. The function is defined recursively: the factorial of 0 is 1, and the factorial of a positive integer n is n times the factorial of n-1.

We then prove that the factorial of any natural ''' ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... >>> import lean

lean.calculateScoreHelper(msg) { "cmd" : "import Mathlib\n\ndef factorial (n: Nat) : Nat :=\n if n = 0 then\n 1\n else\n n * (factorial n-1)\n\ntheorem factorialPositive (n : Nat) : factorial n > 0 :=\n Nat.zeroLtOne.lt_of_lt_add $ by\n induction n with\n | zero => simp [factorial]\n | succ n h => simp [factorial, h]\n\n#eval factorial 5" }

Traceback (most recent call last): File "", line 1, in File "/home/namin/llm-verified-with-monte-carlo-tree-search/lean.py", line 39, in calculateScoreHelper r = checkLean(v) File "/home/namin/llm-verified-with-monte-carlo-tree-search/lean.py", line 79, in checkLean out = proofsearch.run_code(lean_code_block.strip(), verbose=True) File "/home/namin/llm-verified-with-monte-carlo-tree-search/pySagredo/proofsearch.py", line 37, in run_code index = self.proc.expect('env": \d+}', timeout=20) File "/home/namin/mambaforge/envs/trl/lib/python3.10/site-packages/pexpect/spawnbase.py", line 343, in expect return self.expect_list(compiled_pattern_list, File "/home/namin/mambaforge/envs/trl/lib/python3.10/site-packages/pexpect/spawnbase.py", line 372, in expect_list return exp.expect_loop(timeout) File "/home/namin/mambaforge/envs/trl/lib/python3.10/site-packages/pexpect/expect.py", line 179, in expect_loop return self.eof(e) File "/home/namin/mambaforge/envs/trl/lib/python3.10/site-packages/pexpect/expect.py", line 122, in eof raise exc pexpect.exceptions.EOF: End Of File (EOF). Exception style platform. <pexpect.pty_spawn.spawn object at 0x7fe3878f10f0> command: /home/namin/.elan/bin/lake args: [b'/home/namin/.elan/bin/lake', b'env', b'lean', b'--run', b'REPL/Main.lean'] buffer (last 100 chars): '' before (last 100 chars): 'l\r\n#11866 factorial\r\n#11867 factorial\r\n#11868 _eval._closed_1\r\n#11869 _eval\r\n#11870 _eval._boxed\r\n\r\n' after: <class 'pexpect.exceptions.EOF'> match: None match_index: None exitstatus: None flag_eof: True pid: 1198115 child_fd: 5 closed: False timeout: 30 delimiter: <class 'pexpect.exceptions.EOF'> logfile: None logfile_read: None logfile_send: None maxread: 2000 ignorecase: False searchwindowsize: None delaybeforesend: 0.05 delayafterclose: 0.1 delayafterterminate: 0.1 searcher: searcher_re: 0: re.compile('env": \d+\}')

namin commented 10 months ago

The file crashes the Lean server because it has an infinite loop and an eval. What to do?

namin commented 10 months ago

I think for now, I will skip evaluations.

namin commented 10 months ago

also see: https://github.com/namin/llm-verified-with-monte-carlo-tree-search/commit/cf36cffad085bbdd64aee1a1fcdffef876d4892e

hsz0403 commented 3 months ago

have you figure it out? I run lean.py it returns: { "cmd" : "import Mathlib\n\ndef factorial : Nat → Nat\n| 0 => 1\n| n+1 => (n+1) * factorial n\n\ntheorem factorial_pos : ∀ n : Nat, 0 < factorial n\n| 0 => Nat.zero_lt_one\n| n+1 => Nat.mul_pos (Nat.succ_pos n) (factorial_pos n)\n\n#eval factorial 5" } (-1.0, '')