Evaluation of Formal Proofs

trishullab / PutnamBench

An evaluation benchmark for undergraduate competition math in Lean4, Isabelle, Coq, and natural language.

56 stars 8 forks source link

Hi,

The nice feature about the statements being written in interactive theorem provers like Lean, Isabelle, and Coq is that you do not need to compare to a ground truth proof. Semantic evaluation is also unnecessary, all you need to know is whether the proof you generated compiles without error in the language. I would recommend looking through the following Lean introduction: https://leanprover-community.github.io/mathematics_in_lean/. This should give a sense of how it works.

There are tools to interact with Lean such as https://leandojo.org/ and https://github.com/trishullab/copra.

trishullab / PutnamBench

Evaluation of Formal Proofs #168