trishullab / PutnamBench

An evaluation benchmark for undergraduate competition math in Lean4, Isabelle, Coq, and natural language.
56 stars 8 forks source link

Evaluation of Formal Proofs #168

Closed sivakishoresk closed 3 months ago

sivakishoresk commented 3 months ago

Many of the theorem statements does not have formal proofs(they have the sorry tags). If I have a generated proof for a theorem how should I evaluate it, even if we have a predicted proof and ground truth proof, how do I go for semantic evaluation between these two. Someone please help me with this.

GeorgeTsoukalas commented 3 months ago

Hi,

The nice feature about the statements being written in interactive theorem provers like Lean, Isabelle, and Coq is that you do not need to compare to a ground truth proof. Semantic evaluation is also unnecessary, all you need to know is whether the proof you generated compiles without error in the language. I would recommend looking through the following Lean introduction: https://leanprover-community.github.io/mathematics_in_lean/. This should give a sense of how it works.

There are tools to interact with Lean such as https://leandojo.org/ and https://github.com/trishullab/copra.