Could exact accuracies on miniF2F be presented more clearly? #123

brando90 commented 2 years ago

Examples of headline numbers reported in recent papers:
Autoformalization (https://arxiv.org/abs/2205.12615): "Our methodology results in a new state-of-the-art result on the MiniF2F theorem proving benchmark, improving the proof rate from 29.6% to 35.2%."
Draft, Sketch, and Prove (https://arxiv.org/abs/2210.12283): "Guiding an automated prover with these sketches enhances its performance from 20.9% to 39.3% on a collection of mathematical competition problems."

DyeKuu commented 2 years ago

Hi Brando! By exact accuracy, do you mean the accuracy broken down by statement, or the SOTA accuracy, like in the papers you mentioned here?
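To make the distinction concrete: a per-statement breakdown records, for each miniF2F statement, whether a proof was found within the attempt budget, and the headline pass rate is simply the fraction of statements proved. The sketch below assumes a hypothetical `{statement_name: proved}` mapping; this is not a format the miniF2F repository defines, and `pass_rate`, `as_percent` and `unproved` are illustrative helpers.

```python
from typing import List, Mapping


def pass_rate(results: Mapping[str, bool]) -> float:
    """Fraction of statements proved.

    `results` is a hypothetical per-statement breakdown: statement name ->
    True if a proof was found within the attempt budget, else False.
    """
    if not results:
        raise ValueError("no statements evaluated")
    proved = sum(1 for ok in results.values() if ok)
    return proved / len(results)


def as_percent(rate: float) -> str:
    """Format a rate the way papers usually report it, e.g. '35.2%'."""
    return f"{100 * rate:.1f}%"


def unproved(results: Mapping[str, bool]) -> List[str]:
    """Names of statements with no proof found (the per-statement view)."""
    return sorted(name for name, ok in results.items() if not ok)


if __name__ == "__main__":
    # Toy, made-up outcomes for four placeholder statements.
    toy = {"stmt_a": True, "stmt_b": False, "stmt_c": True, "stmt_d": True}
    print(as_percent(pass_rate(toy)), unproved(toy))
```

The aggregate number hides which statements were solved, so two methods with the same pass rate can still differ statement by statement; the per-statement mapping keeps that information.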

brando90 commented 2 years ago

Yes, like the examples I provided. Let me know if you have other thoughts. Thanks!

Brando Miranda, Ph.D. Student in Computer Science, Stanford University; EDGE Scholar, Stanford University. Website: https://brando90.github.io/brandomiranda/home.html

On Nov 1, 2022, at 6:01 PM, Kunhao ZHENG wrote:

Hi Brando! By exact accuracy, do you mean the accuracy broken down by statement, or the SOTA accuracy?

DyeKuu commented 2 years ago

Here are several papers I know of that report accuracies on miniF2F, in roughly chronological order. The list may be incomplete, and corrections are welcome!

miniF2F: https://arxiv.org/abs/2109.00110. Reports GPT-f pass rates of 1.6% on the Metamath test split and 29.2% on the Lean test split.
GPT-f + curriculum: https://arxiv.org/abs/2202.01344. Uses additional statements as a curriculum (the statement_curriculum_learning branch: https://github.com/openai/miniF2F/tree/statement_curriculum_learning/lean/src/statement_curriculum_learning) together with expert iteration (ExIt); reports a 36.6% pass rate on the Lean test split.
Thor: https://arxiv.org/abs/2205.10893. Combines Sledgehammer with a language model; reports a 29.9% pass rate on the Isabelle test split. The paper also reports results for the PACT and ExIt methodologies on the Isabelle test split.
HyperTree Proof Search (HTPS): https://arxiv.org/abs/2205.11491. An MCTS-inspired proof search combined with online training; reports a 41% pass rate on the Lean test split.
Autoformalization: https://arxiv.org/abs/2205.12615 / http://aitp-conference.org/2022/abstract/AITP_2022_paper_33.pdf. The autoformalization paper you mentioned above; reports a 35.2% pass rate on the Isabelle test split.
Draft, Sketch, and Prove: https://arxiv.org/abs/2210.12283. The second paper you mentioned above; reports a 39.3% pass rate on the Isabelle test split.

Since the accuracies (pass rates) depend on the computation budget and the formal language, I list only the test-split numbers here. For validation-split numbers, it is worth checking the details in each paper.
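Because pass rates are only comparable within the same formal system (and, ideally, the same budget), one way to present the test-split numbers listed above is to group them by system rather than rank them in a single list. This is a minimal sketch under our own assumptions: the `Result` record and the `by_system` helper are hypothetical and not part of the miniF2F repository; the numbers are the ones quoted in this thread, and compute budgets are omitted because the thread does not list them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Result:
    method: str
    system: str            # formal system of the test split: Metamath, Lean, Isabelle
    test_pass_rate: float  # percent, as quoted in this thread


# Test-split pass rates as listed in this thread (budgets not recorded here).
REPORTED: List[Result] = [
    Result("GPT-f (miniF2F paper)", "Metamath", 1.6),
    Result("GPT-f (miniF2F paper)", "Lean", 29.2),
    Result("GPT-f + curriculum (ExIt)", "Lean", 36.6),
    Result("HyperTree Proof Search", "Lean", 41.0),
    Result("Thor", "Isabelle", 29.9),
    Result("Autoformalization", "Isabelle", 35.2),
    Result("Draft, Sketch, and Prove", "Isabelle", 39.3),
]


def by_system(results: List[Result]) -> Dict[str, List[Result]]:
    """Group results by formal system, highest pass rate first in each group."""
    groups: Dict[str, List[Result]] = {}
    for r in results:
        groups.setdefault(r.system, []).append(r)
    for rs in groups.values():
        rs.sort(key=lambda r: r.test_pass_rate, reverse=True)
    return groups


if __name__ == "__main__":
    for system, rs in sorted(by_system(REPORTED).items()):
        print(system)
        for r in rs:
            print(f"  {r.test_pass_rate:5.1f}%  {r.method}")
```

Grouping reflects the point above: a Lean number and an Isabelle number are not directly comparable, and even within one system the attempt budget should be checked in each paper before ranking methods.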
