Closed zijunchen68 closed 3 months ago
Hi @zijunchen68 , thank you for pointing this out! The miniF2F dataset LEGO uses can be found in this repository: https://github.com/albertqjiang/miniF2F. This repository is a fork from the original OpenAI miniF2F dataset. In this fork, there are approximately 10 to 15 problems that differ (the exact number escapes my memory), and I'm unsure of the reasons behind these changes (I should ask Albert for clarification). We've been utilizing this fork to conduct experiments with LEGO-Prover since its early stages, and only recently discovered discrepancies between some problems in this fork and the original OpenAI miniF2F dataset. However, LEGO-Prover fails to prove most of these differing problems, suggesting that reverting them to the original dataset may not significantly impact performance.
I have checked the https://github.com/wiio12/LEGO-Prover/tree/master/data/full_data and compare with the origin one https://github.com/openai/miniF2F/tree/main/isabelle/ and find the test split imo_2007_p6 is replaced with numbertheory_sqmod3in01d in valid set, leading to total 487 problems. The accuracy reported in your paper might be wrong?