trishullab / PutnamBench

An evaluation benchmark for undergraduate competition math in Lean4, Isabelle, Coq, and natural language.
41 stars 5 forks source link

Feature request: include the original problem descriptions in docstrings for the theorem statements #196

Open eric-wieser opened 1 month ago

eric-wieser commented 1 month ago

It's quite hard to spot misformalizations when reading the formalization alone. It would be great if the $$\LaTeX$$ description could be included in a docstring (/-- doc -/ in lean, I can't speak for other systems) for the theorems, to make mistakes easier to spot and correct.

LasseBlaauwbroek commented 1 month ago

Seconded for Coq: (** latex *).

eric-wieser commented 1 month ago

Probably also a good idea to have a script that extracts the docstrings from each language, and checks in CI that they are all consistent.

amit9oct commented 1 month ago

Thank you for opening this issue. I agree that something like this will make it easy for the repository maintainers to check the correctness of the formalization itself.

We are planning to add a tool (probably a web page on our main website) that will grab the three formalizations along with informal statements from the GitHub repository and then show them side-by-side. We can also build a local version of this so that the maintainers can see these on their local machines.

The reason for not adding comments in the file is that it will make it hard to maintain the same informal statement in two places, and might create problems when one gets modified but the other does not. If we create a tool for this, then we will not have to deal with any consistency issues which may arise. This will also keep parsing of formalizations simple, for example, if someone wants to just use the formal statement and not use informal statements for their AI tests they will not have to parse and remove the informal statement from the file.

eric-wieser commented 1 month ago

The reason for not adding comments in the file is that it will make it hard to maintain the same informal statement in two places, and might create problems when one gets modified but the other does not.

My recommendation here would be:

CI is a fantastic tool for enforcing consistency.