trishullab / PutnamBench

An evaluation benchmark for undergraduate competition math in Lean4, Isabelle, Coq, and natural language.
56 stars 8 forks source link

Putnam 2017 B2 issues #184

Closed LasseBlaauwbroek closed 2 months ago

LasseBlaauwbroek commented 3 months ago

Another one from the paper: https://github.com/trishullab/PutnamBench/blob/6ddcaacfafbdcc16cf72740f78c7f3709328492b/coq/src/putnam_2017_b2.v#L3-L11

I'm seeing these kinds of patterns happen in other problems as well. My guess is that this happens because of the desire to inline all 'definitions' into the theorem. Why do this? Is there a problem with separate definitions?

GeorgeTsoukalas commented 2 months ago

Thanks for pointing this out!

Why does a problem related only to natural numbers involve real numbers? Seems convoluted and makes proving it more difficult.

At the time, our understanding was that sum_n only allowed summands which came from a type which abelian group structure. Summing over ints also led to problems, which was probably due to insufficient imports. In some formalizations which rely on Coquelicot's sum_n, we have modified to summand to be of type real. Anyways, these occurrences will change as we complete the migration to Mathcomp.

The way the theorem is stated seems very awkward. It says: "If you give me a number for which you can prove the requirement from the problem, then that number must be 16." I would expect more something much more straightforward like "the number 16 has the required properties":

I tend to agree that this one does seem to be a bit awkwardly stated. At least for Lean + Isabelle, I think the goals are generally stated in a way that matches the one you've proposed. As we go through the modifications for the Coq problems, we should clear up any of these.

I'm seeing these kinds of patterns happen in other problems as well. My guess is that this happens because of the desire to inline all 'definitions' into the theorem. Why do this? Is there a problem with separate definitions?

The formalizations are meant to be self-contained inside the theorem statement as is consistent with other competition math benchmarks we've seen. Also, our internal tooling happens to work best in this form, as does most of the other public tooling we are aware of.

GeorgeTsoukalas commented 2 months ago

I have included a mathcomp-based formulation of your suggestion in #201. There is also an analogous change for the Lean problem, which was stated in the same way. The original formalization of this problem was probably somewhat awkward because it was among the first problems we started with. As the modifications for Coq continue, any other similar issues should clear up. Thanks again!

LasseBlaauwbroek commented 2 months ago

Why is there all this casting between integers and naturals going on? Seems to me that the problem can be entirely stated using natural numbers. The way it is written now makes it seem like there is something special going on.

Also, inconsistent usage between > and gt.

LasseBlaauwbroek commented 2 months ago

Also, for the Lean version, why not use let just like in Coq? These versions where a variable is hypothesized with an equality that defines the behavior of the variable are quite difficult to read.

GeorgeTsoukalas commented 2 months ago

Why is there all this casting between integers and naturals going on? Seems to me that the problem can be entirely stated using natural numbers.

From experience in proving stuff in Lean, it's easier to reason about sums over integers than nats as one is allowed all the ring operations. As an example, putnam_1988_b1 is much more annoying to prove if the statement is entirely in nats.

Also, inconsistent usage between > and gt.

The > operator does not work for nats when ring_scope is opened. gt is an operation defined for nats, and does not work immediately for ints.

These versions where a variable is hypothesized with an equality that defines the behavior of the variable are quite difficult to read.

I agree that the way this is done right now is a bit cumbersome to parse. We will eventually find a better way to state these things, maybe with some new syntactic shortcuts we request of the Lean community, or as you suggest. For now, this is a somewhat lower priority task.