meta-math / MetaMath

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
https://meta-math.github.io
Apache License 2.0

Potential error in eval_gsm8k.py #23

Open hbin0701 opened 9 months ago

hbin0701 commented 9 months ago

Dear authors, thank you for the amazing work and sharing your code and data!

I wanted to ask about your evaluation code: currently, if the model outputs an answer with a decimal point, it is automatically rounded to the nearest integer.

As a result, a wrong answer (e.g. 8.5) could be counted as correct (e.g. matched against 9) despite a calculation error, which occurs fairly often in model generations.

In light of this, I believe stricter evaluation code may be needed.
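
For illustration, here is a minimal sketch of the lenient vs. strict behaviour I mean. The function names and the exact comparison are my own assumptions, not the actual logic in eval_gsm8k.py: the lenient check rounds the predicted number before comparing, so any prediction within roughly 0.5 of the gold integer passes, while the stricter check compares the values directly.

```python
# Minimal sketch of the lenient vs. strict comparison described above.
# NOTE: illustrative helpers only, not the actual code in eval_gsm8k.py.

def lenient_is_correct(pred: str, gold: str) -> bool:
    """Lenient check: rounds the prediction to the nearest integer first,
    so any prediction within ~0.5 of the gold integer is accepted."""
    try:
        return round(float(pred)) == round(float(gold))
    except ValueError:
        return False


def strict_is_correct(pred: str, gold: str, tol: float = 1e-6) -> bool:
    """Stricter check: compares the numeric values without rounding,
    using a small tolerance only to absorb float parsing noise."""
    try:
        return abs(float(pred) - float(gold)) < tol
    except ValueError:
        return pred.strip() == gold.strip()


if __name__ == "__main__":
    # A prediction of 8.7 against a gold answer of 9 (a calculation error):
    print(lenient_is_correct("8.7", "9"))  # True  -> counted as correct after rounding
    print(strict_is_correct("8.7", "9"))   # False -> the error is caught
```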