About evaluation of Dart-Math

hkust-nlp / dart-math

[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*

https://hkust-nlp.github.io/dart-math/

MIT License


About evaluation of Dart-Math #3

Closed: XinXU-USTC closed this issue 1 month ago

XinXU-USTC commented 2 months ago

Thank you for your excellent work and well-structured code repo.

I ran into a problem when evaluating Dart-Math. I ran the provided evaluation script to evaluate Dart-Math-Llama-3-8b-prof2diff on TheoremQA, and every item in the generated .jsonl file is marked false. What could be causing this?

The resulting file is attached below: https://hkustconnect-my.sharepoint.com/:u:/g/personal/xxuca_connect_ust_hk/ETFyWNbUrghFpa4H1_L6hpcB4AD8ze3mRoUrshPQcEGR3w?e=lzAEnX

tongyx361 commented 1 month ago

Thanks for raising this issue! I tracked down and fixed the bug behind this error:

Commit https://github.com/hkust-nlp/dart-math/commit/a6647ea7e04c1d8c0e6b02a679d116de7c04f43d set ignore_eos=True because Llama-3-8B(-Base) sometimes decodes EoS immediately, but this breaks generation for normal (instruct) models. So I made ignore_eos a CLI option that defaults to False in the latest commit https://github.com/hkust-nlp/dart-math/commit/82377fc66af5c1c76bf3d8e40175cd21772fa660.
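For readers who want to see the general pattern, here is a minimal sketch of an opt-in boolean CLI flag that defaults to False. It is not the repository's actual code: the flag spelling `--ignore_eos`, the helper names `build_parser` and `sampling_kwargs`, and the `max_tokens` value are all assumptions made for illustration, and the returned dict only stands in for whatever sampling configuration the generation backend takes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; the real script's options may differ."""
    parser = argparse.ArgumentParser(description="Illustrative generation CLI")
    # store_true leaves the value False unless the flag is passed explicitly,
    # so instruct models stop at EoS by default.
    parser.add_argument(
        "--ignore_eos",
        action="store_true",
        help="Keep decoding past EoS (workaround for base models that emit EoS immediately)",
    )
    return parser


def sampling_kwargs(args: argparse.Namespace) -> dict:
    """Collect sampling options into a plain dict (max_tokens is an illustrative value)."""
    return {"max_tokens": 2048, "ignore_eos": args.ignore_eos}


# Default invocation: no flag given, EoS is respected.
default_kwargs = sampling_kwargs(build_parser().parse_args([]))
# Opt-in invocation: flag given, EoS is ignored.
base_kwargs = sampling_kwargs(build_parser().parse_args(["--ignore_eos"]))
print(default_kwargs)
print(base_kwargs)
```

With a flag like this, only runs that explicitly pass it keep decoding after EoS, so the default behavior stays safe for instruct models.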

XinXU-USTC commented 1 month ago

Thank you for getting back to me so quickly! The problem is solved! I appreciate your help!