How can I evaluate math datasets like GSM8K?

meta-llama / llama

Inference code for Llama models


How can I evaluate math datasets like GSM8K? #1118

Open junseo-jang opened 6 months ago

junseo-jang commented 6 months ago

Hi, I'm trying to evaluate the performance of llama2-7b on math datasets such as GSM8K. However, the format of the output differs from one prediction to the next, so it is not easy to extract only the final answer from the generated text. Is there a reliable way to handle this?
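To make the problem concrete, here is a minimal sketch of one common workaround, not something confirmed in this thread: look for an explicit answer marker first, and fall back to the last number in the output. It relies on GSM8K reference solutions ending with `#### <number>`. It also assumes the prompt (for example, few-shot examples) nudges the model toward a phrase like "The answer is 18". The names `extract_answer` and `is_correct` are hypothetical helpers, not part of any Llama or GSM8K tooling.

```python
import re

# Integers or decimals, with optional sign and thousands separators.
NUMBER = r"-?\d[\d,]*(?:\.\d+)?"
NUMBER_RE = re.compile(NUMBER)
# Explicit markers: GSM8K's "#### 18", or an assumed "The answer is 18" phrasing.
MARKER_RE = re.compile(r"(?:####|answer is)\s*:?\s*\$?\s*(" + NUMBER + r")", re.IGNORECASE)


def extract_answer(text):
    """Return the final numeric answer found in `text` as a float, or None."""
    marker = MARKER_RE.search(text)
    if marker:
        # Use the first marker: the model may keep generating extra
        # questions and answers after it has finished the real one.
        candidate = marker.group(1)
    else:
        # Heuristic fallback: assume the last number mentioned is the answer.
        numbers = NUMBER_RE.findall(text)
        if not numbers:
            return None
        candidate = numbers[-1]
    return float(candidate.replace(",", ""))


def is_correct(prediction, reference, tol=1e-6):
    """Compare the extracted prediction against the extracted reference."""
    pred = extract_answer(prediction)
    ref = extract_answer(reference)
    return pred is not None and ref is not None and abs(pred - ref) <= tol
```

Accuracy is then the fraction of examples for which `is_correct(model_output, reference_answer)` is true. The last-number fallback is only a heuristic and can misfire when the output trails off into unrelated numbers, so it is worth inspecting a sample of the extracted answers by hand.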