meta-llama / llama

Inference code for Llama models

How can I evaluate mathematical datasets like GSM8K? #1118

Open junseo-jang opened 1 month ago

junseo-jang commented 1 month ago

Hi, I'm trying to evaluate the performance of llama2-7b on math datasets. However, I found that the format of the prediction changes from one generation to the next, so it is not easy to extract just the final answer from the output. Is there anything I can do about this problem?
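One common workaround is to constrain the output format instead of parsing arbitrary text. Give the model a few-shot prompt whose worked solutions all end with the same phrase (for example "The answer is N."), optionally decode greedily (temperature 0) so that repeated runs produce the same completion, and then pull the number out with a regex. GSM8K reference solutions end with a line of the form `#### <answer>`, so the gold number can be read from there. The following is a minimal sketch under those assumptions: the exemplar, the `Question:`/`Answer:` template and every function name are hypothetical illustrations, not part of the llama repository.

```python
import re
from typing import Optional

# Hypothetical few-shot exemplar: every worked solution ends with the same
# "The answer is N." phrase, nudging the model to finish its output that way.
FEW_SHOT = (
    "Question: Tom has 3 boxes with 4 apples in each box. How many apples does he have?\n"
    "Answer: 3 boxes * 4 apples = 12 apples. The answer is 12.\n\n"
)

# Integers or decimals, with an optional minus sign and thousands separators.
NUMBER = r"-?\d[\d,]*(?:\.\d+)?"


def build_prompt(question: str) -> str:
    """Prepend the exemplar so the completion follows the same answer format."""
    return f"{FEW_SHOT}Question: {question}\nAnswer:"


def _to_float(text: str) -> Optional[float]:
    """Parse a number such as '1,234' or '18.5'; return None if it fails."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None


def extract_prediction(generation: str) -> Optional[float]:
    """Pull the final numeric answer out of a free-form model completion."""
    # Ignore anything after the model starts inventing a new question.
    generation = generation.split("\nQuestion:")[0]
    # Prefer the explicit "The answer is N" phrase requested by the prompt.
    match = re.search(r"[Tt]he answer is\s*\$?\s*(" + NUMBER + ")", generation)
    if match:
        return _to_float(match.group(1))
    # Fallback: use the last number that appears in the completion.
    numbers = re.findall(NUMBER, generation)
    return _to_float(numbers[-1]) if numbers else None


def extract_reference(solution: str) -> Optional[float]:
    """GSM8K reference solutions end with a line of the form '#### <answer>'."""
    return _to_float(solution.split("####")[-1].strip())


def is_correct(generation: str, solution: str) -> bool:
    """Compare prediction and reference numerically (so '12' matches '12.0')."""
    pred, gold = extract_prediction(generation), extract_reference(solution)
    return pred is not None and gold is not None and abs(pred - gold) < 1e-6


if __name__ == "__main__":
    # Illustrative (made-up) completion that runs on into a new question.
    generation = " Each box has 4 apples, so 3 * 4 = 12. The answer is 12.\n\nQuestion: ..."
    solution = "3 * 4 = <<3*4=12>>12\n#### 12"
    print(is_correct(generation, solution))  # True
```

Accuracy over the test split is then the fraction of examples for which `is_correct` returns `True`. The "last number" fallback is only a heuristic: it can pick up an intermediate value if the model never states a final answer, which is why fixing the format through the prompt matters more than the regex itself.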