meta-math / MetaMath

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
https://meta-math.github.io
Apache License 2.0

MetaMath-Mistral-7B gsm8k/math acc different from the reported values #20

Closed: haoxiongliu closed this issue 8 months ago

haoxiongliu commented 11 months ago

I ran run_mistral.sh and got:

gsm8k acc==== 0.7376800606520091
MATH acc==== 0.2726

I also tried

export HF_SAVE_PATH="meta-math/MetaMath-Mistral-7B" && \
python eval_gsm8k.py --model $HF_SAVE_PATH --data_file ./data/test/GSM8K_test.jsonl && \
python eval_math.py --model $HF_SAVE_PATH --data_file ./data/test/MATH_test.jsonl

and got:

gsm8k acc==== 0.7710386656557998
MATH acc==== 0.278

which is also slightly below the reported 77.7 and 28.2.

I would like to know whether this is normal and what might be the cause. Thanks!

yulonghui commented 11 months ago

Hi Haoxiong, many thanks for your attention. The numbers we reported and the numbers you measured differ because Hugging Face applies some processing to the model weights when they are uploaded to the Hub. The numbers we tested locally match the ones we reported.
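One way to verify that the uploaded weights really differ is a tensor-by-tensor diff of the two checkpoints. Below is a minimal sketch, assuming access to the authors' pre-upload checkpoint (LOCAL_PATH is a hypothetical path) and that torch and transformers are installed:

import torch
from transformers import AutoModelForCausalLM

# LOCAL_PATH is a hypothetical path to the pre-upload checkpoint;
# the Hub repo id is the one discussed in this thread.
LOCAL_PATH = "/path/to/local/MetaMath-Mistral-7B"
HUB_ID = "meta-math/MetaMath-Mistral-7B"

local = AutoModelForCausalLM.from_pretrained(LOCAL_PATH, torch_dtype="auto")
remote = AutoModelForCausalLM.from_pretrained(HUB_ID, torch_dtype="auto")

remote_sd = remote.state_dict()
worst = 0.0
for name, t_local in local.state_dict().items():
    # Cast both sides to float32 so a dtype mismatch shows up as a
    # numerical difference rather than an exception.
    diff = (t_local.float() - remote_sd[name].float()).abs().max().item()
    if diff > 0:
        print(f"{name}: max abs diff {diff:.3e}")
    worst = max(worst, diff)
print(f"largest per-tensor deviation: {worst:.3e}")

Both models are held in CPU memory, so this needs roughly 30 GB of RAM for two fp16 copies. A deviation of exactly zero everywhere would point to the evaluation setup, rather than the weights, as the source of the gap.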

tongxiao2002 commented 8 months ago

> Hi Haoxiong, many thanks for your attention. The numbers we reported and the numbers you measured differ because Hugging Face applies some processing to the model weights when they are uploaded to the Hub. The numbers we tested locally match the ones we reported.

Thank you for your reply. Do you know what kind of processing Hugging Face performs on the model weights? If I knew what it was, maybe I could try to close this gap myself.
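If the processing is something like a precision or file-format change, it should be visible directly in the published shards. Here is a minimal sketch for checking the dtypes actually stored in the checkpoint, assuming the repo ships pytorch_model-*.bin shards (adjust the glob pattern if it uses safetensors instead):

import glob
import os

import torch
from huggingface_hub import snapshot_download

# Download only the weight shards of the published checkpoint.
path = snapshot_download("meta-math/MetaMath-Mistral-7B",
                         allow_patterns=["*.bin", "*.safetensors"])

dtypes = set()
for shard in glob.glob(os.path.join(path, "*.bin")):
    state_dict = torch.load(shard, map_location="cpu")
    dtypes.update(str(t.dtype) for t in state_dict.values())
print("stored dtypes:", dtypes)

If the shards store torch.float16 or torch.bfloat16 while the local evaluation ran in float32, re-running the eval scripts with the matching dtype would be one way to try to close the gap.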