meta-math / MetaMath

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
https://meta-math.github.io
Apache License 2.0
387 stars 35 forks

eval_math script outputs 0 accuracy #4

Closed zhangir-azerbayev closed 1 year ago

zhangir-azerbayev commented 1 year ago

When I run

python eval_math.py --model meta-math/MetaMath-7B-V1.0 --data_file data/test/MATH_test.jsonl --tensor_parallel_size 1

from the base directory of this repository, the final output is

start=== 0 , end==== 9223372036854775807
length==== 5000 , acc==== 0.0

I ran inference on a 1x A100 40GB. I am using vllm v0.1.y, transformers 4.33.2, and torch 2.0.1.
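
For reference, a quick way to record the exact package versions in use (a sanity-check snippet only, not part of the repo's scripts):

import torch
import transformers
import vllm

# Print the versions relevant to this eval, to compare against a known-good setup.
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("vllm:", vllm.__version__)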

yulonghui commented 1 year ago

Hi~ zhangir-azerbayev, thanks for your attention! Have you fixed this error? I retested it and was able to obtain the expected numbers; it works on both 1x A100 80GB and 8x A100 80GB. The environment I am using is: torch==2.0.1+cu117, vllm==0.1.4, transformers==4.31.0.

zhangir-azerbayev commented 1 year ago

(quoting yulonghui's comment above)

Thanks for your response. I'll retry with those package versions and report back.

zhangir-azerbayev commented 1 year ago

Thanks, I was able to reproduce the results in the paper with vllm==0.1.4 and transformers==4.31.0. I'll PR a requirements.txt file when I have time.
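
For anyone pinning the environment before such a PR lands, a minimal sketch of what that requirements.txt might contain, based only on the versions reported working in this thread (yulonghui's setup used the +cu117 build of torch, which needs PyTorch's extra index rather than plain PyPI):

torch==2.0.1
vllm==0.1.4
transformers==4.31.0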

choco9966 commented 1 year ago

I ran the eval scripts with the weights from meta-math/MetaMath-7(13)B-V1.0 and obtained scores of 0.1948 for 7B and 0.2238 for 13B. However, when I ran SFT on the MetaMathQA data with LLaMA-2-7B/13B, I got a score of 0 on MATH and 0.71 on GSM8K. What could be the reason for the 0 score on MATH, and how can I prevent it? (I don't think it's an issue with the package versions.)

The environment I am using is: torch==2.0.1+cu118 vllm==0.1.4 transformers==4.34.0
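
One way to narrow down a 0 score like this, independent of the scoring code, is to dump a few raw generations and check whether the fine-tuned model actually emits final answers in the format the eval script's extraction logic expects. A minimal sketch using the vLLM API; the checkpoint path and prompt below are placeholders, not the repo's actual prompt template:

from vllm import LLM, SamplingParams

# Placeholder checkpoint path and prompt; substitute your SFT checkpoint and the
# prompt template eval_math.py actually uses.
llm = LLM(model="path/to/your-sft-checkpoint", tensor_parallel_size=1)
params = SamplingParams(temperature=0, max_tokens=512)

prompts = ["Problem: What is 2 + 2?\nAnswer:"]
outputs = llm.generate(prompts, params)

for out in outputs:
    # If the completions never contain the final-answer marker the eval script
    # looks for, answer extraction fails and the reported accuracy will be 0.
    print(out.outputs[0].text)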