open-compass / MathBench

[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset
https://open-compass.github.io/MathBench/
Apache License 2.0

What is the correct way to evaluate the base model deepseek-math-7b-base? #25

Open · adventuree-cyber opened this issue 1 month ago

adventuree-cyber commented 1 month ago

I used the config at opencompass/configs/datasets/MathBench/mathbench_2024_gen_1dc21d.py and set use_ppl_single_choice = True. I'm not sure whether this is the correct config for evaluating a base model such as deepseek-math-7b-base.

liushz commented 1 week ago

I recommend using mathbench_2024_few_shot_mixed_4a3fd4.py for base-model evaluation.
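
For reference, a minimal OpenCompass run config wired to that dataset file might look like the sketch below. The exported dataset variable name (`mathbench_datasets`), the choice of `HuggingFaceCausalLM`, and the generation settings are assumptions based on typical OpenCompass configs, so please check them against the actual file and your installed version before running.

```python
# Minimal sketch of an OpenCompass eval config for a base model on MathBench.
# Assumptions: the few-shot config exports `mathbench_datasets`, and
# `HuggingFaceCausalLM` is the right wrapper for deepseek-math-7b-base.
from mmengine.config import read_base
from opencompass.models import HuggingFaceCausalLM

with read_base():
    # Pull in the recommended few-shot mixed MathBench dataset config.
    from opencompass.configs.datasets.MathBench.mathbench_2024_few_shot_mixed_4a3fd4 import (
        mathbench_datasets,  # assumed variable name exported by that config
    )

datasets = mathbench_datasets

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='deepseek-math-7b-base-hf',
        path='deepseek-ai/deepseek-math-7b-base',
        tokenizer_path='deepseek-ai/deepseek-math-7b-base',
        max_out_len=1024,
        max_seq_len=4096,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    )
]
```

Saving this as a config file and launching it with the usual OpenCompass entry point (e.g. `python run.py <your_config>.py`) should then evaluate the base model on the MathBench splits defined in the few-shot mixed dataset config.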