hkust-nlp / ceval

Official github repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]
https://cevalbenchmark.com/
MIT License

How to evaluate STEM? #74

Closed. AkKari808 closed this issue 8 months ago

AkKari808 commented 8 months ago

I found that I can only evaluate individual subjects, such as computer_network, and not a whole category, such as STEM. Is it correct that in your experiments you evaluate each subject separately and then average the per-subject scores to get the score for a category?

HYZ17 commented 8 months ago

Yes, that's right.
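
For anyone landing here later, the averaging described above can be sketched roughly as follows. This is a minimal illustration, not the repo's own script: the `SUBJECT_TO_CATEGORY` mapping is a hypothetical three-subject subset, the accuracies are made up, and it assumes a plain unweighted mean over subjects (weighting by the number of questions per subject would give a different number).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical subject -> category mapping (illustrative subset only,
# not the full C-Eval subject list).
SUBJECT_TO_CATEGORY = {
    "computer_network": "STEM",
    "operating_system": "STEM",
    "high_school_history": "Humanities",
}


def category_averages(subject_scores, subject_to_category):
    """Return the unweighted mean of per-subject accuracies for each category."""
    grouped = defaultdict(list)
    for subject, accuracy in subject_scores.items():
        grouped[subject_to_category[subject]].append(accuracy)
    return {category: mean(accs) for category, accs in grouped.items()}


# Made-up per-subject accuracies, as if each subject had been evaluated separately.
scores = {
    "computer_network": 0.5,
    "operating_system": 0.75,
    "high_school_history": 0.25,
}

result = category_averages(scores, SUBJECT_TO_CATEGORY)
print(result)
```

With the full subject list you would evaluate every subject in a category first, then pass all of the resulting accuracies through the same function.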