Why are my C-Eval scores for ChatGLM2-6B so low in the zero-shot setting?

hkust-nlp / ceval

Official github repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]

https://cevalbenchmark.com/

MIT License


Why are my C-Eval scores for ChatGLM2-6B so low in the zero-shot setting? #53

Open EdisonWujr opened 1 year ago

EdisonWujr commented 1 year ago

Why are my scores so low when I evaluate ChatGLM2-6B on C-Eval in the zero-shot setting? Switching to CoT mode does not improve the scores either.

EdisonWujr commented 1 year ago

A follow-up: in few-shot CoT mode, the score is close to the official one.

1014670860 commented 1 year ago

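> A follow-up: in few-shot CoT mode, the score is close to the official one.

How did you run the evaluation? The README does not make it clear, and passing a local path to --model-name does not work either.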

ssssmy commented 1 year ago

@1014670860 Put your local model path in the chatglm file. That said, I don't know how to run the evaluation on the test split; I can only run dev.

Flywolfs commented 1 year ago

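Could this be related to the following line in the code? `choice_score = [score[167], score[333], score[251], score[416]]` From what I can see, this ChatGLM evaluator was written for chatglm-6b and has not been adapted to chatglm2-6b, because the two model versions use different vocabularies.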
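To illustrate why hard-coded token ids matter here, the following is a toy sketch, not code from the repo. The two id lists and the logits are made up for illustration and are not the real chatglm-6b or chatglm2-6b values; `pick_answer` is a hypothetical helper. It shows that reading the A/B/C/D scores at ids taken from a different vocabulary can select the wrong answer even when the model itself prefers the right one.

```python
# Toy illustration only: the ids and logits below are made up,
# they are NOT the real chatglm-6b / chatglm2-6b values.
CHOICES = ["A", "B", "C", "D"]


def pick_answer(logits, choice_ids):
    """Return the choice letter whose token id has the highest logit."""
    choice_score = [logits[i] for i in choice_ids]
    return CHOICES[choice_score.index(max(choice_score))]


# Hypothetical token ids of "A", "B", "C", "D" in two different vocabularies.
old_vocab_ids = [2, 5, 3, 7]
new_vocab_ids = [1, 4, 0, 6]

# Next-token logits from a model that uses the *new* vocabulary.
# It strongly prefers id 4, which is "B" in the new vocabulary.
logits = [0.1, 0.2, 3.0, 0.0, 9.0, 0.3, 0.1, 0.5]

right = pick_answer(logits, new_vocab_ids)  # reads the intended positions
wrong = pick_answer(logits, old_vocab_ids)  # reads unrelated positions
print(right, wrong)
```

With the stale ids the scores are read from unrelated tokens, so the predicted letter is effectively arbitrary, which would be consistent with a low zero-shot score.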

Flywolfs commented 1 year ago

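> Why are my scores so low when I evaluate ChatGLM2-6B on C-Eval in the zero-shot setting? Switching to CoT mode does not improve the scores either.

I have confirmed the cause. If you evaluate chatglm2-6b in zero-shot mode, line 145 of chatglm.py needs to be changed to `choice_score = [score[316], score[347], score[319], score[367]]` so that the indices match the new model's vocabulary.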
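One way to catch this kind of mismatch early is to check the hard-coded ids against the tokenizer before scoring. The following is a minimal sketch, not code from the repo: `check_choice_ids` is a hypothetical helper, and `ToyTokenizer` is a stand-in with made-up ids so that the example runs without downloading a model. It assumes a tokenizer object that exposes `convert_ids_to_tokens`, as Hugging Face tokenizers do. Whether a given tokenizer prefixes the letters with the SentencePiece `▁` marker is also an assumption, so the helper strips it if present.

```python
class ToyTokenizer:
    """Stand-in tokenizer with a made-up vocabulary (not chatglm2-6b's)."""

    def __init__(self, id_to_token):
        self._id_to_token = id_to_token

    def convert_ids_to_tokens(self, ids):
        return [self._id_to_token.get(i, "<unk>") for i in ids]


def check_choice_ids(tokenizer, choice_ids, choices=("A", "B", "C", "D")):
    """Return True if each id decodes to the expected choice letter."""
    tokens = tokenizer.convert_ids_to_tokens(list(choice_ids))
    # Strip the SentencePiece word-boundary marker, if the tokenizer adds one.
    letters = [t.lstrip("▁") for t in tokens]
    return letters == list(choices)


# Made-up ids for illustration only.
toy = ToyTokenizer({10: "▁A", 11: "▁B", 12: "▁C", 13: "▁D", 20: "the"})
ids_ok = check_choice_ids(toy, [10, 11, 12, 13])
ids_bad = check_choice_ids(toy, [10, 11, 12, 20])
print(ids_ok, ids_bad)
```

With a real model, the same check could be run once on the loaded tokenizer and the ids used in `choice_score`; a `False` result would suggest the ids belong to a different vocabulary.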