Some questions about few-shot

modelscope / eval-scope

A streamlined and customizable framework for efficient large model evaluation and performance benchmarking

Open MrZhang1996 opened 4 months ago

MrZhang1996 commented 4 months ago

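Why is few-shot hard-coded to 0 for all the benchmarks? Is there any problem with manually setting few_shot_num=few_shot_num myself? When I tested hellaswag, the zero-shot score was actually higher than the 10-shot score. Thanks for your reply.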

wangxingjun778 commented 3 months ago

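This comes down to the experiment settings. A version with configurable settings (accepting a YAML file, a dataclass, or a dict) will be released in the next couple of days, and the few-shot setting will be part of that configuration.
Our experiments so far show that on some datasets a model does perform better 0-shot than k-shot. Our guess is that this is related to each model's instruction-following ability: k-shot prompts carry biased patterns, which in turn lower the score.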
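To make the "biased patterns" guess concrete, here is a minimal sketch of how a k-shot prompt differs from a 0-shot one. The `build_prompt` helper and the `Question:`/`Answer:` template are hypothetical and are not eval-scope's actual prompt format; the point is only that every exemplar prepended to the prompt brings its own answer pattern with it.

```python
from typing import List, Tuple


def build_prompt(question: str, exemplars: List[Tuple[str, str]], k: int) -> str:
    """Prepend the first k (question, answer) exemplars to the target question.

    Hypothetical template for illustration only, not eval-scope's real one.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in exemplars[:k]]
    # The target question is always last and is left unanswered.
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


# Two made-up exemplars that happen to share the same answer label.
exemplars = [("Which ending fits best? ...", "A"), ("Which ending fits best? ...", "A")]

zero_shot = build_prompt("Which ending fits best? ...", exemplars, k=0)
two_shot = build_prompt("Which ending fits best? ...", exemplars, k=2)

# The 0-shot prompt contains no answered examples; the 2-shot prompt shows
# "Answer: A" twice before the model sees the real question.
print(zero_shot.count("Answer: A"), two_shot.count("Answer: A"))
```

If the exemplars are skewed like this, a model that follows instructions loosely may copy the pattern instead of answering the question, which would be consistent with the 0-shot score coming out higher.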

wangxingjun778 commented 3 months ago

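Also, if you would rather not use a config file and want to pass the value through the run command instead, you can set it via the --dataset-args argument by passing something like {'mmlu': {'few_shot_num': 5}, ...}.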
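A small sketch of building that value programmatically, in case it helps with shell quoting. Only the `--dataset-args` flag and the `{'mmlu': {'few_shot_num': 5}}` shape come from this thread; the `build_dataset_args` helper is hypothetical, and the sketch assumes the flag accepts a JSON string, which is not confirmed here. The run command itself is left out on purpose.

```python
import json
import shlex
from typing import Dict


def build_dataset_args(few_shot: Dict[str, int]) -> str:
    """Turn {dataset_name: k} into the nested per-dataset override string.

    Hypothetical helper; assumes --dataset-args is parsed as JSON.
    """
    for name, k in few_shot.items():
        if k < 0:
            raise ValueError(f"few_shot_num for {name!r} must be non-negative")
    payload = {name: {"few_shot_num": k} for name, k in few_shot.items()}
    return json.dumps(payload)


value = build_dataset_args({"mmlu": 5})

# shlex.quote wraps the JSON in single quotes so the shell passes it through
# as one argument with the inner double quotes intact.
print("--dataset-args", shlex.quote(value))
```

The printed fragment can be appended to whichever run command you already use. Other dataset names can be added to the dict in the same way, provided they match the names the framework expects.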

MrZhang1996 commented 3 months ago

Got it! Thanks for your reply!