open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0
3.92k stars 413 forks

[Bug] failed to reproduce Qwen #407

Closed Leymore closed 1 year ago

Leymore commented 1 year ago

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

None

Reproduces the problem - code/configuration sample

None

Reproduces the problem - command or script

None

Reproduces the problem - error message

None

Other information

From WeChat

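山竹爱饼干 2023/09/18 10:35 Has anyone seen Qwen evaluation scores drop after updating to the token patch?

山竹爱饼干 2023/09/18 10:36 Before I updated, I had added a simple check of my own, and the results matched the leaderboard.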

山竹爱饼干 2023/09/18 10:37 image

山竹爱饼干 2023/09/18 10:37 After updating, the scores dropped across the board.

周丰哲 2023/09/18 10:39 Is this patch the opencompass patch?

山竹爱饼干 2023/09/18 10:40 Yes, I think it should be this one.

山竹爱饼干 2023/09/18 10:40 image

MelodyChenjun commented 1 year ago

This issue is resolved. It was a transformers version problem and had nothing to do with pad_token_id. image
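
Since the root cause was a library version rather than the patch, it may help to record installed versions next to each evaluation run. Below is a minimal sketch using only the Python standard library. The thread does not say which transformers version fixed the drop, so no version is assumed here. The helper names `installed_version` and `describe_environment` are hypothetical, and including `torch` in the default list is an assumption, not something from this thread.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_environment(packages=("transformers", "torch")):
    """Map each package name to its installed version (None when missing).

    The default package list is an assumption for illustration only.
    """
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Print one line per package so the output can be pasted into a bug report.
    for name, version in describe_environment().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output into the Environment field of a report makes version mismatches like this one easier to spot.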

tonysy commented 1 year ago

Feel free to re-open if needed.