hkust-nlp/ceval
Official GitHub repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]
https://cevalbenchmark.com/
MIT License
1.6k stars
74 forks
issues
Error in a question
#36
AiLMe-AI
closed
1 year ago
1
Download method 1: download link is incorrect
#35
treya-lin
closed
1 year ago
1
ChatGPT data update
#34
CRGBS
opened
1 year ago
0
Ran into a problem when trying the example given in code/Readme.md
#33
yixin-zhu
closed
1 year ago
2
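Would like to take part in the annotation work; formerly a high school physics teacher, now studying for an MPhil in Hong Kong, hoping to be taken on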
#32
lyconghk
closed
1 year ago
1
Why is the chatglm2-6b score higher than gpt-4 on your leaderboard?
#31
ChristopheZhao
closed
1 year ago
2
Which observations is this conclusion based on?
#30
wwngh1233
closed
10 months ago
1
Could you add support for Ziya-13B-v1.1?
#29
helldog-star
closed
1 year ago
0
typo: correctly spell the word "explanation"
#28
Mowmowj
closed
1 year ago
1
Could you add support for the newly released baichuan-7B model?
#27
linghongli
closed
1 year ago
1
edit path
#26
Phosphor-Bai
closed
1 year ago
0
Is there public test code for llama models in hf format?
#25
WUHU-G
opened
1 year ago
4
How is a prompt longer than max_len handled?
#24
bbyjlb
closed
1 year ago
1
Reproduced chatglm-6b validation-set results differ slightly from the paper.
#23
dyy401453043
closed
1 year ago
4
Why doesn't the test dataset provide an answer column?
#22
linghongli
closed
1 year ago
1
Added zero-shot evaluation of chatglm.
#21
Phosphor-Bai
closed
1 year ago
0
Test environment
#20
XqFeng-Josie
closed
1 year ago
3
Support loading llama-family models via huggingface + lora
#19
Ricardokevins
closed
10 months ago
2
The results were updated on 6.8 (June 8); has the corresponding test code changed?
#18
DaoD
closed
1 year ago
1
Why are the models' results on the leaderboard much better than before?
#17
guozhiyao
closed
1 year ago
6
Question about differing prompts
#16
cason0126
closed
1 year ago
1
Fluctuation in test results
#15
guozhiyao
closed
1 year ago
4
What is the parameter count of Bloomz-mt in the evaluation results?
#14
guozhiyao
closed
1 year ago
1
Question about the submission file
#13
guozhiyao
closed
1 year ago
4
[Enhancement]: Are there plans to integrate ceval into OpenAI's evals framework?
#12
rgtjf
closed
1 year ago
6
Why does Chinese Alpaca-13B perform worse than Chinese LLama-13B?
#11
liuyukid
closed
1 year ago
1
Question: are results aligned across different models?
#10
Desein-Yang
closed
1 year ago
3
When LLaMA-65B was tested, were the questions given in English or in Chinese?
#9
cingtiye
closed
1 year ago
3
potential wrong label
#8
jiexiongw
closed
1 year ago
1
No released large models such as ERNIE Bot (文心一言), iFlytek Spark (讯飞星火), or Tongyi Qianwen (通义千问)?
#7
wqw547243068
closed
1 year ago
2
Update README.md
#6
eltociear
closed
1 year ago
0
Missing types of subject_mapping.json
#5
FateScript
closed
1 year ago
1
Suggestion: support `--subject=all` and `--subject=hard`
#4
wjfwzzc
closed
10 months ago
3
Has this been published to pip?
#3
moon-fall
closed
10 months ago
0
Adding evaluation script for chatglm and minimax.
#2
Phosphor-Bai
closed
1 year ago
0
Why not evaluate zero-shot performance?
#1
yyht
closed
1 year ago
3