jeinlee1991/chinese-llm-benchmark
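Chinese LLM capability leaderboard: currently covers 128 large models, including commercial models such as chatgpt, gpt-4o, Google gemini, Baidu ERNIE Bot (文心一言), Alibaba Tongyi Qianwen, Baichuan, iFlytek Spark, SenseTime senseChat, and minimax, as well as open-source models such as qwen2.5, llama3.1, glm4, InternLM2.5 (书生), openbuddy, and AquilaChat. It provides not only a capability score leaderboard but also the raw outputs of every model!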
2.63k stars
123 forks
issues
Raw model outputs
#50
KyleWang-Hunter
closed
1 day ago
1
Could we add some agent-based evaluations?
#49
CoderYiFei
closed
1 day ago
1
What about Claude sonnet 3.5?
#48
Skywalker144
opened
1 week ago
0
add English title and the maintenance badge
#47
zhimin-z
opened
3 weeks ago
0
How are the subjective questions evaluated?
#46
runwean
closed
1 day ago
1
Could you evaluate the newly open-sourced qwen2.5?
#45
ConleyKong
closed
1 day ago
1
Isn't deepseek-chat-v2 open source?
#44
liyuefeng
closed
1 day ago
1
A completely laughable evaluation. How much did Baidu pay you?
#43
a5185330
opened
1 month ago
1
Could you evaluate the stepfun model series?
#42
forrestlinfeng
opened
1 month ago
0
Could you add llama3.1 evaluation data?
#41
Anionex
closed
1 month ago
3
Could you explain each capability in detail?
#40
Wooden-Gear
opened
3 months ago
0
Add an evaluation of Nemotron-4 340B
#39
wrench1997
opened
4 months ago
0
Add data for the Yi-1.5 model series
#38
zzc0208
closed
1 month ago
1
The ranking of LLMs under 10B is not very accurate; in actual use, ChatGLM3-6B and Qwen1.5-7B perform better
#37
danny-zhu
opened
4 months ago
2
Evaluate deepseek v2
#36
cubxxw
closed
1 month ago
1
The evaluation data leaves me speechless
#35
freedomRen
opened
5 months ago
3
The open-source leaderboard for models under 10b is unreliable
#34
wyfSunflower
opened
5 months ago
0
The important claude series is missing; requesting that it be added to the evaluation
#33
chiguabaobao
opened
6 months ago
2
Could you add an evaluation of qianwen1.5-32B?
#32
yu-zheng-tao
closed
6 months ago
2
Could you add an evaluation metric for Function Call (tool calling) capability?
#31
Dream-s-Wang
opened
7 months ago
1
Evaluation of the iFlytek Spark 13B open-source model
#30
STHSF
opened
7 months ago
0
Could you add an evaluation of the commercial claude3 model?
#29
yu-zheng-tao
opened
7 months ago
0
Why does Qwen1.5-14B-chat score so high, even higher than 72b?
#28
yu-zheng-tao
closed
6 months ago
4
Why does Qwen1.5-14B-chat score so high, even higher than 72b?
#27
yu-zheng-tao
closed
7 months ago
0
Could you add kimi chat to the leaderboard?
#26
LengmoAngel
closed
6 months ago
1
Suggestion: add tests for 1B models
#25
yuys0602
closed
6 months ago
1
iFlytek Spark has released version 3.5
#24
zhisuyan
closed
6 months ago
1
Is there any arxiv paper or report for this benchmark?
#23
zhimin-z
opened
9 months ago
0
update new model
#22
zzc0208
closed
4 months ago
0
Could you test openbuddy-deepseek-67b-v15.2?
#21
openmynet
closed
6 months ago
1
Re-test the new version of ERNIE Bot (文心一言)
#20
huanghuanhuahuh
closed
6 months ago
1
What is the evaluation criteria for the score?
#19
zhimin-z
opened
10 months ago
0
This link does not redirect...
#18
zhimin-z
opened
11 months ago
0
Why does evaluation of encoding efficiency not count into the overall score?
#17
zhimin-z
opened
11 months ago
0
Strongly recommend adding moonshot's Kimi chat!!!
#16
witherlll
closed
6 months ago
2
Where is my Claude?
#15
JiangKaslana
opened
1 year ago
0
Isn't the evaluation dataset too small? Can it really prove anything?
#14
yyl424525
opened
1 year ago
1
How should I cite this work?
#13
g-h-chen
opened
1 year ago
0
A comparison of each model's deployment hardware requirements would be nice
#12
zhangmianhongni
opened
1 year ago
0
Could you evaluate Chinese-LLaMA-Alpaca-2?
#11
dodogreen
opened
1 year ago
0
Could you evaluate the Qwen-7B model?
#10
liudayiheng
closed
6 months ago
0
Great evaluation. May I ask the project owner whether the test data can be reposted?
#9
l269438
closed
6 months ago
1
When was Tongyi Qianwen evaluated?
#8
liudayiheng
closed
6 months ago
0
Great work. Are there plans to include the Anima-30B model in future evaluations?
#7
UI233
opened
1 year ago
0
Hoping the RWKV model can be added to the evaluation
#6
OopsYouDiedE
opened
1 year ago
4
Provide code to reproduce the results
#5
azmat21
opened
1 year ago
0
How can I submit my own model for evaluation?
#4
Taoooo9
opened
1 year ago
1
Does eval contain all of the evaluation data?
#3
TTCoding
closed
6 months ago
1
Great work. What are the scoring criteria? How are these models scored?
#2
wwngh1233
opened
1 year ago
7
Why is bing not included?
#1
tutianyu101
closed
6 months ago
1