open-compass/opencompass
OpenCompass is an LLM evaluation platform that supports a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0
4.21k stars
449 forks
issues
[Update] Update Skywork/Qwen-QwQ
#1728
tonysy
opened
5 hours ago
0
[Feature] Support LiveMathBench
#1727
jnanliu
closed
6 hours ago
0
[Update] Update max_out_len for datasets
#1726
MaiziXiao
opened
13 hours ago
0
[Bug] Running a case on a machine without a GPU fails at runtime with a "dataset not registered" error (the dataset is in fact registered)
#1725
Caeser-SONG
opened
14 hours ago
0
[ci] add common_summarizer return
#1724
zhulinJulia24
opened
17 hours ago
0
[Bug] Dataset location on Windows, and evaluation result is 0.0
#1723
Dbgsaoge
opened
19 hours ago
0
[Fix] Update P-MMEVAL OSS data
#1722
liushz
closed
1 day ago
0
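Hello, is the final score for L-Eval's subjective questions the rougeLsum score? Also, the L-Eval dataset is missing codeU and sci_fi; are there evaluation config files for them?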
#1721
13416157913
opened
1 day ago
0
[Feature] Add Openai Simpleqa dataset
#1720
liushz
closed
1 day ago
0
[Fix] Fix pmmeval_gen config
#1719
liushz
closed
1 day ago
0
[Feature] How can custom datasets be supported when evaluating via API?
#1718
Jimmy-L99
opened
1 day ago
0
update
#1717
MaiziXiao
closed
3 days ago
0
[Bug] Evaluation results are empty, help needed
#1716
Jimmy-L99
opened
3 days ago
3
[Bug] strategyqa answer extraction error
#1715
Linzwcs
opened
4 days ago
0
[Feature] Add P-MMEval
#1714
wanyu2018umac
closed
2 days ago
0
Korbench
#1713
epsilondylan
closed
4 days ago
0
Update Fullbench
#1712
tonysy
closed
3 days ago
0
Update MATH dataset with model judge
#1711
liushz
closed
4 days ago
0
[Bug] stop_at_stop_token removes the generated method body, resulting in no evaluation results
#1710
886gb
opened
6 days ago
0
Add RULER 64k
#1709
changlan
closed
4 days ago
0
[Update] Add Math prm 800k
#1708
MaiziXiao
closed
1 week ago
0
[fix] output sequence under the multiple samples
#1707
cuauty
closed
3 days ago
0
[fix] output sequence under the multiple samples
#1706
cuauty
closed
1 week ago
0
Korbench
#1705
epsilondylan
closed
6 days ago
0
[Update] Update configurations
#1704
MaiziXiao
closed
1 week ago
0
[Bug] DO NOT Use relative import
#1703
jinmingyi1998
opened
1 week ago
2
support new error code
#1702
cuauty
closed
1 week ago
0
[CI] update torch version and add more datasets into daily testcase
#1701
zhulinJulia24
closed
1 week ago
0
[Feature] Update Math data
#1700
MaiziXiao
closed
1 week ago
0
update first_option_postprocess
#1699
MaiziXiao
closed
1 week ago
0
[Update] update volc CPU flavor
#1698
MaiziXiao
closed
1 week ago
0
Add Chinese SimpleQA config
#1697
OpenStellarTeam
opened
1 week ago
2
[Feature] Is the Azure OpenAI API supported for subjective evaluation?
#1696
HypherX
closed
1 week ago
2
[Bug] After switching from 0.2.6 to 0.3.5, the same model's performance drops significantly; how should I investigate the cause?
#1695
daidaiershidi
closed
1 week ago
4
[BUMP] Bump version to 0.3.6
#1694
MaiziXiao
closed
1 week ago
0
[ci] update testcase baseline
#1693
zhulinJulia24
closed
2 weeks ago
0
[Update] MUSR dataset config prefix update
#1692
MaiziXiao
closed
2 weeks ago
0
Add Chinese SimpleQA dataset configuration
#1691
OpenStellarTeam
closed
1 week ago
1
[Update] Support Arc Prize Public Evaluation
#1690
jnanliu
closed
2 days ago
0
MuSR Dataset Evaluation
#1689
abrohamLee
closed
2 weeks ago
0
[Fix] Fix bug for first_option_postprocess
#1688
MaiziXiao
closed
2 weeks ago
0
[Feature] HumanEvalX use Chat Mode as humaneval_openai_sample_evals_gen_159614
#1687
tonysy
opened
2 weeks ago
0
[Bug] The evaluation of configurations requiring a TopKRetriever, such as for flores_datasets, failed with a KeyError: 'metadata'.
#1686
DespairL
opened
2 weeks ago
0
[Update] Auto-download for followbench
#1685
MaiziXiao
closed
2 weeks ago
0
[Feature] BABILong Dataset added
#1684
MaiziXiao
closed
2 weeks ago
0
[Hotfix] Hotfix
#1683
bittersweet1999
closed
2 weeks ago
0
[ci] fix pr test bug
#1682
zhulinJulia24
closed
2 weeks ago
0
[Fix] Fix lint
#1681
bittersweet1999
closed
2 weeks ago
0
Revert "Add single lora adapter support for vLLM inference."
#1680
bittersweet1999
closed
2 weeks ago
0
Add single lora adapter support for vLLM inference.
#1679
DespairL
closed
2 weeks ago
0