open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0
4.18k stars
446 forks
issues
Newest
update
#1717
MaiziXiao
opened
10 minutes ago
0
[Bug] Evaluation results are empty, requesting help
#1716
Jimmy-L99
opened
4 hours ago
3
[Bug] strategyqa answer extraction error
#1715
Linzwcs
opened
20 hours ago
0
[Feature] Add P-MMEval
#1714
wanyu2018umac
opened
1 day ago
0
Korbench
#1713
epsilondylan
closed
19 hours ago
0
Update Fullbench
#1712
tonysy
closed
1 hour ago
0
Update MATH dataset with model judge
#1711
liushz
closed
1 day ago
0
[Bug]
#1710
886gb
opened
2 days ago
0
Add RULER 64k
#1709
changlan
closed
20 hours ago
0
[Update] Add Math prm 800k
#1708
MaiziXiao
closed
4 days ago
0
[fix] output sequence under the multiple samples
#1707
cuauty
opened
5 days ago
0
[fix] output sequence under the multiple samples
#1706
cuauty
closed
5 days ago
0
Korbench
#1705
epsilondylan
closed
2 days ago
0
[Update] Update configurations
#1704
MaiziXiao
closed
4 days ago
0
[Bug] DO NOT Use relative import
#1703
jinmingyi1998
opened
6 days ago
2
support new error code
#1702
cuauty
closed
5 days ago
0
[CI] update torch version and add more datasets into daily testcase
#1701
zhulinJulia24
closed
5 days ago
0
[Feature] Update Math data
#1700
MaiziXiao
closed
1 week ago
0
update first_option_postprocess
#1699
MaiziXiao
closed
1 week ago
0
[Update] update volc CPU flavor
#1698
MaiziXiao
closed
1 week ago
0
Add Chinese SimpleQA config
#1697
OpenStellarTeam
opened
1 week ago
2
[Feature] Is the Azure OpenAI API supported for subjective evaluation?
#1696
HypherX
closed
1 week ago
2
[Bug] After switching from 0.2.6 to 0.3.5, the same model's performance drops significantly; how should the cause be investigated?
#1695
daidaiershidi
closed
1 week ago
4
[BUMP] Bump version to 0.3.6
#1694
MaiziXiao
closed
1 week ago
0
[ci] update testcase baseline
#1693
zhulinJulia24
closed
1 week ago
0
[Update] MUSR dataset config prefix update
#1692
MaiziXiao
closed
1 week ago
0
Add Chinese SimpleQA dataset configuration
#1691
OpenStellarTeam
closed
1 week ago
1
[Update] Support Arc Prize Public Evaluation
#1690
jnanliu
opened
1 week ago
0
MuSR Dataset Evaluation
#1689
abrohamLee
closed
1 week ago
0
[Fix] Fix bug for first_option_postprocess
#1688
MaiziXiao
closed
1 week ago
0
[Feature] HumanEvalX use Chat Mode as humaneval_openai_sample_evals_gen_159614
#1687
tonysy
opened
1 week ago
0
[Bug] The evaluation of configurations requiring a TopKRetriever, such as for flores_datasets, failed with a KeyError: 'metadata'.
#1686
DespairL
opened
1 week ago
0
[Update] Auto-download for followbench
#1685
MaiziXiao
closed
1 week ago
0
[Feature] BABILong Dataset added
#1684
MaiziXiao
closed
1 week ago
0
[Hotfix] Hotfix
#1683
bittersweet1999
closed
1 week ago
0
[ci] fix pr test bug
#1682
zhulinJulia24
closed
1 week ago
0
[Fix] Fix lint
#1681
bittersweet1999
closed
1 week ago
0
Revert "Add single lora adapter support for vLLM inference."
#1680
bittersweet1999
closed
1 week ago
0
Add single lora adapter support for vLLM inference.
#1679
DespairL
closed
1 week ago
0
[Bug] TypeError: Expected a datasets.Dataset or a datasets.DatasetDict object, but got {'train': <modelscope.msdatasets.ms_dataset.MsDataset object at xxx>, 'test': <modelscope.msdatasets.ms_dataset.MsDataset object at xxx>}
#1678
MrWiffer
opened
2 weeks ago
0
add jiutian-api client
#1677
DewidPig
opened
2 weeks ago
1
[Bug] Scores drop significantly when evaluating Qwen/Qwen2.5-72B with v0.3.5
#1675
guoshengCS
opened
2 weeks ago
10
[Hotfix] lmdeploy temp
#1674
bittersweet1999
closed
2 weeks ago
0
[Feature] dataset for humaneval-multipl
#1673
jyshee
opened
2 weeks ago
1
[Update] Dingo Dataset update
#1670
MaiziXiao
closed
2 weeks ago
0
[ci] react daily test
#1668
zhulinJulia24
closed
1 week ago
0
[Feature] Support Math23k
#1667
00INDEX
closed
1 week ago
1
[Feature] Add long context evaluation for base models
#1666
MaiziXiao
closed
2 weeks ago
0
[Feature] MMLU CEVAL MATH test results for some models
#1664
jinmingyi1998
opened
3 weeks ago
1
[Bug] Subjective evaluation with a custom judge model fails with a "model not registered" error
#1663
wuys1
closed
3 weeks ago
2