open-compass opencompass issues

open-compass / opencompass

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, Llama 2, Qwen, GLM, Claude, etc.) across 100+ datasets.

https://opencompass.org.cn/

Apache License 2.0

4.21k stars, 449 forks

issues

Sorted by: Newest

[Update] Update Skywork/Qwen-QwQ

#1728 tonysy opened 5 hours ago
0
[Feature] Support LiveMathBench

#1727 jnanliu closed 6 hours ago
0
[Update] Update max_out_len for datasets

#1726 MaiziXiao opened 13 hours ago
0
[Bug] Running a case on a machine without a GPU fails at runtime with a "dataset not registered" error (the dataset is in fact registered)

#1725 Caeser-SONG opened 14 hours ago
0
[ci] add common_summarizer return

#1724 zhulinJulia24 opened 17 hours ago
0
[Bug] Dataset location on Windows, and evaluation results of 0.0

#1723 Dbgsaoge opened 19 hours ago
0
[Fix] Update P-MMEVAL OSS data

#1722 liushz closed 1 day ago
0
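Hello, is the final score for L-Eval's subjective questions the rougeLsum score? Also, the L-Eval dataset is missing codeU and sci_fi. Are there evaluation config files for them?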

#1721 13416157913 opened 1 day ago
0
[Feature] Add Openai Simpleqa dataset

#1720 liushz closed 1 day ago
0
[Fix] Fix pmmeval_gen config

#1719 liushz closed 1 day ago
0
[Feature] How can custom datasets be supported when evaluating through an API?

#1718 Jimmy-L99 opened 1 day ago
0
update

#1717 MaiziXiao closed 3 days ago
0
[Bug] Evaluation results are empty, need help

#1716 Jimmy-L99 opened 3 days ago
3
[Bug] strategyqa answer extraction error

#1715 Linzwcs opened 4 days ago
0
[Feature] Add P-MMEval

#1714 wanyu2018umac closed 2 days ago
0
Korbench

#1713 epsilondylan closed 4 days ago
0
Update Fullbench

#1712 tonysy closed 3 days ago
0
Update MATH dataset with model judge

#1711 liushz closed 4 days ago
0
[Bug] stop_at_stop_token deletes the generated method body, so no evaluation results are produced

#1710 886gb opened 6 days ago
0
Add RULER 64k

#1709 changlan closed 4 days ago
0
[Update] Add Math prm 800k

#1708 MaiziXiao closed 1 week ago
0
[fix] output sequence under the multiple samples

#1707 cuauty closed 3 days ago
0
[fix] output sequence under the multiple samples

#1706 cuauty closed 1 week ago
0
Korbench

#1705 epsilondylan closed 6 days ago
0
[Update] Update configurations

#1704 MaiziXiao closed 1 week ago
0
[Bug] DO NOT Use relative import

#1703 jinmingyi1998 opened 1 week ago
2
support new error code

#1702 cuauty closed 1 week ago
0
[CI] update torch version and add more datasets into daily testcase

#1701 zhulinJulia24 closed 1 week ago
0
[Feature] Update Math data

#1700 MaiziXiao closed 1 week ago
0
update first_option_postprocess

#1699 MaiziXiao closed 1 week ago
0
[Update] update volc CPU flavor

#1698 MaiziXiao closed 1 week ago
0
Add Chinese SimpleQA config

#1697 OpenStellarTeam opened 1 week ago
2
[Feature] Is the Azure OpenAI API supported for subjective evaluation?

#1696 HypherX closed 1 week ago
2
[Bug] After switching from 0.2.6 to 0.3.5, the same model's performance dropped significantly. How should I investigate the cause?

#1695 daidaiershidi closed 1 week ago
4
[BUMP] Bump version to 0.3.6

#1694 MaiziXiao closed 1 week ago
0
[ci] update testcase baseline

#1693 zhulinJulia24 closed 2 weeks ago
0
[Update] MUSR dataset config prefix update

#1692 MaiziXiao closed 2 weeks ago
0
Add Chinese SimpleQA dataset configuration

#1691 OpenStellarTeam closed 1 week ago
1
[Update] Support Arc Prize Public Evaluation

#1690 jnanliu closed 2 days ago
0
MuSR Dataset Evaluation

#1689 abrohamLee closed 2 weeks ago
0
[Fix] Fix bug for first_option_postprocess

#1688 MaiziXiao closed 2 weeks ago
0
[Feature] HumanEvalX use Chat Mode as humaneval_openai_sample_evals_gen_159614

#1687 tonysy opened 2 weeks ago
0
[Bug] The evaluation of configurations requiring a TopKRetriever, such as for flores_datasets, failed with a KeyError: 'metadata'.

#1686 DespairL opened 2 weeks ago
0
[Update] Auto-download for followbench

#1685 MaiziXiao closed 2 weeks ago
0
[Feature] BABILong Dataset added

#1684 MaiziXiao closed 2 weeks ago
0
[Hotfix] Hotfix

#1683 bittersweet1999 closed 2 weeks ago
0
[ci] fix pr test bug

#1682 zhulinJulia24 closed 2 weeks ago
0
[Fix] Fix lint

#1681 bittersweet1999 closed 2 weeks ago
0
Revert "Add single lora adapter support for vLLM inference."

#1680 bittersweet1999 closed 2 weeks ago
0
Add single lora adapter support for vLLM inference.

#1679 DespairL closed 2 weeks ago
0