open-compass T-Eval issues

open-compass / T-Eval

[ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step

https://open-compass.github.io/T-Eval/

Apache License 2.0

235 stars, 15 forks

issues

Cannot download dataset

#63 YEEthanCC closed 1 week ago
0
Running the plan task hangs and no result file is written

#62 1207408359 opened 3 weeks ago
0
Failed to convert pandas DataFrame to Arrow Table from file '/home/####/.cache/huggingface/hub/datasets--lovesnowbest--T-Eval/snapshots/af355ab2b62cdbae4262ac41c7529ffeae395012/data/instruct_v2.json' with error <class 'pyarrow.lib.ArrowInvalid'>: ('cannot mix list and non-list, non-null values', 'Conversion failed for column query_60_0_3 with type object')

#61 OfficerChul opened 4 weeks ago
0
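Issue #61 reports that loading `instruct_v2.json` through Hugging Face `datasets` fails during the pandas-to-Arrow conversion because one column mixes list and non-list values. A possible workaround, not an official fix, is to read the file with Python's standard `json` module, which does not infer a columnar schema. This sketch assumes the file is a single JSON object keyed by query id (the column name `query_60_0_3` in the error suggests this); `load_json_records` and `mixes_list_and_scalar` are hypothetical helpers, not part of T-Eval.

```python
import json
from typing import Any, Dict, Iterable


def load_json_records(path: str) -> Dict[str, Any]:
    """Read a data file with the standard library (hypothetical helper).

    Assumes the top level is a JSON object keyed by query id. json.load does
    not build a typed table, so a field that is a list in one place and a
    string in another does not raise an error here.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError("expected a top-level JSON object keyed by query id")
    return data


def mixes_list_and_scalar(values: Iterable[Any]) -> bool:
    """Return True if values contain both lists and non-list, non-null items.

    This mirrors the condition named in the ArrowInvalid message
    ('cannot mix list and non-list, non-null values').
    """
    kinds = {isinstance(v, list) for v in values if v is not None}
    return len(kinds) == 2
```

Each top-level value can then be handled as an ordinary Python dict, e.g. `for query_id, record in load_json_records(path).items(): ...`, without going through Arrow at all.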
Is there a script that aggregates the metrics in the result files into an overall score?

#60 Cppowboy opened 3 months ago
0
The data link does not open

#59 cobraheleah opened 4 months ago
1
How are the scores calculated?

#58 hmppt opened 4 months ago
0
Open-source model not found: Nanbeige-Agent-32B

#57 tttonytan opened 5 months ago
0
How should the overall mean of the results be computed?

#56 li-aolong closed 6 months ago
1
Problem with the llama2 template

#55 li-aolong opened 6 months ago
0
Slow inference speed

#54 lihaibineric opened 6 months ago
0
The Review evaluation metric is distorted; Qwen is severely underestimated

#53 fengzhu1 opened 6 months ago
1
Incomplete test results

#52 Mrak6192 opened 6 months ago
2
Is qwen1.5 supported? Reproduced results differ substantially

#51 Little-girl-1992 opened 7 months ago
0
Question about the number of cases in the dataset

#50 AmberXu98 opened 7 months ago
2
Could you provide test data fully aligned with the OpenAI input format?

#49 Watebear opened 7 months ago
2
Cannot evaluate when batch_size>1 is set

#48 dkqkxx opened 7 months ago
2
support vllm generation

#47 eitanturok opened 8 months ago
0
fix pip command

#46 eitanturok closed 8 months ago
0
Questions about T-Eval

#45 Cppowboy closed 8 months ago
0
Llama2 7b chat model: input length exceeds 4096

#44 Watebear opened 8 months ago
1
When will internlm2 be supported?

#43 seanxuu closed 6 months ago
1
How to use multiple GPUs for testing?

#42 seanxuu opened 8 months ago
0
Update README.md

#41 seanxuu opened 8 months ago
0
Error when testing qwen14B with python test.py

#40 chococatsrin opened 8 months ago
0
qwen1.5 tokenizer error

#39 chococatsrin opened 8 months ago
0
Question about qwen-14b evaluation results

#38 Fenglly opened 8 months ago
0
Evaluate Claude 3

#37 stalkermustang opened 9 months ago
0
Question about the answers in the plan_json_v1_zh.json data file

#36 13416157913 opened 9 months ago
0
Question about the answers in the plan_json_v1_zh.json data file

#35 13416157913 opened 9 months ago
0
API model

#34 Fenglly opened 9 months ago
3
BUG: stop_words

#33 ZHUANGMINGXI opened 9 months ago
5
BUG: stop_words

#32 ZHUANGMINGXI closed 9 months ago
0
API model ERROR

#31 HC-Guo opened 9 months ago
3
[BUG] RuntimeError: The size of tensor a (8192) must match the size of tensor b (8193) at non-singleton dimension 3

#30 Ayooooo opened 9 months ago
3
Hi all, I have a question about the T-Eval evaluation dataset and would appreciate any help. Thanks.

#29 13416157913 closed 9 months ago
2
Code bug

#28 xjwhy opened 9 months ago
1
Question about the Tool Set

#27 yitianlian opened 9 months ago
5
Question about open-sourcing the data

#26 pengming617 opened 9 months ago
2
BUG

#25 nyBball opened 9 months ago
13
cannot import name 'HFTransformerChat' from 'lagent.llms.huggingface'

#24 xjwhy closed 9 months ago
2
Paper results cannot be reproduced

#23 nyBball closed 9 months ago
1
add TODO for opencompass support

#22 zehuichen123 closed 10 months ago
0
Questions about evaluation speed and results

#21 klykq111 opened 10 months ago
4
Add T-Eval to the open-compass framework

#20 merlinarer opened 10 months ago
1
update TODO for T-eval

#19 zehuichen123 closed 10 months ago
0
Hello, roughly how long does one evaluation run on the Chinese dataset take?

#18 13416157913 closed 9 months ago
1
What is the difference between plan and instruct?

#17 milk-bottle-liyu closed 9 months ago
4
Does the benchmark include a test of LLM translation ability? If so, which one?

#16 White-Friday closed 9 months ago
1
update web image

#15 StigLidu closed 10 months ago
0
Message format problem when testing QWen

#14 gewenbin0992 opened 10 months ago
1