open-compass/T-Eval
[ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step
https://open-compass.github.io/T-Eval/
Apache License 2.0
235 stars
15 forks
issues
Cannot download dataset
#63
YEEthanCC
closed
1 week ago
0
Running the plan task hangs and no result file is output
#62
1207408359
opened
3 weeks ago
0
Failed to convert pandas DataFrame to Arrow Table from file '/home/####/.cache/huggingface/hub/datasets--lovesnowbest--T-Eval/snapshots/af355ab2b62cdbae4262ac41c7529ffeae395012/data/instruct_v2.json' with error <class 'pyarrow.lib.ArrowInvalid'>: ('cannot mix list and non-list, non-null values', 'Conversion failed for column query_60_0_3 with type object')
#61
OfficerChul
opened
4 weeks ago
0
Could you provide a script that aggregates the metrics in the result files into an overall result?
#60
Cppowboy
opened
3 months ago
0
Data link cannot be opened
#59
cobraheleah
opened
4 months ago
1
How is the score calculated?
#58
hmppt
opened
4 months ago
0
Open-source model not found: Nanbeige-Agent-32B
#57
tttonytan
opened
5 months ago
0
How should the overall average of the results be calculated?
#56
li-aolong
closed
6 months ago
1
Problem with the llama2 template
#55
li-aolong
opened
6 months ago
0
Slow inference speed
#54
lihaibineric
opened
6 months ago
0
The Review evaluation metric is distorted; Qwen is severely underestimated
#53
fengzhu1
opened
6 months ago
1
Test results are incomplete
#52
Mrak6192
opened
6 months ago
2
Is qwen1.5 supported? Reproduced results differ significantly
#51
Little-girl-1992
opened
7 months ago
0
Question about the number of cases in the dataset
#50
AmberXu98
opened
7 months ago
2
Could you provide test data that is fully aligned with the OpenAI input format?
#49
Watebear
opened
7 months ago
2
Cannot eval when batch_size>1 is set
#48
dkqkxx
opened
7 months ago
2
support vllm generation
#47
eitanturok
opened
8 months ago
0
fix pip command
#46
eitanturok
closed
8 months ago
0
Questions about T-Eval
#45
Cppowboy
closed
8 months ago
0
Llama2 7b chat model: input length exceeds 4096
#44
Watebear
opened
8 months ago
1
When do you plan to support internlm2?
#43
seanxuu
closed
6 months ago
1
How to use multi-gpu to test?
#42
seanxuu
opened
8 months ago
0
Update README.md
#41
seanxuu
opened
8 months ago
0
Error when running python test.py with qwen14B
#40
chococatsrin
opened
8 months ago
0
qwen1.5 tokenizer error
#39
chococatsrin
opened
8 months ago
0
Question about qwen-14b evaluation results
#38
Fenglly
opened
8 months ago
0
Evaluate Claude 3
#37
stalkermustang
opened
9 months ago
0
Question about the answers in the plan_json_v1_zh.json data file
#36
13416157913
opened
9 months ago
0
Question about the answers in the plan_json_v1_zh.json data file
#35
13416157913
opened
9 months ago
0
API model
#34
Fenglly
opened
9 months ago
3
BUG: stop_words
#33
ZHUANGMINGXI
opened
9 months ago
5
BUG: stop_words
#32
ZHUANGMINGXI
closed
9 months ago
0
API model ERROR
#31
HC-Guo
opened
9 months ago
3
[BUG] RuntimeError: The size of tensor a (8192) must match the size of tensor b (8193) at non-singleton dimension 3
#30
Ayooooo
opened
9 months ago
3
Hi everyone, I have a question about the T-Eval evaluation dataset and hope someone can help answer it. Thanks.
#29
13416157913
closed
9 months ago
2
Code bug
#28
xjwhy
opened
9 months ago
1
Question about the Tool Set
#27
yitianlian
opened
9 months ago
5
Question about open-sourcing the data
#26
pengming617
opened
9 months ago
2
BUG
#25
nyBball
opened
9 months ago
13
cannot import name 'HFTransformerChat' from 'lagent.llms.huggingface'
#24
xjwhy
closed
9 months ago
2
Cannot reproduce the paper's results
#23
nyBball
closed
9 months ago
1
add TODO for opencompass support
#22
zehuichen123
closed
10 months ago
0
Questions about evaluation speed and results
#21
klykq111
opened
10 months ago
4
Add T-Eval to the open-compass framework
#20
merlinarer
opened
10 months ago
1
update TODO for T-eval
#19
zehuichen123
closed
10 months ago
0
Hello, roughly how long does one round of testing on the Chinese dataset take?
#18
13416157913
closed
9 months ago
1
What is the difference between plan and instruct?
#17
milk-bottle-liyu
closed
9 months ago
4
Does the benchmark include anything that tests the translation ability of large language models? If so, which item?
#16
White-Friday
closed
9 months ago
1
update web image
#15
StigLidu
closed
10 months ago
0
Message format issue when testing QWen
#14
gewenbin0992
opened
10 months ago
1