open-compass / T-Eval

[ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step
https://open-compass.github.io/T-Eval/
Apache License 2.0
235 stars, 15 forks

Hello, roughly how long does one evaluation round on the Chinese dataset take? #18

Closed: 13416157913 closed this issue 9 months ago

13416157913 commented 10 months ago

Hello, roughly how long does one evaluation round on the Chinese dataset take?

zehuichen123 commented 10 months ago

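We haven't measured this precisely. Running through OpenCompass with 20 to 30 GPUs doing inference in parallel, it takes about half an hour. We plan to release a smaller subset later to reduce the inference cost, haha.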
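
For anyone with fewer GPUs, the figure quoted above (roughly 20 to 30 GPUs for about half an hour) can be turned into a rough GPU-hour budget. The sketch below is only back-of-the-envelope arithmetic: the helper names are hypothetical, and it assumes inference time scales linearly with the number of GPUs, which the maintainers did not state and which ignores model size, batching, and scheduling overhead.

```python
# Rough estimate only. Assumes perfectly linear scaling across GPUs,
# which is an assumption of this sketch, not a claim from the thread.

def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total GPU-hours consumed by a run (hypothetical helper)."""
    return num_gpus * wall_clock_hours


def estimated_wall_clock_hours(total_gpu_hours: float, num_gpus: int) -> float:
    """Wall-clock time on `num_gpus` GPUs under the linear-scaling assumption."""
    return total_gpu_hours / num_gpus


# Figures from the reply: 20 to 30 GPUs, about half an hour.
low_budget = gpu_hours(20, 0.5)
high_budget = gpu_hours(30, 0.5)

if __name__ == "__main__":
    for n in (1, 4, 8):
        low = estimated_wall_clock_hours(low_budget, n)
        high = estimated_wall_clock_hours(high_budget, n)
        print(f"{n} GPU(s): roughly {low:.2f} to {high:.2f} hours")
```

Under these assumptions a full round costs on the order of 10 to 15 GPU-hours, so actual timings on a smaller setup should be treated as a loose lower bound.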