open-compass / T-Eval

[ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step
https://open-compass.github.io/T-Eval/
Apache License 2.0

How should the overall average of the results be computed? #56

Closed li-aolong closed 6 months ago

li-aolong commented 6 months ago

Is it enough to just add up all the scores and take the average?

li-aolong commented 6 months ago

Never mind, I hadn't noticed that there is also an evaluation script.