mazzzystar/TurtleBenchmark
Benchmark for LLM Reasoning & Understanding with Challenging Tasks from Real Users.
https://mazzzystar.github.io/2024/08/09/turtle-benchmark-zh/
101 stars, 7 forks
issues
Using LLMs as players to test which model asks the right questions.
#5
iamsk
opened
3 weeks ago
4
CoT prompts
#4
ax7e
opened
3 weeks ago
2
Make it work and format the code.
#3
iamsk
closed
3 weeks ago
1
another implementation method
#2
waleyGithub
opened
1 month ago
1
Chinese reproduction failed, off by more than 5 points
#1
siftxxx
closed
1 month ago
4