mazzzystar / TurtleBench

TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles.
https://arxiv.org/abs/2410.05262
Apache License 2.0
125 stars, 9 forks