mazzzystar / TurtleBench

TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles.
https://arxiv.org/abs/2410.05262
Apache License 2.0

Make it work and format the code. #3

Closed by iamsk 3 months ago

iamsk commented 3 months ago

- add requirements
- clean the imports
- bugfix: log_folder
- add model filter
- sample test cases

mazzzystar commented 3 months ago

Merged. Thanks for your great work!