lmarena / arena-hard-auto

Arena-Hard-Auto: An automatic LLM benchmark.
Apache License 2.0
606 stars, 71 forks

Add support for vision-language conversations #41

Closed by BabyChouSr 1 month ago

BabyChouSr commented 1 month ago

Add some Python dependencies needed to run BenchBuilder, and add the ability to run it on conversations that also contain images.

CodingWithTim commented 1 month ago

Oh shit it is the GOAT!

CodingWithTim commented 1 month ago

Could u add a new requirements file for the BenchBuilder code? I think we want to avoid adding requirements to the requirement.txt file that aren't necessary for running the benchmark. Most people using Arena-Hard atm just run the benchmark, not BenchBuilder. Thanks broski.