lmarena / arena-hard-auto

Arena-Hard-Auto: An automatic LLM benchmark.
Apache License 2.0
606 stars 71 forks source link

Add litellm, unified dataclass description, and compatibility with vision-language models #44

Closed BabyChouSr closed 2 days ago

BabyChouSr commented 1 month ago

Three major changes:

  1. Add the usage of litellm as an api-type which reduces the burden of having to add API provider routers ourselves.
  2. Add dataclasses instead of using plain json to read in the yaml configs.
  3. Add compatibility to read in images for vision-language models.
CodingWithTim commented 1 month ago

@infwinston Let's talk about adding support for vision? This would be a major change for Arena Hard if we add support for this. Most of the people currently aren't using Arena-Hard for vision. I suggest using a separate python script file to do vision evaluation, and create a new folder for all the vision related works.

BabyChouSr commented 2 days ago

I will close this for now. Will re-open when this is ready to go.