lmarena / arena-hard-auto

Arena-Hard-Auto: An automatic LLM benchmark.
Apache License 2.0
665 stars 76 forks source link

Add litellm, unified dataclass description, and compatibility with vision-language models #44

Closed BabyChouSr closed 1 month ago

BabyChouSr commented 2 months ago

Three major changes:

  1. Add the usage of litellm as an api-type which reduces the burden of having to add API provider routers ourselves.
  2. Add dataclasses instead of using plain json to read in the yaml configs.
  3. Add compatibility to read in images for vision-language models.
CodingWithTim commented 2 months ago

@infwinston Let's talk about adding support for vision? This would be a major change for Arena Hard if we add support for this. Most of the people currently aren't using Arena-Hard for vision. I suggest using a separate python script file to do vision evaluation, and create a new folder for all the vision related works.

BabyChouSr commented 1 month ago

I will close this for now. Will re-open when this is ready to go.