lm-sys / arena-hard-auto

Arena-Hard-Auto: An automatic LLM benchmark.
Apache License 2.0

Can you add deepseek-coder-v2? #29

Closed. Kreijstal closed this issue 2 weeks ago

Kreijstal commented 2 weeks ago

AFAIK it is the best open-source model, no? Also, I would like to see Claude 3.5, GPT-4o, and Qwen2.

CodingWithTim commented 2 weeks ago

We will add these models and release an official leaderboard very soon. In the meantime, you can look up their Arena-Hard scores in their blog posts. DeepSeek-Coder-V2 and Qwen2-72B-Instruct both reported their Arena-Hard scores in their release notes.