lm-sys / arena-hard-auto

Arena-Hard-Auto: An automatic LLM benchmark.
Apache License 2.0
316 stars 29 forks

Evaluate local models #7

Closed: xiamengzhou closed this issue 2 months ago

xiamengzhou commented 2 months ago

Hi! Thanks for releasing this awesome benchmark :)

I am interested in evaluating this benchmark with local models that I have trained, or with models available on Hugging Face. From what I understand, it appears that I would need to build the generation pipeline myself, possibly using tools like vLLM or a similar serving framework. Am I missing anything here?

CodingWithTim commented 2 months ago

Hey there! You are correct. We currently only support generating answers to the prompts through API endpoints. However, you can serve a local or Hugging Face model behind an OpenAI-compatible endpoint using vLLM and point the benchmark at it. This process should be fairly simple.
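A minimal sketch of what that could look like, assuming vLLM's OpenAI-compatible server and a placeholder model name; the config file name and fields below are an approximation of this repo's API config format, not an exact copy, so check the README for the authoritative schema:

```bash
# Serve a local / Hugging Face model behind an OpenAI-compatible endpoint.
# Model name and port are placeholders; adjust to your setup.
pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-chat-hf \
    --port 8000

# Then register the local endpoint in the benchmark's API config.
# The file path and fields below are illustrative, not confirmed by this thread.
cat >> config/api_config.yaml <<'EOF'
llama-2-7b-chat-hf:
    model_name: meta-llama/Llama-2-7b-chat-hf
    endpoints:
        - api_base: http://localhost:8000/v1
          api_key: empty
    api_type: openai
    parallel: 8
EOF
```

After that, answer generation should work the same way as with hosted API models, since the benchmark only sees an OpenAI-style endpoint.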

infwinston commented 2 months ago

@xiamengzhou would adding a vLLM example to the README help?