lm-sys / arena-hard-auto

Arena-Hard-Auto: An automatic LLM benchmark.
Apache License 2.0

Multi-threaded generation support? #17

Closed Ignoramus0817 closed 2 months ago

Ignoramus0817 commented 2 months ago

I found judgment generation to be very time-consuming: evaluating a single model takes more than 1.5 hours with gpt-4-1106-preview and a parallel count of 8. Is this expected behavior?

If yes, does gen_judgement support multi-threaded generation across multiple API endpoints to balance the load? For example, with 5 endpoints I could generate 40 judgments at the same time, which should significantly accelerate evaluation.

If this is supported, please tell me how to do it. Simply adding multiple endpoints for the judge model in api_config.yaml does not seem to work.
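For reference, a multi-endpoint entry might look something like the sketch below. This is a hypothetical illustration only: the field names (`endpoints`, `api_base`, `api_key`, `api_type`, `parallel`) and the host URLs are assumptions, so check the repo's sample api_config.yaml for the actual schema.

```yaml
# Hypothetical sketch of a judge-model entry with several endpoints.
# Field names and values are illustrative assumptions, not the verified schema.
gpt-4-1106-preview:
    model_name: gpt-4-1106-preview
    endpoints:
        - api_base: http://host1:8000/v1
          api_key: YOUR_KEY_1
        - api_base: http://host2:8000/v1
          api_key: YOUR_KEY_2
    api_type: openai
    parallel: 8
```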

Many thanks.

Ignoramus0817 commented 2 months ago

I found that gen_judgement randomly selects an endpoint from the list, so simply increasing the parallel count accelerates generation. I will close this issue.
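The behavior described above, random endpoint selection combined with a pool of parallel workers, can be sketched as follows. This is a minimal illustration, not the repo's actual implementation: the endpoint URLs and the `judge_one` function are hypothetical, and a real worker would send the judgment request to the chosen endpoint instead of returning a string.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical endpoint list; in arena-hard-auto these would come from
# the judge model's entry in api_config.yaml.
ENDPOINTS = [
    "http://host1:8000/v1",
    "http://host2:8000/v1",
]

def judge_one(question_id: int) -> str:
    # Pick an endpoint at random per request, as gen_judgement reportedly
    # does, so concurrent requests spread across all endpoints on average.
    endpoint = random.choice(ENDPOINTS)
    # A real implementation would call the judge API here; we only
    # record which endpoint this request would hit.
    return f"question {question_id} -> {endpoint}"

# With a parallel count of 8, up to 8 requests are in flight at once,
# each hitting a randomly chosen endpoint.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(judge_one, range(16)))
```

With this scheme, adding endpoints raises the aggregate capacity, but the client-side parallel count is what determines how many requests are actually in flight, which matches the observation that increasing it is what speeds things up.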