lm-sys / arena-hard-auto

Arena-Hard-Auto: An automatic LLM benchmark.
Apache License 2.0

Multi-threaded generation support? #17

Closed Ignoramus0817 closed 2 months ago

Ignoramus0817 commented 2 months ago

I found judgment generation to be very time-consuming: evaluating a single model takes more than 1.5 hours with gpt-4-1106-preview and a parallel count of 8. Is this expected behavior?

If yes, does gen_judgement support multi-threaded generation across multiple API endpoints to balance the load? For example, with 5 endpoints I could generate 40 judgments at the same time, which should significantly accelerate evaluation.

If this is supported, please tell me how to do it. Simply adding multiple endpoints for the judge model in api_config.yaml does not seem to work.
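For reference, a multi-endpoint entry might look something like the sketch below. This is a hypothetical illustration only: the field names (`endpoints`, `api_base`, `api_key`, `api_type`, `parallel`) and the host URLs are assumptions, so check the repo's sample api_config.yaml for the actual schema.

```yaml
# Hypothetical sketch of a judge-model entry with several endpoints.
# Field names and values are illustrative assumptions, not the verified schema.
gpt-4-1106-preview:
    model_name: gpt-4-1106-preview
    endpoints:
        - api_base: http://host1:8000/v1
          api_key: YOUR_KEY_1
        - api_base: http://host2:8000/v1
          api_key: YOUR_KEY_2
    api_type: openai
    parallel: 8
```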

Many thanks.

Ignoramus0817 commented 2 months ago

I found that gen_judgement randomly selects an endpoint from the list, so simply increasing the parallel count accelerates generation. I will close this issue.
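The behavior described above, random endpoint selection combined with a pool of parallel workers, can be sketched as follows. This is a minimal illustration, not the repo's actual implementation: the endpoint URLs and the `judge_one` function are hypothetical, and a real worker would send the judgment request to the chosen endpoint instead of returning a string.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical endpoint list; in arena-hard-auto these would come from
# the judge model's entry in api_config.yaml.
ENDPOINTS = [
    "http://host1:8000/v1",
    "http://host2:8000/v1",
]

def judge_one(question_id: int) -> str:
    # Pick an endpoint at random per request, as gen_judgement reportedly
    # does, so concurrent requests spread across all endpoints on average.
    endpoint = random.choice(ENDPOINTS)
    # A real implementation would call the judge API here; we only
    # record which endpoint this request would hit.
    return f"question {question_id} -> {endpoint}"

# With a parallel count of 8, up to 8 requests are in flight at once,
# each hitting a randomly chosen endpoint.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(judge_one, range(16)))
```

With this scheme, adding endpoints raises the aggregate capacity, but the client-side parallel count is what determines how many requests are actually in flight, which matches the observation that increasing it is what speeds things up.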