I found judgement generation to be really time-consuming: evaluating a single model takes more than 1.5 hours using gpt-4-1106-preview with a parallel count of 8. Is this expected behavior?
If yes, does gen_judgement support multi-threaded generation across multiple API endpoints to balance the load? For example, with 5 endpoints I could generate 40 judgements at the same time, which should significantly accelerate the evaluation process.
If this is supported, please tell me how to configure it. Simply adding multiple endpoints for the judge model in api_config.yaml does not seem to work.
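For reference, a multi-endpoint judge entry might look roughly like the sketch below. This is an assumption about the api_config.yaml schema (the key names `endpoints`, `api_base`, `api_key`, and `parallel` are illustrative and the placeholder hosts are hypothetical), not a confirmed working configuration:

```yaml
# Hypothetical sketch of a judge-model entry with multiple endpoints.
gpt-4-1106-preview:
    model_name: gpt-4-1106-preview
    endpoints:
        - api_base: http://host1:8000/v1   # placeholder endpoint
          api_key: sk-xxx
        - api_base: http://host2:8000/v1   # placeholder endpoint
          api_key: sk-xxx
    parallel: 8
```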
I found that gen_judgement randomly selects an endpoint from the list, so simply increasing the parallel count accelerates generation. I will close this issue.
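The pattern described above can be sketched as follows. This is a minimal illustration of why random endpoint selection plus a higher parallel count balances load, not the actual gen_judgement internals; the endpoint URLs and the `judge_one` helper are hypothetical:

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical list of 5 API endpoints (placeholder hosts).
ENDPOINTS = [f"http://host{i}:8000/v1" for i in range(5)]

def judge_one(question_id):
    # Each judgement task picks an endpoint at random, so as the
    # parallel count grows, requests spread across all endpoints.
    endpoint = random.choice(ENDPOINTS)
    # A real implementation would send the judge request here;
    # we just return the routing decision for illustration.
    return question_id, endpoint

# Parallel count 40 (roughly 8 concurrent requests per endpoint
# across 5 endpoints) keeps every endpoint busy at once.
with ThreadPoolExecutor(max_workers=40) as pool:
    results = list(pool.map(judge_one, range(100)))
```

With this scheme no explicit round-robin is needed: uniform random choice over the endpoint list already distributes load evenly in expectation.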
Many thanks.