Open pcmoritz opened 8 months ago

Since https://github.com/vllm-project/vllm/pull/3065, the eval suite https://github.com/EleutherAI/lm-evaluation-harness is broken.

Repro (this should be run on 2 A100s or H100s to make sure the Mixtral model fits into GPU memory):

This fails with

The API breakage is fixed in https://github.com/EleutherAI/lm-evaluation-harness/pull/1549, but after the fix it is extremely slow (about 40x slower than before), so it is not really feasible to run:

Being able to run the evaluation harness in a timely manner is crucial so we can ensure model performance doesn't degrade.

I think this is because without specifying a batch size the harness defaults to bs 1. It should be fixed if you use --batch_size auto, so we can take advantage of vLLM's continuous batching.

@pcmoritz did you solve this? I'm facing a similar issue.
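For reference, the `--batch_size auto` workaround suggested in this thread would look roughly like the invocation below. This is only a sketch: the task selection and tensor-parallel setting are assumptions for a 2-GPU Mixtral run, not taken from the original repro.

```shell
# Hypothetical lm-evaluation-harness invocation using the vLLM backend.
# --batch_size auto avoids the bs=1 default so vLLM's continuous
# batching can be used; tensor_parallel_size=2 shards Mixtral across
# the two A100s/H100s mentioned above.
lm_eval --model vllm \
    --model_args pretrained=mistralai/Mixtral-8x7B-Instruct-v0.1,tensor_parallel_size=2 \
    --tasks gsm8k \
    --batch_size auto
```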