ggbetz closed this 3 months ago
see also https://github.com/vllm-project/vllm/issues/787
There's an env variable to fix that, so we have to add, e.g., `VLLM_SWAP_SPACE=6` to the configuration (e.g., `config.env`).
The problem seems to arise when traces are generated with beam search. For clarification: https://github.com/vllm-project/vllm/issues/2853
This has been resolved.
Pipeline ran successfully for the `NousResearch/Nous-Hermes-llama-2-7b` model with the latest docker container.
Increasing the VLLM_SWAP_SPACE, i.e., the CPU memory vllm may use for offloading during beam search, resolves this issue.
Details:
`ntasks` parameter. Before, I got an OOM (not CUDA OOM) error.
`VLLM_SWAP_SPACE=32` when evaluating `microsoft/orca-7b`
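For anyone wiring this up themselves, here is a minimal sketch of how a `VLLM_SWAP_SPACE` value could be read from the environment and turned into vLLM's `swap_space` engine argument (GiB of CPU memory per GPU). This is not the pipeline's actual code. The helper names `swap_space_gib` and `engine_kwargs` are hypothetical, and the fallback of 4 GiB when the variable is unset is an assumption.

```python
import os
from typing import Mapping, Optional

# Assumption: fallback used when VLLM_SWAP_SPACE is unset or empty.
DEFAULT_SWAP_SPACE_GIB = 4


def swap_space_gib(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the swap space in GiB from VLLM_SWAP_SPACE (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("VLLM_SWAP_SPACE", "").strip()
    if not raw:
        return DEFAULT_SWAP_SPACE_GIB
    value = int(raw)  # raises ValueError on non-integer input
    if value < 0:
        raise ValueError("VLLM_SWAP_SPACE must be a non-negative number of GiB")
    return value


def engine_kwargs(model: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for a vLLM engine (hypothetical helper)."""
    # `swap_space` is the CPU memory (GiB per GPU) vLLM may use for swapping,
    # which is what beam search relies on in the reports above.
    return {"model": model, "swap_space": swap_space_gib(env)}
```

With vLLM installed, the result could be passed on as `LLM(**engine_kwargs("microsoft/orca-7b"))`. Whether the docker container forwards the variable this way is not confirmed here.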