Open changshivek opened 3 months ago
GPU models and configuration: GPU 0: NVIDIA A800-SXM4-80GB GPU 1: NVIDIA A800-SXM4-80GB
You only have 2 GPUs, why use tensor parallel size = 4?
@youkaichao same question
If you use a tensor parallel size different from the number of GPUs you have, then this is indeed a known issue. https://github.com/vllm-project/vllm/pull/5473 should solve it.
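The mismatch described above can be caught before launch. A minimal sketch of such a guard (the helper name and error message are hypothetical, not vLLM's actual validation logic):

```python
def check_tensor_parallel_size(tp_size: int, gpu_count: int) -> None:
    """Reject a tensor parallel size the available GPUs cannot satisfy.

    Hypothetical helper for illustration; vLLM's real check lives in its
    distributed initialization code.
    """
    if tp_size > gpu_count:
        raise ValueError(
            f"tensor_parallel_size={tp_size} exceeds available GPUs ({gpu_count})"
        )


# With only 2 GPUs visible, tensor parallel size = 4 is rejected:
try:
    check_tensor_parallel_size(4, 2)
except ValueError as e:
    print(e)
```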
No, I actually run vLLM on Kubernetes. Every time I modify the tensor parallel size, I manually adjust the number of GPUs simultaneously. The environment description shows only 2 GPUs because I copied it from another issue I had raised previously, where I encountered a similar problem on the same computing cluster. Therefore, I reused the environment description.
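For reference, keeping the Kubernetes GPU request in lockstep with the tensor parallel size might look like the fragment below. This is an illustrative sketch only: the container name, model argument, and field placement are assumptions, not copied from the reporter's manifest; the key point is that `nvidia.com/gpu` must equal `--tensor-parallel-size`.

```yaml
# Illustrative pod spec fragment: the GPU limit must match --tensor-parallel-size
containers:
  - name: vllm
    image: vllm/vllm-openai:v0.5.0
    args:
      - --model=Qwen/Qwen2-72B-Instruct   # assumed model id
      - --tensor-parallel-size=4
    resources:
      limits:
        nvidia.com/gpu: "4"
```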
you can take a look at https://github.com/vllm-project/vllm/issues/6056
Your current environment
🐛 Describe the bug
I use vllm/vllm-openai:v0.5.0 on k8s to deploy Qwen2 72B Instruct with tensor parallel size = 4; the args look like:
then I got the following error:
This same config works normally with vllm/vllm-openai:v0.4.3. When I tried tensor parallel size = 8, I got a bunch of exceptions like #5439, and launching took so long that I did not wait to see whether it eventually started.