I'm running vLLM in Docker, using the nvidia/cuda 12.1 container as the base image. This worked perfectly for vLLM until the switch to using cupy. The cupy import now breaks vLLM whenever tensor-parallel > 1. I've double-checked, and the CUDA version (12.1) and the cupy package (cupy-cuda12x) should be compatible. Any advice or guidance on this issue?
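
In case it helps with debugging, here is a minimal sketch for reporting which package versions are installed inside the container. The distribution names in the list (`vllm`, `cupy-cuda12x`, `torch`) are my assumptions about what is relevant, and `installed_versions` is just a hypothetical helper name. It only reads package metadata, so it should not trigger the cupy import itself.

```python
from importlib import metadata


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}.

    Reads package metadata only; none of the packages are imported.
    """
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed list of relevant distributions; adjust to match the image.
    for name, version in installed_versions(["vllm", "cupy-cuda12x", "torch"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in the same container that fails would at least confirm which cupy wheel is actually installed next to vLLM.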