sgl-project / sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
Apache License 2.0

peer access is not supported between these two devices #552


gmonair commented 2 weeks ago

After upgrading from sglang 0.1.16 to 0.1.17, I get the following error when loading a model with tp=2 on a 2xT4 machine (Kaggle). The same code worked on 0.1.16.

Error:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

Failed: Cuda error /home/runner/work/vllm/vllm/csrc/custom_all_reduce.cuh:307 'peer access is not supported between these two devices'
Failed: Cuda error /home/runner/work/vllm/vllm/csrc/custom_all_reduce.cuh:307 'peer access is not supported between these two devices'

[rank1]:[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
[rank0]:[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

[...]

Code:

import sglang as sgl

runtime = sgl.Runtime(model_path=model_name, tp_size=2)

This used to run fine on 0.1.16 on the same machine. The model being loaded is deepseek-7b, i.e. the LlamaForCausalLM family. Let me know if you want me to test other models.
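For anyone hitting this: the error comes from vLLM's custom all-reduce kernel, which requires CUDA peer-to-peer (P2P) access between the GPUs, and Kaggle's 2xT4 setups typically don't support P2P. A minimal sketch to check your own machine, assuming PyTorch is installed (returns None when CUDA or the devices aren't available):

```python
# Sketch: check whether two GPUs support peer-to-peer (P2P) access.
# Returns True/False from the CUDA runtime, or None if it can't be checked.
def peer_access_supported(dev_a: int, dev_b: int):
    try:
        import torch
    except ImportError:
        return None  # PyTorch not installed
    if not torch.cuda.is_available() or torch.cuda.device_count() <= max(dev_a, dev_b):
        return None  # no CUDA runtime, or not enough GPUs
    # torch.cuda.can_device_access_peer wraps cudaDeviceCanAccessPeer
    return torch.cuda.can_device_access_peer(dev_a, dev_b)

print(peer_access_supported(0, 1))
```

If this prints False, the custom all-reduce path in 0.1.17 cannot work on that machine and needs to be disabled.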

merrymercy commented 1 week ago

See this PR for a temporary fix; it lets you disable custom all-reduce for your setup: https://github.com/sgl-project/sglang/pull/531. If you get it fully fixed, please contribute a PR.
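For reference, a hedged sketch of what the workaround looks like. The keyword name `disable_custom_all_reduce` is an assumption borrowed from vLLM's engine arguments, not confirmed for this sglang version; check the linked PR for the actual interface it adds.

```python
# Hypothetical sketch: turning off the custom all-reduce kernel when P2P
# access is unavailable. The flag name `disable_custom_all_reduce` is an
# assumption (it mirrors vLLM's engine argument); see PR #531 for the
# actual fix.
model_name = "your-model-path"  # placeholder, e.g. the deepseek-7b checkpoint

runtime_kwargs = dict(
    model_path=model_name,
    tp_size=2,
    disable_custom_all_reduce=True,  # fall back to the default all-reduce
)
# runtime = sgl.Runtime(**runtime_kwargs)  # uncomment with sglang installed
```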