SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
Apache License 2.0
peer access is not supported between these two devices #552
After upgrading from sglang 0.1.16 to 0.1.17, I get the following error when loading a model with tp=2 on a 2xT4 machine (Kaggle). The same code worked on 0.1.16.
Error:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Failed: Cuda error /home/runner/work/vllm/vllm/csrc/custom_all_reduce.cuh:307 'peer access is not supported between these two devices'
Failed: Cuda error /home/runner/work/vllm/vllm/csrc/custom_all_reduce.cuh:307 'peer access is not supported between these two devices'
[rank1]:[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
[rank0]:[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
[...]
This ran fine on 0.1.16 on the same machine. The model loaded is deepseek-7b, so it belongs to the LlamaForCausalLM family. Let me know if you want me to test with other models.
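In case it helps with triage, here is a small diagnostic sketch (not part of the failing code) that lists which GPU pairs lack peer access on a machine. The helper name `pairs_without_peer_access` is made up for illustration; the only PyTorch calls it relies on are `torch.cuda.is_available`, `torch.cuda.device_count`, and `torch.cuda.can_device_access_peer`. It assumes PyTorch is installed and does nothing useful otherwise.

```python
from itertools import permutations
from typing import Callable, List, Tuple


def pairs_without_peer_access(
    device_count: int,
    can_access: Callable[[int, int], bool],
) -> List[Tuple[int, int]]:
    """Return ordered (src, dst) device pairs for which peer access is unavailable.

    `can_access` is any callable answering "can device src access device dst?".
    It is passed in so the logic can be exercised without a GPU.
    """
    return [
        (src, dst)
        for src, dst in permutations(range(device_count), 2)
        if not can_access(src, dst)
    ]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None  # PyTorch not installed; skip the real check

    if torch is not None and torch.cuda.is_available():
        blocked = pairs_without_peer_access(
            torch.cuda.device_count(),
            torch.cuda.can_device_access_peer,
        )
        print("pairs without peer access:", blocked)
    else:
        print("PyTorch with CUDA not available; nothing to check")
```

If this reports both orderings of devices 0 and 1 on the 2xT4 machine, that would be consistent with the error message above, though it would not by itself explain why 0.1.16 worked.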