vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Distributed Inference error #1593

Closed. BlackHandsomeLee closed this issue 5 months ago.

BlackHandsomeLee commented 10 months ago

When I execute llm = LLM("/chinese-alpaca-2-13b", tensor_parallel_size=1), the code works fine, but when I change the argument to tensor_parallel_size=2, as in llm = LLM("/chinese-alpaca-2-13b", tensor_parallel_size=2), the following error occurs:

torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1275, internal error, NCCL version 2.15.5 ncclInternalError: Internal check failed. Last error: Duplicate GPU detected : rank 0 and rank 1 both on CUDA device 27000
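
My understanding is that "Duplicate GPU detected" means both tensor-parallel ranks were mapped to the same physical device. A quick check like the following (a minimal sketch, nothing vLLM-specific) shows which devices the process can actually see:

import os
import torch

# What the CUDA runtime was told it may use (None means "all GPUs"):
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))

# How many devices torch can see; tensor_parallel_size=2 needs at least 2:
print("torch.cuda.device_count() =", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))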

I tried two workarounds, but neither helped.

1. Restricting the visible device explicitly:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = str(device_id)

2. Binding each rank to its own device:

import torch
# rank is assumed to come from the distributed launcher
device_id = rank % torch.cuda.device_count()
torch.cuda.set_device(device_id)
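
For reference, the full pattern I expected to work is below; the "0,1" device pair is only an assumption about my machine's GPU indexing, and the environment variable has to be set before anything initializes CUDA:

import os

# Expose both GPUs before torch/vLLM initialize CUDA; "0,1" is an assumption
# about which two physical GPUs the tensor-parallel ranks should use.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

from vllm import LLM

llm = LLM("/chinese-alpaca-2-13b", tensor_parallel_size=2)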

How can I solve this problem?

shuoYan97 commented 4 months ago

Any suggestions for this problem with vllm==0.4.0.post1? Thanks.