triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

How to fix errors when loading Qwen1.5-7B (on two GPUs) and Llama3-8B (on two GPUs) simultaneously with tritonserver? #510

Open ChengShuting opened 1 week ago

ChengShuting commented 1 week ago

System Info

Driver: NVIDIA-SMI 550.54.15, Driver Version 550.54.15, CUDA Version 12.4
GPUs: 8x V100 16GB
Docker image: nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3

Who can help?

No response

Reproduction

command: mpirun -n 2 --allow-run-as-root tritonserver --model-control-mode=explicit --model-repository=/data/multi_model_repo/ --load-model=Qwen1.5-7B-Chat --load-model=Llama3-8B-Chinese-Chat

Expected behavior

[image: screenshot of the expected output]

actual behavior

Both models load successfully, but when I call the Qwen1.5-7B-Chat model through the OpenAI-compatible interface, an error occurs.

additional notes

None.

byshiue commented 5 days ago

Is it necessary to launch these two models together? Could you instead use:

tritonserver --model-control-mode=explicit --model-repository=/data/multi_model_repo/ --load-model=Qwen1.5-7B-Chat
tritonserver --model-control-mode=explicit --model-repository=/data/multi_model_repo/ --load-model=Llama3-8B-Chinese-Chat

When you launch with mpirun, it assumes that the two ranks serve the same model and need to communicate with each other (e.g., for tensor parallelism (TP) or pipeline parallelism (PP)).
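By contrast, a single multi-GPU engine (say, one built with TP=2) is normally launched through this repo's helper script, which generates the full mpirun command for all ranks. A minimal sketch, assuming the issue's repository path and that the engine was built with a world size of 2:

python3 scripts/launch_triton_server.py --world_size=2 --model_repo=/data/multi_model_repo/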

ChengShuting commented 5 days ago


When I executed these two commands, the second one failed with an error saying the port was already in use. Can two models use the same port?

byshiue commented 3 days ago

No. They need to use different ports.
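For example, each instance can be given its own endpoint ports via tritonserver's standard flags (a sketch; the port values are arbitrary, assuming the defaults 8000/8001/8002 are free for the first instance):

tritonserver --model-control-mode=explicit --model-repository=/data/multi_model_repo/ --load-model=Qwen1.5-7B-Chat --http-port=8000 --grpc-port=8001 --metrics-port=8002
tritonserver --model-control-mode=explicit --model-repository=/data/multi_model_repo/ --load-model=Llama3-8B-Chinese-Chat --http-port=9000 --grpc-port=9001 --metrics-port=9002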