ChengShuting opened 1 week ago
Is it necessary to launch these two models together? Could you use
tritonserver --model-control-mode=explicit --model-repository=/data/multi_model_repo/ --load-model=Qwen1.5-7B-Chat
tritonserver --model-control-mode=explicit --model-repository=/data/multi_model_repo/ --load-model=Llama3-8B-Chinese-Chat
When you launch with mpirun, it assumes that the two ranks serve the same model and need to communicate with each other (e.g. for tensor or pipeline parallelism).
When I executed these two commands, the second one failed with an error saying the port is already in use. Can two models use the same port?
No. They need to use different ports.
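To make this concrete, here is a minimal sketch of launching the two servers with non-overlapping ports. The port numbers below are arbitrary examples, not values from this thread; `--http-port`, `--grpc-port`, and `--metrics-port` are standard tritonserver options, and each instance needs its own set of all three:

```shell
# First instance: Qwen1.5-7B-Chat on ports 8000-8002 (example ports)
tritonserver --model-control-mode=explicit \
  --model-repository=/data/multi_model_repo/ \
  --load-model=Qwen1.5-7B-Chat \
  --http-port 8000 --grpc-port 8001 --metrics-port 8002 &

# Second instance: Llama3-8B-Chinese-Chat on ports 9000-9002 (example ports)
tritonserver --model-control-mode=explicit \
  --model-repository=/data/multi_model_repo/ \
  --load-model=Llama3-8B-Chinese-Chat \
  --http-port 9000 --grpc-port 9001 --metrics-port 9002 &
```

Clients then target the port of the instance that hosts the model they want. If both instances run on the same GPUs, you may also want to pin each one to specific devices (e.g. via `CUDA_VISIBLE_DEVICES`) so they do not contend for memory.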
System Info
env: NVIDIA-SMI 550.54.15, Driver Version 550.54.15, CUDA Version 12.4; GPUs: 8x V100 16GB; Docker image: nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
command: mpirun -n 2 --allow-run-as-root tritonserver --model-control-mode=explicit --model-repository=/data/multi_model_repo/ --load-model=Qwen1.5-7B-Chat --load-model=Llama3-8B-Chinese-Chat
Expected behavior
Actual behavior
Both models load successfully, but when I call the Qwen1.5-7B-Chat model through the OpenAI-compatible interface, an error occurs.
Additional notes
No