triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

Unable to launch Triton Server Across Multi Nodes #283

Open jianqiylz opened 6 months ago

jianqiylz commented 6 months ago

Hello, when trying to run tritonserver on a setup with 4 nodes, I hit a failure that suggests a mismatch between the number of GPUs per node and the tensor parallel (TP) * pipeline parallel (PP) sizes. The error message is as follows:

+------------------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model            | Version | Status                                                                                                                                                                             |
+------------------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| postprocessing   | 1       | READY                                                                                                                                                                              |
| preprocessing    | 1       | READY                                                                                                                                                                              |
| tensorrt_llm     | 1       | UNAVAILABLE: Internal: unexpected error when creating modelInstanceState: [TensorRT-LLM][ERROR] Assertion failed: Number of GPUs per node 1 must be at least as large as TP (4) *  |
|                  |         | PP (1) (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/worldConfig.cpp:74)                                                                                                             |
|                  |         | 1       0x7f5c8b4936fd /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x166fd) [0x7f5c8b4936fd]                                                                  |
|                  |         | 2       0x7f5c8b4aaa82 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x2da82) [0x7f5c8b4aaa82]                                                                  |
|                  |         | 3       0x7f5c8b5ad528 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x130528) [0x7f5c8b5ad528]                                                                 |
|                  |         | 4       0x7f5c8b4eeac8 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x71ac8) [0x7f5c8b4eeac8]                                                                  |
|                  |         | 5       0x7f5c8b4e4e08 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x67e08) [0x7f5c8b4e4e08]                                                                  |
|                  |         | 6       0x7f5c8b4c614e /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x4914e) [0x7f5c8b4c614e]                                                                  |
|                  |         | 7       0x7f5c8b4c7242 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x4a242) [0x7f5c8b4c7242]                                                                  |
|                  |         | 8       0x7f5c8b4b6dd5 TRITONBACKEND_ModelInstanceInitialize + 101                                                                                                                 |
|                  |         | 9       0x7f5d06d9aa86 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a4a86) [0x7f5d06d9aa86]                                                                                 |
|                  |         | 10      0x7f5d06d9bcc6 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a5cc6) [0x7f5d06d9bcc6]                                                                                 |
|                  |         | 11      0x7f5d06d7ec15 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x188c15) [0x7f5d06d7ec15]                                                                                 |
|                  |         | 12      0x7f5d06d7f256 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x189256) [0x7f5d06d7f256]                                                                                 |
|                  |         | 13      0x7f5d06d8b27d /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19527d) [0x7f5d06d8b27d]                                                                                 |
|                  |         | 14      0x7f5d063f9ee8 /lib/x86_64-linux-gnu/libc.so.6(+0x99ee8) [0x7f5d063f9ee8]                                                                                                  |
|                  |         | 15      0x7f5d06d7597b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x17f97b) [0x7f5d06d7597b]                                                                                 |
|                  |         | 16      0x7f5d06d85695 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x18f695) [0x7f5d06d85695]                                                                                 |
|                  |         | 17      0x7f5d06d8a50b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19450b) [0x7f5d06d8a50b]                                                                                 |
|                  |         | 18      0x7f5d06e73610 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x27d610) [0x7f5d06e73610]                                                                                 |
|                  |         | 19      0x7f5d06e76d03 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x280d03) [0x7f5d06e76d03]                                                                                 |
|                  |         | 20      0x7f5d06fc38b2 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x3cd8b2) [0x7f5d06fc38b2]                                                                                 |
|                  |         | 21      0x7f5d06664253 /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f5d06664253]                                                                                             |
|                  |         | 22      0x7f5d063f4ac3 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f5d063f4ac3]                                                                                                  |
|                  |         | 23      0x7f5d06486a40 /lib/x86_64-linux-gnu/libc.so.6(+0x126a40) [0x7f5d06486a40]                                                                                                 |
| tensorrt_llm_bls | 1       | READY                                                                                                                                                                              |
+------------------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
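
For context, the assertion from worldConfig.cpp quoted above reduces to a simple invariant. Below is a minimal Python sketch of that check as I read the error text; it is an illustration only, not the actual TensorRT-LLM source:

    # Sketch of the invariant asserted by the backend (per the error message):
    # a single node must expose at least tp_size * pp_size GPUs.
    def check_world_config(gpus_per_node: int, tp_size: int, pp_size: int) -> None:
        required = tp_size * pp_size
        if gpus_per_node < required:
            raise AssertionError(
                f"Number of GPUs per node {gpus_per_node} must be at least as "
                f"large as TP ({tp_size}) * PP ({pp_size})"
            )

    # The failing configuration from this issue: 4 nodes with 1 GPU each, TP=4, PP=1.
    try:
        check_world_config(gpus_per_node=1, tp_size=4, pp_size=1)
    except AssertionError as err:
        print(err)  # matches the message seen in the Triton log above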

Environment Information

Steps to Reproduce

  1. Built the Docker image using the following commands:

    cd tensorrtllm_backend
    git lfs install
    git submodule update --init --recursive
    DOCKER_BUILDKIT=1 docker build -t triton_trt_llm -f dockerfile/Dockerfile.trt_llm_backend .
  2. Compiled the TensorRT engine within a container launched from the above image:

    cd tensorrt_llm
    python build.py --model_dir ./Yi-34B-Chat/ --dtype bfloat16 \
        --remove_input_padding --use_gpt_attention_plugin bfloat16 --use_gemm_plugin bfloat16 \
        --output_dir ./Yi-34B-Chat-out/bf16/4-gpu/  --rotary_base 1000000 --vocab_size 64000 --world_size 4 --tp_size 4 --gpus_per_node=1 \
        --enable_context_fmha --max_input_len 15360 --max_output_len 2048 --max_batch_size 4
  3. Before running the tests, I applied a workaround in the mapping.py file to align with my hardware (a non-invasive alternative is sketched after these steps):

    vim /usr/local/lib/python3.10/dist-packages/tensorrt_llm/mapping.py
    # Changed line 38 from "gpus_per_node=8" to "gpus_per_node=1"
  4. Tested multi-node execution with run.py. The following command worked fine and inference results were returned successfully:

    mpirun -np 4 --allow-run-as-root \
        --hostfile /tensorrtllm_backend/hostfile \
        -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
        -mca pml ucx  \
        python ../run.py --log_level="info" --max_output_len=40 --max_input_length=256 --tokenizer_dir ./Yi-34B-Chat/ \
        --engine_dir ./Yi-34B-Chat-out/bf16/4-gpu/ --input_text "what is llama?"
  5. Encountered the issue when starting the Triton server with the following command:

    mpirun -np 4 --allow-run-as-root \
        --hostfile /tensorrtllm_backend/hostfile \
        -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
        -mca pml ucx  \
        /opt/tritonserver/bin/tritonserver --model-repo=/tensorrtllm_backend/triton_model_repo --disable-auto-complete-config
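
As a side note on the workaround in step 3 (referenced above): rather than patching mapping.py inside site-packages, the gpus_per_node default can in principle be overridden wherever your own scripts construct a Mapping. The sketch below is hypothetical; the module path and constructor signature may differ across TensorRT-LLM versions, so verify against your installed copy of mapping.py:

    # Hypothetical sketch: pass gpus_per_node explicitly instead of editing the
    # installed mapping.py. Keyword names mirror the defaults visible around
    # line 38 of the file referenced in step 3.
    from tensorrt_llm.mapping import Mapping

    mapping = Mapping(world_size=4, rank=0, tp_size=4, pp_size=1, gpus_per_node=1)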

From what I could observe, it seems that tritonserver may not yet support running across multiple nodes. If my understanding is correct, could you please consider adding support for this functionality, or provide guidance on how I can work around this limitation?

I sincerely appreciate any assistance you can offer on this matter.

jianqiylz commented 6 months ago

@juney-nvidia Hello, could you please take a look at this issue? If you need any further information, please let me know.

jianqiylz commented 6 months ago

@byshiue @juney-nvidia pls

jfpichlme commented 2 weeks ago

Any updates?

datdo-msft commented 2 weeks ago

Hi @byshiue @juney-nvidia , did you get the chance to look into this?

@jianqiylz , wanted to check in, were you able to resolve this?