triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend

Is multi-node supported in Triton Inference Server? #75

Open amazingkmy opened 11 months ago

amazingkmy commented 11 months ago

Is multi-node supported in Triton Inference Server?

I built LLaMA-7B for tensorrtllm_backend and ran Triton Inference Server. I have 4 GPUs, but Triton Inference Server only uses 1 GPU.

Image: nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3

Build (Llama 2):

python build.py --model_dir ${model_directory} \
                --dtype float16 \
                --use_gpt_attention_plugin bfloat16 \
                --use_inflight_batching \
                --paged_kv_cache \
                --remove_input_padding \
                --use_gemm_plugin float16 \
                --output_dir engines/fp16/1-gpu/
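
For reference, the command above builds a single-GPU engine (no tensor parallelism), which is consistent with only one GPU being used at serving time. A minimal sketch of a 4-GPU tensor-parallel build, assuming the examples/llama build.py in this TensorRT-LLM release exposes the --world_size and --tp_size flags (the 4-gpu output directory name is only illustrative):

python build.py --model_dir ${model_directory} \
                --dtype float16 \
                --use_gpt_attention_plugin bfloat16 \
                --use_inflight_batching \
                --paged_kv_cache \
                --remove_input_padding \
                --use_gemm_plugin float16 \
                --world_size 4 \
                --tp_size 4 \
                --output_dir engines/fp16/4-gpu/

The engine path configured in the Triton model repository (the tensorrt_llm model's engine directory) would then need to point at these 4-GPU engines.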

Run:

tritonserver --model-repo=/tensorrtllm_backend/triton_model_repo --disable-auto-complete-config
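For comparison, a sketch of how a multi-GPU engine is typically launched with this backend, assuming the repository's scripts/launch_triton_server.py helper (which wraps tritonserver in mpirun with one rank per GPU); --world_size should match the value used when building the engine:

python3 scripts/launch_triton_server.py --world_size 4 \
        --model_repo /tensorrtllm_backend/triton_model_repo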
amazingkmy commented 11 months ago

I think the MPI rank is not working properly; it seems to be stuck at 0.


I got the same result after replacing the command with the following:

mpirun -n 2 --allow-run-as-root tritonserver --model-repo=/tensorrtllm_backend/triton_model_repo --disable-auto-complete-config
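
One quick way to check whether mpirun itself is assigning ranks correctly, independent of Triton, is to print the rank from each process; this assumes mpi4py is available in the container:

mpirun -n 2 --allow-run-as-root python3 -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())"

If MPI is working, this should print 0 and 1 (one line per rank).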
byshiue commented 11 months ago

Do you want to ask about multi-node or multi-GPU? From your description, you are testing multi-GPU, so I am a little confused.

Also, can you share the error log of your second test with the following command?

mpirun -n 2 --allow-run-as-root tritonserver --model-repo=/tensorrtllm_backend/triton_model_repo --disable-auto-complete-config
amazingkmy commented 11 months ago

@byshiue I want to ask about multi-GPU.

byshiue commented 11 months ago

Can you share the error log of your second test with the following command?

mpirun -n 2 --allow-run-as-root tritonserver --model-repo=/tensorrtllm_backend/triton_model_repo --disable-auto-complete-config