triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

Example `gpu_device_ids` for multi-model usage? #448

Open ghost opened 4 months ago

ghost commented 4 months ago

System Info

P4D (A100 40 GB x 8)

Who can help?

@juney-nvidia @byshiue

Information

Tasks

Reproduction

I want to run a multi-model setup with two TP-4 instances (TP-4 x 2). How do I set `gpu_device_ids` to `[[0,1,2,3],[4,5,6,7]]`? Could you please provide an example usage of `gpu_device_ids`?
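For context, something like the following in the `tensorrt_llm` model's `config.pbtxt` is what I have in mind. The exact separator convention is an assumption on my part (comma-separated device IDs per instance, semicolon-separated between instances), so please correct it if wrong:

```
instance_group [
  {
    # Two instances of the model, one per GPU group (assumed)
    count: 2
    kind: KIND_CPU
  }
]
parameters: {
  key: "gpu_device_ids"
  value: {
    # GPUs 0-3 for instance 0, GPUs 4-7 for instance 1 (separator assumed)
    string_value: "0,1,2,3;4,5,6,7"
  }
}
```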

Expected behavior

n/a

Actual behavior

n/a

Additional notes

n/a

ghost commented 4 months ago

Follow-up: how do I set up an ensemble that load-balances requests across these two model instances?