triton-inference-server / fastertransformer_backend


How to deploy multiple models on a node with multiple GPUs #165

Open jjjjohnson opened 12 months ago

jjjjohnson commented 12 months ago

Description

Suppose I have 5 GPT models, each with TP=2 (tensor parallelism 2), and I want to deploy all of them on a machine with 8 GPUs. Is this possible? If so, how do I control which GPUs each model is assigned to? I tried setting CUDA_VISIBLE_DEVICES when launching the Triton server, but it does not work.
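For concreteness, this is the kind of placement I am after. One workaround I considered is running a separate Triton server process per model, each pinned to a pair of GPUs; the repository paths, model names, and ports below are placeholders, not something I have confirmed works with the FasterTransformer backend:

```bash
# Sketch only: one tritonserver process per TP=2 model, each process
# restricted to two GPUs via CUDA_VISIBLE_DEVICES and given its own ports.
# Paths, model names, and port numbers are hypothetical.
CUDA_VISIBLE_DEVICES=0,1 tritonserver --model-repository=/models/gpt_a \
    --http-port=8000 --grpc-port=8001 --metrics-port=8002 &
CUDA_VISIBLE_DEVICES=2,3 tritonserver --model-repository=/models/gpt_b \
    --http-port=8010 --grpc-port=8011 --metrics-port=8012 &
# ...repeat for the remaining models on GPUs 4,5 and 6,7. Note that
# 5 models x TP=2 needs 10 GPU slots, so with 8 GPUs at least two
# models would have to share a GPU pair.
```

Whether FasterTransformer actually respects CUDA_VISIBLE_DEVICES this way is exactly what I am unsure about.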

Reproduced Steps

Set CUDA_VISIBLE_DEVICES before launching the Triton server; the models were not confined to the specified GPUs.
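Roughly the command used (the repository path is a placeholder):

```bash
# Attempted: restrict the whole server to GPUs 0 and 1.
CUDA_VISIBLE_DEVICES=0,1 tritonserver --model-repository=/models/ft_models
# Expected: model instances placed only on GPUs 0-1; observed: the
# environment variable had no effect on which GPUs the models used.
```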