triton-inference-server / fastertransformer_backend

BSD 3-Clause "New" or "Revised" License

Does FT support serving multiple models concurrently? #35

Closed PKUFlyingPig closed 2 years ago

PKUFlyingPig commented 2 years ago

If there are multiple models in the model repository, how does FT launch the model instances? Say there are 4 GPUs in total and I launch the BERT and GPT models with one instance each: will both instances be placed on the first GPU? Can I control the instance placement policy?

byshiue commented 2 years ago

You can choose which GPUs are used by setting CUDA_VISIBLE_DEVICES when launching the server.
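
A minimal sketch of one way to apply this suggestion: run a separate Triton server process per model repository, each pinned to its own GPU subset via CUDA_VISIBLE_DEVICES. The repository paths, model split, and port numbers below are hypothetical; the non-default ports are only needed to avoid conflicts when both processes run on the same host.

```bash
# Hypothetical layout: one repository per model, each server pinned to
# different GPUs so the instances cannot land on the same device.

# Serve the GPT repository on GPUs 0 and 1 (paths/ports are assumptions).
CUDA_VISIBLE_DEVICES=0,1 tritonserver \
    --model-repository=/workspace/model_repo_gpt \
    --http-port=8000 --grpc-port=8001 --metrics-port=8002 &

# Serve the BERT repository on GPUs 2 and 3, with non-conflicting ports.
CUDA_VISIBLE_DEVICES=2,3 tritonserver \
    --model-repository=/workspace/model_repo_bert \
    --http-port=9000 --grpc-port=9001 --metrics-port=9002 &
```

Each process only sees the GPUs listed in its CUDA_VISIBLE_DEVICES, so placement is controlled at the process level rather than through a per-model policy.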