open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

Could mmdeploy support inference with multiple models in parallel on the same GPU (TensorRT)? #2596

Open kelvinwang139 opened 11 months ago

kelvinwang139 commented 11 months ago

Motivation

In some industrial projects, we need multiple models to handle multiple different defect types. In this case, one GPU has to run inference for the different defects with the related models in parallel, to reduce the overall handling time.

But currently mmdeploy seems to support only one context at a time on a GPU. Is there any way to run inference with different model engines in parallel?


irexyc commented 11 months ago

You can create two handles, one handle per model.

The latest release package uses the same stream for every handle, which reduces parallelism. https://github.com/open-mmlab/mmdeploy/pull/2526 fixed this, so you need to build the latest code yourself.
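A minimal sketch of the two-handle approach, assuming the mmdeploy 1.x SDK C API (`mmdeploy/detector.h`); the model directories `model_a/` and `model_b/` are hypothetical, and filling in the input image is left to the caller. Each handle owns its own pipeline, and driving each one from its own thread lets the two models run side by side on GPU 0:

```cpp
#include <thread>

#include "mmdeploy/detector.h"

// Create one handle for one model and run a single inference with it.
static void run(const char* model_path, const mmdeploy_mat_t* img) {
  mmdeploy_detector_t detector{};
  // One handle per model, both on the same device ("cuda", device id 0).
  if (mmdeploy_detector_create_by_path(model_path, "cuda", 0, &detector)) {
    return;  // handle creation failed
  }
  mmdeploy_detection_t* dets{};
  int* det_count{};
  if (mmdeploy_detector_apply(detector, img, 1, &dets, &det_count) == 0) {
    // ... consume dets[0 .. det_count[0]) here ...
    mmdeploy_detector_release_result(dets, det_count, 1);
  }
  mmdeploy_detector_destroy(detector);
}

int main() {
  mmdeploy_mat_t img{};  // fill with your image data before use
  // One thread per handle, so the two models are fed concurrently.
  std::thread t1(run, "model_a/", &img);
  std::thread t2(run, "model_b/", &img);
  t1.join();
  t2.join();
  return 0;
}
```

Whether the two pipelines actually overlap on the GPU also depends on each handle getting its own CUDA stream, which is what #2526 addresses.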

kelvinwang139 commented 1 week ago

> You can create two handles, one handle per model.
>
> The latest release package uses the same stream for every handle, which reduces parallelism. #2526 fixed this, so you need to build the latest code yourself.

Question: why does the latest release package use the same stream for every handle, reducing parallelism? Was there a design consideration behind this? If we use #2526, what should be considered to get good parallelism? Thanks for your support!
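For background on the stream point: operations enqueued on a single CUDA stream execute strictly in order, so two models sharing one stream cannot overlap their kernels, while work on separate streams may run concurrently if the GPU has free resources. A minimal sketch using the plain CUDA runtime API with a hypothetical stand-in kernel `work` (not mmdeploy code):

```cpp
#include <cuda_runtime.h>

// Stand-in for a model's GPU compute.
__global__ void work(float* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

int main() {
  const int n = 1 << 20;
  float *a, *b;
  cudaMalloc(&a, n * sizeof(float));
  cudaMalloc(&b, n * sizeof(float));

  cudaStream_t s1, s2;
  cudaStreamCreate(&s1);
  cudaStreamCreate(&s2);

  // Same stream: the second launch waits for the first to finish,
  // which is the serialization the shared-stream release package causes.
  work<<<(n + 255) / 256, 256, 0, s1>>>(a, n);
  work<<<(n + 255) / 256, 256, 0, s1>>>(b, n);

  // Separate streams: the two launches may execute concurrently,
  // which is what per-handle streams enable.
  work<<<(n + 255) / 256, 256, 0, s1>>>(a, n);
  work<<<(n + 255) / 256, 256, 0, s2>>>(b, n);

  cudaDeviceSynchronize();
  cudaStreamDestroy(s1);
  cudaStreamDestroy(s2);
  cudaFree(a);
  cudaFree(b);
  return 0;
}
```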