kelvinwang139 opened 11 months ago
You can create two handles, one handle per model.
The latest release package uses the same stream for every handle, which reduces parallelism. https://github.com/open-mmlab/mmdeploy/pull/2526 fixed it, so you need to build the latest code yourself.
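The "one handle per model" advice above can be sketched as follows. This is a minimal, self-contained illustration: the stub `make_handle` stands in for creating a real mmdeploy SDK handle (e.g. a `Detector` per model), and the model names and image path are hypothetical. With the per-handle streams from #2526, the two inferences submitted from separate threads can overlap on the GPU.

```python
# Hedged sketch: two independent model "handles" run in parallel on one GPU.
# In a real mmdeploy setup each handle would be created from its own model
# (e.g. mmdeploy_runtime.Detector(model_path=..., device_name="cuda",
# device_id=0)); here a stub callable stands in so the sketch is runnable.
from concurrent.futures import ThreadPoolExecutor

def make_handle(name):
    # Placeholder for constructing a real inference handle for one model.
    def infer(image):
        return f"{name}:{image}"  # stand-in for real detection results
    return infer

handle_a = make_handle("defect_model_a")  # hypothetical model names
handle_b = make_handle("defect_model_b")

with ThreadPoolExecutor(max_workers=2) as pool:
    # Each handle is driven from its own thread; with per-handle CUDA
    # streams (post-#2526) the GPU work of the two models can overlap.
    fut_a = pool.submit(handle_a, "frame_001.jpg")
    fut_b = pool.submit(handle_b, "frame_001.jpg")
    results = [fut_a.result(), fut_b.result()]
```

The key design point is that each thread owns exactly one handle; sharing a single handle across threads would serialize on that handle's internal stream.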
Question: Why does the latest release package use the same stream for every handle, which reduces parallelism? Were there any considerations behind this? If we use #2526, what should be considered for parallelism? Thanks for your support!
Motivation
In some industry projects, we need multiple models to handle multiple different defect types. In this case, we need one GPU to run inference for the different defects with their related models in parallel, to reduce the overall handling time.
But currently mmdeploy only supports one context on the GPU at a time? Is there any way to run inference with different model engines in parallel?
Related resources
No response
Additional context
No response