triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Triton ensemble pipeline high CPU usage #7007

Status: Open. sergeevii123 opened this issue 6 months ago.

sergeevii123 commented 6 months ago

Description: I have a 5-step ensemble pipeline for Triton.

Expected behavior: CPU usage should not increase when Triton runs the ensemble pipeline, since every step is inferenced on the GPU. A sketch of the kind of pipeline I mean is below.
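For context, a Triton ensemble is declared in a `config.pbtxt` whose `ensemble_scheduling` block chains composing models. The following is a minimal two-step sketch only; the model and tensor names (`preprocess`, `classifier`, `RAW_INPUT`, etc.) are hypothetical and not taken from the reporter's actual pipeline:

```
# Hypothetical ensemble config.pbtxt chaining two composing models.
# Each step's input_map/output_map wires tensors between the models.
name: "my_pipeline"
platform: "ensemble"
max_batch_size: 8
input [
  { name: "RAW_INPUT" data_type: TYPE_FP32 dims: [ 3, 224, 224 ] }
]
output [
  { name: "SCORES" data_type: TYPE_FP32 dims: [ 1000 ] }
]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map { key: "INPUT" value: "RAW_INPUT" }
      output_map { key: "OUTPUT" value: "preprocessed_image" }
    },
    {
      model_name: "classifier"
      model_version: -1
      input_map { key: "INPUT" value: "preprocessed_image" }
      output_map { key: "OUTPUT" value: "SCORES" }
    }
  ]
}
```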

indrajit96 commented 6 months ago

Can you share steps to reproduce, along with the config files for the models?

sergeevii123 commented 6 months ago

Sorry, it's a bit hard to come up with a minimal example that reproduces this behaviour. Can you give me some ideas on how to debug which steps in the ensemble pipeline are running on the GPU and which on the CPU?
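One non-authoritative starting point (a sketch, not an official Triton debugging recipe): explicitly pin each composing model to the GPU in its own `config.pbtxt` via `instance_group`, so placement is declared rather than left to defaults. The model name and backend below are hypothetical:

```
# Hypothetical composing-model config.pbtxt: pin execution to GPU 0.
# If instance_group is omitted, Triton chooses a default placement,
# so declaring it explicitly removes one source of ambiguity.
name: "preprocess"
backend: "onnxruntime"
max_batch_size: 8
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

Beyond that, starting the server with `tritonserver --log-verbose=1` and comparing per-model timing metrics (e.g. `nv_inference_compute_infer_duration_us` on the metrics endpoint, port 8002 by default) can help localize where time is spent. Note that even an all-GPU ensemble still does some CPU work in the ensemble scheduler to route tensors between steps, so some CPU usage is expected.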