Open sergeevii123 opened 7 months ago
Can you share steps to repro along with the config files for the model
Sorry, it's a bit hard to come up with a minimal example that reproduces this behaviour. Can you give me some ideas on how to debug which steps in the ensemble pipeline are running on GPU and which on CPU?
Description
I have a 5-step ensemble pipeline for Triton.
For every torchscripted step but from looking at CPU usage metrics - they are very high, so it seems that some of the steps are actually running on CPU, or that some of the data is being moved between CPU and GPU. Could that be the case? How can I double-check which device is used to run each step in the ensemble pipeline?
Triton Information
I've built the Triton Docker image based on version 23.02 with this
Expected behavior
CPU usage does not increase when Triton runs the ensemble pipeline, because inference for all steps runs on GPU.
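One possible first check for the "which device is each step on" question is to look at what each step's `config.pbtxt` declares in its `instance_group`. The sketch below is only an illustration and rests on several assumptions: the model repository uses the usual `<model>/config.pbtxt` layout, the helper names `instance_kinds` and `report` are made up for this example, and the file is scanned as plain text rather than parsed as a protobuf, so a `kind:` inside a comment would also be counted. It shows the configured placement only; it cannot detect tensors that a TorchScript module moves to CPU internally or copies made between steps.

```python
import re
import sys
from pathlib import Path

# Matches the instance_group kinds defined by Triton's model configuration.
# This is a plain-text scan (an assumption of this sketch), not a protobuf parse.
KIND_RE = re.compile(r"\bkind\s*:\s*(KIND_(?:GPU|CPU|MODEL|AUTO))\b")


def instance_kinds(config_text):
    """Return the sorted, de-duplicated instance kinds found in a config.pbtxt text."""
    return sorted(set(KIND_RE.findall(config_text)))


def report(model_repository):
    """Map each model directory name to the kinds declared in its config.pbtxt."""
    result = {}
    for config in sorted(Path(model_repository).glob("*/config.pbtxt")):
        result[config.parent.name] = instance_kinds(config.read_text())
    return result


if __name__ == "__main__":
    # Hypothetical usage: python check_kinds.py /path/to/model_repository
    repository = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, kinds in report(repository).items():
        # An empty list means no explicit kind was found in that config,
        # so placement is left to Triton's defaults for that model.
        print(name, kinds or "no explicit instance_group kind")
```

Any step that reports `KIND_CPU`, or no explicit kind at all, would be a reasonable place to look first before profiling the data movement between steps.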