triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Triton ensemble pipeline high CPU usage #7007

Status: Open. sergeevii123 opened this issue 6 months ago.

sergeevii123 commented 6 months ago

Description: I have a 5-step ensemble pipeline for Triton.

Expected behavior: CPU usage should not increase when Triton runs the ensemble pipeline, since every step is inferenced on the GPU. A sketch of the kind of pipeline I mean is below.
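For context, a Triton ensemble is declared in a `config.pbtxt` whose `ensemble_scheduling` block chains composing models. The following is a minimal two-step sketch only; the model and tensor names (`preprocess`, `classifier`, `RAW_INPUT`, etc.) are hypothetical and not taken from the reporter's actual pipeline:

```
# Hypothetical ensemble config.pbtxt chaining two composing models.
# Each step's input_map/output_map wires tensors between the models.
name: "my_pipeline"
platform: "ensemble"
max_batch_size: 8
input [
  { name: "RAW_INPUT" data_type: TYPE_FP32 dims: [ 3, 224, 224 ] }
]
output [
  { name: "SCORES" data_type: TYPE_FP32 dims: [ 1000 ] }
]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map { key: "INPUT" value: "RAW_INPUT" }
      output_map { key: "OUTPUT" value: "preprocessed_image" }
    },
    {
      model_name: "classifier"
      model_version: -1
      input_map { key: "INPUT" value: "preprocessed_image" }
      output_map { key: "OUTPUT" value: "SCORES" }
    }
  ]
}
```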

indrajit96 commented 6 months ago

Can you share steps to reproduce, along with the config files for the models?

sergeevii123 commented 6 months ago

Sorry, it's a bit hard to come up with a minimal example that reproduces this behaviour. Can you give me some ideas on how to debug which steps in the ensemble pipeline are running on the GPU and which on the CPU?
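One non-authoritative starting point (a sketch, not an official Triton debugging recipe): explicitly pin each composing model to the GPU in its own `config.pbtxt` via `instance_group`, so placement is declared rather than left to defaults. The model name and backend below are hypothetical:

```
# Hypothetical composing-model config.pbtxt: pin execution to GPU 0.
# If instance_group is omitted, Triton chooses a default placement,
# so declaring it explicitly removes one source of ambiguity.
name: "preprocess"
backend: "onnxruntime"
max_batch_size: 8
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

Beyond that, starting the server with `tritonserver --log-verbose=1` and comparing per-model timing metrics (e.g. `nv_inference_compute_infer_duration_us` on the metrics endpoint, port 8002 by default) can help localize where time is spent. Note that even an all-GPU ensemble still does some CPU work in the ensemble scheduler to route tensors between steps, so some CPU usage is expected.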