Bycqg opened 1 month ago
Initially, I used `dali.py` and configured a `dali_backend` model in the ensemble model to preprocess images. However, with this configuration, if I uploaded a file in an unsupported format (e.g., GIF), DALI could not decode it, and the Triton service was killed.
Now I have switched to `python_backend` and written the DALI preprocessing in `model.py`, using `try...except` to handle exceptions and prevent the Triton service from being killed. However, in the process I found that `pb_utils.Tensor()` only accepts parameters of type `(str, numpy.ndarray)`, which forces me to transfer DALI's GPU data back to the CPU. My intention was to pass data directly from DALI on the GPU to TensorRT on the GPU, which I believe would be more efficient. Must DALI GPU data be transferred back to the CPU before being passed on? If not, how can I avoid the copy (preferably with code examples or documentation)?
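For reference, `pb_utils.Tensor` also has a `from_dlpack()` class method that accepts GPU data without a host copy. Below is a minimal, unverified sketch of how the `model.py` might look; the pipeline parameters, tensor names (`IMAGE_RAW`, `IMAGE_PREPROCESSED`), and resize dimensions are placeholders, and the DALI `feed_input`/`run` usage should be checked against the DALI version in the 22.12 container.

```python
# model.py — a sketch only; names and shapes are assumptions, not a
# verified configuration for any particular Triton/DALI release.
import cupy as cp
import triton_python_backend_utils as pb_utils
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn


@pipeline_def(batch_size=1, num_threads=2, device_id=0)
def preprocess_pipeline():
    # external_source lets execute() feed raw encoded bytes into DALI.
    images = fn.external_source(name="raw_images", device="cpu")
    decoded = fn.decoders.image(images, device="mixed")  # decode on GPU
    return fn.resize(decoded, resize_x=224, resize_y=224)


class TritonPythonModel:
    def initialize(self, args):
        self.pipe = preprocess_pipeline()
        self.pipe.build()

    def execute(self, requests):
        responses = []
        for request in requests:
            raw = pb_utils.get_input_tensor_by_name(request, "IMAGE_RAW")
            self.pipe.feed_input("raw_images", [raw.as_numpy()[0]])
            (out,) = self.pipe.run()
            # DALI GPU tensors expose __cuda_array_interface__, so CuPy
            # can wrap them zero-copy; the data never leaves the GPU.
            gpu_arr = cp.asarray(out.as_tensor())
            out_tensor = pb_utils.Tensor.from_dlpack(
                "IMAGE_PREPROCESSED", gpu_arr.toDlpack())
            responses.append(pb_utils.InferenceResponse([out_tensor]))
        return responses
```

The key point is `pb_utils.Tensor.from_dlpack()`, which takes a DLPack capsule (or an object supporting the DLPack protocol) instead of a NumPy array, so the output tensor stays in GPU memory.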
I have configured an ensemble model in Triton Inference Server, which includes DALI preprocessing and TensorRT inference. When I uploaded a GIF image from the client, the Triton server crashed with the error "current pipeline object is no longer valid. killed" because DALI does not support GIF decoding. How can I prevent Triton from shutting down, and instead catch the exception and return a proper error response?
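With `python_backend`, one way to keep the server alive is to catch the exception per request and attach a `TritonError` to that request's response; the client then receives an error while the server keeps running. A hedged sketch (the `run_dali` helper is hypothetical, standing in for whatever preprocessing the model performs):

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    # ... initialize() builds self.pipe (a DALI pipeline) ...

    def execute(self, requests):
        responses = []
        for request in requests:
            try:
                # run_dali is a hypothetical helper wrapping the DALI
                # pipeline; a GIF would raise here instead of crashing Triton.
                out_tensors = self.run_dali(request)
                responses.append(pb_utils.InferenceResponse(out_tensors))
            except Exception as exc:
                # Return a per-request error response; Triton forwards it
                # to the client and the server stays up.
                responses.append(pb_utils.InferenceResponse(
                    output_tensors=[],
                    error=pb_utils.TritonError(
                        f"preprocessing failed: {exc}")))
        return responses
```

Note that `execute()` must return one response per request, so the error response is appended in place of the normal one rather than raising.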
On a related note, I want to ensure that all data processing remains on the GPU throughout the entire pipeline (i.e., data processed by DALI on the GPU should not be transferred back to the CPU before being passed to TensorRT for inference). I believe that keeping the data on the GPU will be more efficient. Is this possible, and if so, how can it be achieved?
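One relevant knob: the Python backend by default moves input tensors to the CPU, but its `config.pbtxt` accepts a `FORCE_CPU_ONLY_INPUT_TENSORS` parameter that asks Triton to leave inputs in GPU memory when possible. A fragment like the following (placement and exact behavior should be confirmed against the python_backend documentation for the 22.12 release):

```
parameters: {
  key: "FORCE_CPU_ONLY_INPUT_TENSORS"
  value: { string_value: "no" }
}
```

Combined with producing outputs via DLPack (`pb_utils.Tensor.from_dlpack`), this should allow tensors to flow from the Python preprocessing model to the TensorRT model within the ensemble without a round trip through host memory.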
The version I am using is Triton release 2.29.0, corresponding to NGC container 22.12.