Closed by Tpoc311. Resolution:
The problem was not on the Triton side. The preprocessing operations in the DALI pipeline were taken from the model's test pipeline:
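The exact pipeline isn't reproduced here, but a typical MMPretrain test pipeline for such a classifier looks roughly like this (the scale and crop values are illustrative assumptions, not the exact config from this issue):

```python
# Illustrative MMPretrain-style test pipeline (values are assumptions)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='ResizeEdge', scale=256, edge='short'),
    dict(type='CenterCrop', crop_size=224),
    dict(type='PackInputs'),
    # note: no normalization step here -- it lives in the data_preprocessor
]
```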
But as it turned out, one operation was missing there: normalization. I didn't notice it because it was defined elsewhere, in the data preprocessor:
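In MMPretrain, the normalization constants typically sit in the model's data_preprocessor rather than in the pipeline; a minimal sketch (the ImageNet mean/std values below are the common defaults, assumed here):

```python
data_preprocessor = dict(
    num_classes=2,
    # per-channel normalization that the DALI pipeline was silently missing
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    # convert loaded BGR images to RGB before normalizing
    to_rgb=True,
)
```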
After adding normalization to the DALI pipeline, everything started working as it should.
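For reference, a minimal sketch of a DALI pipeline with the normalization folded in via fn.crop_mirror_normalize (batch size, shapes, and the ImageNet mean/std are assumptions, not the exact values from my config):

```python
import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali import pipeline_def

@pipeline_def(batch_size=8, num_threads=4, device_id=0)
def preprocessing_pipeline():
    # encoded image bytes fed in by Triton's DALI backend
    raw = fn.external_source(device='cpu', name='DALI_INPUT_0')
    images = fn.decoders.image(raw, device='mixed', output_type=types.RGB)
    images = fn.resize(images, resize_shorter=256)
    # crop_mirror_normalize performs the previously missing step,
    # (x - mean) / std, plus the HWC -> CHW layout the TensorRT model expects
    return fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        output_layout='CHW',
        crop=(224, 224),
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375])
```

For the Triton DALI backend, the pipeline is then serialized, e.g. `preprocessing_pipeline().serialize(filename='model.dali')`, and the file is placed in the model's version directory.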
The config is located here.
Description
There is a problem with inference of a classifier model (HRNet W30) via the Triton Inference Server (or just Triton). The model was trained for 2 classes using the MMPretrain framework (PyTorch under the hood). The weights were then converted to TensorRT format using the deploy.py script from the MMDeploy framework and served with Triton. The model weights worked correctly before conversion, and after conversion to TensorRT the classification is still correct when checked with the image_demo.py test script from the MMPretrain GitHub repository.
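For context, the conversion follows MMDeploy's standard deploy.py invocation; a sketch with hypothetical config and checkpoint paths (the TensorRT deploy config name is also illustrative):

```
python tools/deploy.py \
    configs/mmpretrain/classification_tensorrt_static-224x224.py \
    path/to/hrnet-w30_2class_config.py \
    path/to/checkpoint.pth \
    demo/demo.jpg \
    --work-dir work_dir/hrnet_trt \
    --device cuda:0
```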
But when I try to get predictions via the Triton server (using asynchronous requests), the model always returns constant predictions: confidence 1 for the first class and 0 for the second.
To query the Triton server, I use the original code from NVIDIA's examples, simple_grpc_async_infer_client.py, adapted to work with images (see "The modified inference script" below).
Triton Information
I'm using Triton inside a Docker container with the base image:
nvcr.io/nvidia/tritonserver:23.02-py3
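The server is launched roughly like this (the model repository path is a placeholder):

```
docker run --gpus all --rm \
    -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v /path/to/model_repository:/models \
    nvcr.io/nvidia/tritonserver:23.02-py3 \
    tritonserver --model-repository=/models
```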
Expected behavior
The model should produce correct (or at least non-constant) predictions for the classes.
Triton configs
Classifier config:
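The original config is not reproduced here; a sketch of what a TensorRT classifier's config.pbtxt typically looks like (model name, tensor names, and shapes are assumptions):

```
name: "classifier"
platform: "tensorrt_plan"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 2 ]
  }
]
```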
Preprocessing config:
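Again a sketch rather than the original (names are assumptions); the DALI model receives variable-length encoded image bytes:

```
name: "preprocessing"
backend: "dali"
max_batch_size: 8
input [
  {
    name: "DALI_INPUT_0"
    data_type: TYPE_UINT8
    dims: [ -1 ]  # encoded image bytes, variable length
  }
]
output [
  {
    name: "DALI_OUTPUT_0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
```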
Ensemble:
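A sketch of an ensemble that chains the two models (all names are assumptions matching the config sketches above):

```
name: "ensemble"
platform: "ensemble"
max_batch_size: 8
input [
  { name: "IMAGE", data_type: TYPE_UINT8, dims: [ -1 ] }
]
output [
  { name: "CLASS_PROB", data_type: TYPE_FP32, dims: [ 2 ] }
]
ensemble_scheduling {
  step [
    {
      model_name: "preprocessing"
      model_version: -1
      input_map { key: "DALI_INPUT_0" value: "IMAGE" }
      output_map { key: "DALI_OUTPUT_0" value: "preprocessed" }
    },
    {
      model_name: "classifier"
      model_version: -1
      input_map { key: "input" value: "preprocessed" }
      output_map { key: "output" value: "CLASS_PROB" }
    }
  ]
}
```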
Preprocessing DALI pipeline
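The original pipeline is reconstructed below as a sketch (exact operators and arguments are assumptions based on the resolution at the top): it decoded, resized, cropped, and changed layout, but never normalized.

```python
import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali import pipeline_def

@pipeline_def(batch_size=8, num_threads=4, device_id=0)
def preprocessing_pipeline():
    raw = fn.external_source(device='cpu', name='DALI_INPUT_0')
    images = fn.decoders.image(raw, device='mixed', output_type=types.RGB)
    images = fn.resize(images, resize_shorter=256)
    images = fn.crop(images, crop=(224, 224))
    # layout change and cast only -- no mean/std normalization,
    # which is the missing step described in the resolution above
    return fn.transpose(fn.cast(images, dtype=types.FLOAT), perm=[2, 0, 1])
```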
Docker images
Image used for conversion:
The modified inference script
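The script itself isn't reproduced here; a minimal sketch of such a modified async gRPC client (model and tensor names are assumptions matching the config sketches above):

```python
import sys
import time
from functools import partial

import numpy as np
import tritonclient.grpc as grpcclient


def callback(user_data, result, error):
    # stash the async result (or error) for the main thread
    user_data.append(error if error is not None else result)


def main(image_path):
    client = grpcclient.InferenceServerClient(url="localhost:8001")

    # send raw encoded bytes; decoding happens inside the DALI pipeline
    with open(image_path, "rb") as f:
        data = np.frombuffer(f.read(), dtype=np.uint8)
    data = np.expand_dims(data, axis=0)  # add the batch dimension

    inputs = [grpcclient.InferInput("IMAGE", list(data.shape), "UINT8")]
    inputs[0].set_data_from_numpy(data)
    outputs = [grpcclient.InferRequestedOutput("CLASS_PROB")]

    results = []
    client.async_infer(model_name="ensemble",
                       inputs=inputs,
                       callback=partial(callback, results),
                       outputs=outputs)

    # naive wait for the single in-flight request
    while not results:
        time.sleep(0.01)

    if isinstance(results[0], Exception):
        raise results[0]
    print("class probabilities:", results[0].as_numpy("CLASS_PROB"))


if __name__ == "__main__":
    main(sys.argv[1])
```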