ovunctuzel-bc opened 6 months ago
I think I was able to isolate the issue to the LiteMLA block, which produces large values as a result of its matrix multiplications. The max values are around 2e5, which is larger than the maximum FP16 value (~65504), so the activations overflow in FP16.
Interestingly, this does not happen with the provided pretrained models.
I was able to resolve the problem by setting the following layers to FP32 precision using the Python TensorRT API (a rough sketch follows the list):
/backbone/stages.2/op_list.1/context_module/main/MatMul
/backbone/stages.2/op_list.1/context_module/main/MatMul_1
/backbone/stages.2/op_list.1/context_module/main/Slice_5
/backbone/stages.2/op_list.1/context_module/main/Slice_4
/backbone/stages.2/op_list.1/context_module/main/Add
/backbone/stages.2/op_list.1/context_module/main/Div
(Repeat for other stages/op_list combinations)
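For anyone hitting the same overflow, here is a minimal sketch of forcing those layers to FP32 with the Python API (assuming the standard ONNX-parser build path in TensorRT 8.x; the file names, keyword matching, and use of OBEY_PRECISION_CONSTRAINTS are illustrative assumptions, not the exact script used):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("model.onnx", "rb") as f:  # illustrative path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Needed so TensorRT honors the per-layer precision overrides below.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

# Substrings of the LiteMLA layers that overflow in FP16. Note this keyword
# match is broader than the exact list above (e.g. it catches every Slice in
# the context module); narrow it down if you only want specific layers.
fp32_keywords = (
    "context_module/main/MatMul",
    "context_module/main/Slice",
    "context_module/main/Add",
    "context_module/main/Div",
)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    if any(k in layer.name for k in fp32_keywords):
        layer.precision = trt.float32
        for j in range(layer.num_outputs):
            layer.set_output_type(j, trt.float32)

engine_bytes = builder.build_serialized_network(network, config)
with open("model_fp16_mixed.engine", "wb") as f:
    f.write(engine_bytes)
```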
I'm trying to run FP16 inference using TensorRT 8.5.2.2 on a Xavier NX device, and getting NaN or garbage values. Has anyone encountered a similar issue?
- I'm using B0 and B1 segmentation models (custom trained).
- The ONNX model works great. I even tried FP16 ONNX inference, and it also works great.
- TensorRT with FP32 precision works great.
- I have tried exporting with both the Python API and trtexec; the results are the same.
Where is the inference script for what you tried? I'm referring to the SEG variant here.
I'm using a proprietary script, but you can look at NVIDIA's examples. Running TRT models is usually the same regardless of model architecture, as long as the inputs and outputs are set up properly.
https://github.com/NVIDIA/object-detection-tensorrt-example/blob/master/SSD_Model/utils/inference.py
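The general pattern looks roughly like this (a minimal sketch, assuming TensorRT 8.x with pycuda, a static-shape engine, and a single input plus a single output binding; the engine path and input shape are illustrative):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


def load_engine(path):
    with open(path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())


def infer(engine, input_array):
    context = engine.create_execution_context()
    stream = cuda.Stream()
    bindings, host_bufs, dev_bufs = [], [], []

    # Allocate page-locked host buffers and device buffers for every binding.
    for i in range(engine.num_bindings):
        shape = engine.get_binding_shape(i)
        dtype = trt.nptype(engine.get_binding_dtype(i))
        host = cuda.pagelocked_empty(trt.volume(shape), dtype)
        dev = cuda.mem_alloc(host.nbytes)
        host_bufs.append(host)
        dev_bufs.append(dev)
        bindings.append(int(dev))
        if engine.binding_is_input(i):
            np.copyto(host, input_array.ravel().astype(dtype))

    # Assumes binding 0 is the input and binding 1 is the output.
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_bufs[1], dev_bufs[1], stream)
    stream.synchronize()

    return host_bufs[1].reshape(tuple(engine.get_binding_shape(1)))


# Usage (illustrative engine path and input shape for a segmentation model):
# engine = load_engine("model.engine")
# logits = infer(engine, np.random.rand(1, 3, 512, 512).astype(np.float32))
```

The output is just the raw network output; for segmentation you would take an argmax over the class channel and resize/colorize it yourself.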
Thanks a lot @ovunctuzel-bc for your timely reply. Is there a proper semantic segmentation TensorRT inference script you referred to? The script above is for an object detection use case; if you could point me to a segmentation one, it would be very helpful. I have currently tried the semantic segmentation B2 variant, and I have converted it to ONNX and TRT. I just haven't found a good resource for trying out TensorRT inference.