ovunctuzel-bc opened 6 months ago
I think I was able to isolate the issue to the LiteMLA block, which produces large values as a result of its matrix multiplications. The max values are around 2e5, which is larger than the maximum FP16 value (~65504), so the activations overflow in FP16.
Interestingly, this does not happen with the provided pretrained models.
I was able to resolve the problem by setting the following layers to FP32 precision using the Python TensorRT API (a rough sketch follows the list):
/backbone/stages.2/op_list.1/context_module/main/MatMul
/backbone/stages.2/op_list.1/context_module/main/MatMul_1
/backbone/stages.2/op_list.1/context_module/main/Slice_5
/backbone/stages.2/op_list.1/context_module/main/Slice_4
/backbone/stages.2/op_list.1/context_module/main/Add
/backbone/stages.2/op_list.1/context_module/main/Div
(Repeat for other stages/op_list combinations)
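For anyone hitting the same overflow, here is a minimal sketch of forcing those layers to FP32 with the Python API (assuming the standard ONNX-parser build path in TensorRT 8.x; the file names, keyword matching, and use of OBEY_PRECISION_CONSTRAINTS are illustrative assumptions, not the exact script used):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("model.onnx", "rb") as f:  # illustrative path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Needed so TensorRT honors the per-layer precision overrides below.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

# Substrings of the LiteMLA layers that overflow in FP16. Note this keyword
# match is broader than the exact list above (e.g. it catches every Slice in
# the context module); narrow it down if you only want specific layers.
fp32_keywords = (
    "context_module/main/MatMul",
    "context_module/main/Slice",
    "context_module/main/Add",
    "context_module/main/Div",
)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    if any(k in layer.name for k in fp32_keywords):
        layer.precision = trt.float32
        for j in range(layer.num_outputs):
            layer.set_output_type(j, trt.float32)

engine_bytes = builder.build_serialized_network(network, config)
with open("model_fp16_mixed.engine", "wb") as f:
    f.write(engine_bytes)
```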
I'm trying to run FP16 inference using TensorRT 8.5.2.2 on a Xavier NX device, and getting NaN or garbage values. Has anyone encountered a similar issue?
- I'm using B0 and B1 segmentation models (custom trained).
- The ONNX model works great. I even tried FP16 ONNX inference, and it also works great.
- TensorRT with FP32 precision works great.
- I have tried exporting with both the Python API and trtexec; the results are the same.
Where is the inference script for what you tried? I'm referring to the SEG variant here.
I'm using a proprietary script, but you can look at NVIDIA's examples. Running TRT models is usually the same regardless of model architecture, as long as the inputs and outputs are set up properly.
https://github.com/NVIDIA/object-detection-tensorrt-example/blob/master/SSD_Model/utils/inference.py
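The general pattern looks roughly like this (a minimal sketch, assuming TensorRT 8.x with pycuda, a static-shape engine, and a single input plus a single output binding; the engine path and input shape are illustrative):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


def load_engine(path):
    with open(path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())


def infer(engine, input_array):
    context = engine.create_execution_context()
    stream = cuda.Stream()
    bindings, host_bufs, dev_bufs = [], [], []

    # Allocate page-locked host buffers and device buffers for every binding.
    for i in range(engine.num_bindings):
        shape = engine.get_binding_shape(i)
        dtype = trt.nptype(engine.get_binding_dtype(i))
        host = cuda.pagelocked_empty(trt.volume(shape), dtype)
        dev = cuda.mem_alloc(host.nbytes)
        host_bufs.append(host)
        dev_bufs.append(dev)
        bindings.append(int(dev))
        if engine.binding_is_input(i):
            np.copyto(host, input_array.ravel().astype(dtype))

    # Assumes binding 0 is the input and binding 1 is the output.
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_bufs[1], dev_bufs[1], stream)
    stream.synchronize()

    return host_bufs[1].reshape(tuple(engine.get_binding_shape(1)))


# Usage (illustrative engine path and input shape for a segmentation model):
# engine = load_engine("model.engine")
# logits = infer(engine, np.random.rand(1, 3, 512, 512).astype(np.float32))
```

The output is just the raw network output; for segmentation you would take an argmax over the class channel and resize/colorize it yourself.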
Thanks a lot @ovunctuzel-bc for your timely reply. Is there a proper semantic segmentation TensorRT inference script you referred to? The script above is for an object detection use case; if you could point me to a segmentation one, it would be very helpful. I have currently tried the semantic segmentation B2 variant, and I have converted it to ONNX and TRT. I just haven't found a good resource for trying out TensorRT inference.