wang-xinyu / tensorrtx

Implementation of popular deep learning networks with TensorRT network definition API
MIT License

tensorrtx/resnet/resnet50.py quantization to FP32, FP16, Int8 #1239

Closed · woonwoon closed this 1 year ago

woonwoon commented 1 year ago

I want to quantize resnet50.wts to FP32, FP16, and INT8 in resnet50.py. How do I modify resnet50.py for each?

wang-xinyu commented 1 year ago

You can call the setFlag API to use FP16 or INT8 (INT8 requires calibration). Here is the C++ code; you can find the corresponding Python API in the TensorRT docs.

https://github.com/wang-xinyu/tensorrtx/blob/f92dcf43dcbe346c357edfa4cc976eb9d0d95470/yolov5/src/model.cpp#L351
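
A minimal sketch of the corresponding Python API calls, assuming the standard tensorrt Python bindings (the builder and config here stand in for the ones created in your own build script, e.g. resnet50.py):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# FP16: just set the flag; TensorRT quantizes the FP32 weights internally.
config.set_flag(trt.BuilderFlag.FP16)

# INT8: the flag alone is not enough, a calibrator must also be attached
# (see further down in this thread).
# config.set_flag(trt.BuilderFlag.INT8)
# config.int8_calibrator = my_calibrator  # an IInt8EntropyCalibrator2 subclass
```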

woonwoon commented 1 year ago

I just want to quantize ResNet for a classification problem for each data type (FP32, FP16, INT8), in the Python code https://github.com/wang-xinyu/tensorrtx/blob/master/resnet/resnet50.py

wang-xinyu commented 1 year ago

You need to call config.set_flag(trt.BuilderFlag.FP16) before calling build_engine(); TensorRT's build_engine() will do the quantization internally.

https://github.com/wang-xinyu/tensorrtx/blob/f92dcf43dcbe346c357edfa4cc976eb9d0d95470/resnet/resnet50.py#L225

The .wts file contains the FP32 weights, and TensorRT's build_engine() does the quantization internally. You don't need to convert the weights yourself; you just need to set the flag.
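
The ordering matters; a minimal sketch, assuming the build function in resnet50.py has already populated `network` with the FP32 weights loaded from the .wts file:

```python
config.set_flag(trt.BuilderFlag.FP16)           # request FP16 before building
engine = builder.build_engine(network, config)  # quantization happens in here
```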

woonwoon commented 1 year ago

Thanks to that, I converted it to FP16. However, converting to INT8 requires calibration, and I don't know how to add it. I'd appreciate it if you could tell me how.

I added this: config.set_flag(trt.BuilderFlag.INT8)

Error message:

[TensorRT] WARNING: Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32.
[TensorRT] ERROR: Calibration failure occurred with no scaling factors detected. This could be due to no int8 calibrator or insufficient custom scales for network layers. Please see int8 sample to setup calibration correctly.

wang-xinyu commented 1 year ago

TensorRT INT8 PTQ requires calibration, and it's a bit tricky to implement; you can refer to the yolov5 example in this repo.

In addition to setting the INT8 flag, you basically need to implement a calibrator class. You can also check the NVIDIA TensorRT docs for this.
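
A minimal sketch of such a calibrator for the Python API, assuming pycuda for the device-side buffer and ResNet's 3x224x224 input; the batch source and cache file name are placeholders to adapt:

```python
import os
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import tensorrt as trt

class ResNetEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, calib_batches, batch_size, cache_file="calib.cache"):
        super().__init__()
        self.batches = iter(calib_batches)  # iterable of float32 NCHW arrays
        self.batch_size = batch_size
        self.cache_file = cache_file
        # One device buffer big enough for a batch of 3x224x224 float32 inputs.
        self.device_input = cuda.mem_alloc(batch_size * 3 * 224 * 224 * 4)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = next(self.batches)  # shape (N, 3, 224, 224), dtype float32
        except StopIteration:
            return None  # no more data -> calibration finishes
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Reuse a previous calibration run if the cache file exists.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

Then attach it to the config before building (`my_batches` is hypothetical, e.g. a list of preprocessed calibration images):

```python
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = ResNetEntropyCalibrator(my_batches, batch_size=1)
```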

woonwoon commented 1 year ago

I can now quantize to INT8 and FP16, thank you. One more question: if I run without adding set_flag, will it be FP32 -> FP32? What is done in that case?

wang-xinyu commented 1 year ago

Yes, it will use FP32 by default.

woonwoon commented 1 year ago

It's not quantized, is it?

wang-xinyu commented 1 year ago

It's not quantized

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.