microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

bad accuracy on own quantized(INT8) yolov3 model #9598

Open tingggggg opened 3 years ago

tingggggg commented 3 years ago

I followed the example (https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization/object_detection/trt/yolov3) to quantize my own YOLO model, but the quantized model gives bad results.

I tried changing both QuantFormat and QuantType, but the results are still bad: after quantization the model cannot detect anything.

Can someone give me some tips or tricks for converting a YOLO model?

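from onnxruntime.quantization import (CalibrationMethod, QuantFormat, QuantType,
                                      create_calibrator, quantize_static,
                                      write_calibration_table)
# Assumption: YoloV3DataReader comes from data_reader.py in the linked example.
from data_reader import YoloV3DataReader

def get_calibration_table(model_path, augmented_model_path, calibration_dataset):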

    calibrator = create_calibrator(model_path, [], augmented_model_path=augmented_model_path, calibrate_method=CalibrationMethod.Entropy)

    # A DataReader can process the dataset in batches or serially, depending on its implementation.
    # The two examples below show both ways to generate a calibration table.
    '''
    1. Use serial processing

    A single DataReader can process the whole dataset serially, but some machines
    do not have enough memory to hold all dataset images and all intermediate outputs.
    In that case, let several DataReaders each handle one stride of the dataset in turn.
    A DataReader uses serial processing when batch_size is 1.
    '''

    # total_data_size = len(os.listdir(calibration_dataset))
    # start_index = 0
    # stride = 2000
    # for i in range(0, total_data_size, stride):
    #     data_reader = YoloV3DataReader(calibration_dataset,
    #                                    start_index=start_index,
    #                                    end_index=start_index + stride,
    #                                    stride=stride,
    #                                    batch_size=1,
    #                                    model_path=augmented_model_path)
    #     calibrator.collect_data(data_reader)
    #     start_index += stride
    '''
    2. Use batch processing (much faster)

    Batch processing requires less memory for intermediate outputs, so a single DataReader can handle the whole dataset in batches.
    If that still runs out of memory, use multiple DataReaders, as in the serial case.
    A DataReader uses batch processing when batch_size > 1.
    '''

    # Note: batch_size is 1 here, so this reader actually processes the data serially.
    data_reader = YoloV3DataReader(calibration_dataset, width=512, height=512, stride=1000, batch_size=1, model_path=augmented_model_path)
    calibrator.collect_data(data_reader)

    write_calibration_table(calibrator.compute_range())
    print('calibration table generated and saved.')

def quant(input_model_path, output_model_path, calibration_dataset, augmented_model_path):

    # The parameter is named calibration_dataset so the reader below no longer relies on a global.
    data_reader = YoloV3DataReader(calibration_dataset, width=512, height=512, stride=1000, batch_size=1, model_path=augmented_model_path)
    quant_format = QuantFormat.QDQ
    per_channel = False
    quantize_static(input_model_path,
                    output_model_path,
                    data_reader,
                    quant_format=quant_format,
                    per_channel=per_channel,
                    weight_type=QuantType.QUInt8)
    print('Calibrated and quantized model saved.')

augmented_model_path = 'augmented_model.onnx'
quant_model_path = "own_yolo_quant_static.onnx"
model_path = 'own_yolo.onnx'
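# calibration_dataset (the calibration image folder) is not defined in this snippet; it is assumed to be set earlier.
get_calibration_table(model_path, augmented_model_path, calibration_dataset)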
quant(model_path, quant_model_path, calibration_dataset, augmented_model_path)

yufenglee commented 3 years ago

@tingggggg, could you provide more info, such as how big your calibration dataset is and which model you are using?

tingggggg commented 3 years ago

@yufenglee, thank you for your help. I use 100 images for calibration. Is there a guideline for how many images are enough? I use a small YOLOv3 with 5 classes and only 2 outputs (32x32, 16x16).

Can I calibrate with the COCO dataset if the model was trained on images from the actual application scenario (same categories)?

yufenglee commented 3 years ago

@tingggggg, no, you need to calibrate with a dataset that has the same characteristics as the training dataset. Also, if you don't run with the TensorRT EP, you don't need to call get_calibration_table(model_path, augmented_model_path, calibration_dataset).

100 images is a little small. In general, we use 500 or more images for calibration.
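As a rough illustration of the advice above, here is a minimal sketch for drawing a reproducible calibration sample from a folder of images that resemble the training data. The helper name `select_calibration_images`, the extension list, and the default of 500 are assumptions made for this example and are not part of ONNX Runtime.

```python
import os
import random

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def select_calibration_images(image_dir, min_count=500, seed=0):
    """Return a reproducible random sample of image paths for calibration.

    Hypothetical helper: image_dir should contain images with the same
    characteristics as the training set.
    """
    paths = sorted(
        os.path.join(image_dir, name)
        for name in os.listdir(image_dir)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )
    if len(paths) < min_count:
        # Fewer images than recommended: use them all, but warn the caller.
        print(f"warning: only {len(paths)} images found, {min_count}+ recommended")
        return paths
    # A seeded Random instance keeps the sample stable between runs.
    return random.Random(seed).sample(paths, min_count)
```

The returned paths could then be fed to whatever data reader the calibration step uses.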

Another option is to try the Entropy calibration method: https://github.com/microsoft/onnxruntime/blob/c6ef6b5bc8eeb49a6b8f6fadfd212c9e07ddde31/onnxruntime/python/tools/quantization/quantize.py#L149
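For reference, a minimal sketch of selecting the Entropy method through the `calibrate_method` argument of `quantize_static`. The wrapper name `quantize_with_entropy_calibration` is made up for this example, and the other arguments simply mirror the settings from the original post; treat it as a starting point, not a verified recipe for this model.

```python
def quantize_with_entropy_calibration(input_model_path, output_model_path, data_reader):
    """Statically quantize a model using Entropy calibration (hypothetical wrapper)."""
    # Imported lazily so this module can be loaded where onnxruntime is not installed.
    from onnxruntime.quantization import (
        CalibrationMethod,
        QuantFormat,
        QuantType,
        quantize_static,
    )

    quantize_static(
        input_model_path,
        output_model_path,
        data_reader,  # a calibration data reader such as YoloV3DataReader
        quant_format=QuantFormat.QDQ,
        per_channel=False,
        weight_type=QuantType.QUInt8,
        # quantize_static defaults to MinMax; Entropy is selected here instead.
        calibrate_method=CalibrationMethod.Entropy,
    )
```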

tingggggg commented 3 years ago

@yufenglee, after some testing, I found that I need to pass nodes_to_exclude to quantize_static and quantize only the Conv nodes to avoid the accuracy loss. I also found that this reduces the size of the YOLOv3 model but slows down inference. Am I quantizing incorrectly? For example:

op_types_to_quantize = ["Conv"]
nodes_to_exclude = ['conv0', 'conv1', 'conv2', 'conv3', 'conv4']  # exclude some nodes from the head
quantize_static(..., op_types_to_quantize=op_types_to_quantize, nodes_to_exclude=nodes_to_exclude)
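To put a number on the slowdown, one option is to time the float and the quantized model the same way. The sketch below is only an illustration: `median_latency_ms` and `make_session_runner` are hypothetical helper names, and the warm-up and run counts are arbitrary defaults.

```python
import statistics
import time


def median_latency_ms(run_once, warmup=5, runs=30):
    """Return the median wall-clock latency of run_once() in milliseconds."""
    for _ in range(warmup):
        run_once()  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def make_session_runner(model_path, input_feed):
    """Build a zero-argument callable that runs one inference (hypothetical helper)."""
    # Imported lazily so the timing helper works without onnxruntime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    return lambda: session.run(None, input_feed)
```

Calling `median_latency_ms(make_session_runner(path, feed))` once for `own_yolo.onnx` and once for the quantized model, with the same `feed` dictionary, gives two comparable numbers. The input name used in `feed` depends on the model and is not shown here.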

yufenglee commented 3 years ago

@tingggggg, partial quantization introduces interleaved Dequantize and Quantize operations, which hurts latency. I would recommend retraining the model with quantization-aware training (QAT) to recover the accuracy. Could you share your model with only the Conv nodes quantized?
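One rough way to see how many of these operations ended up in a partially quantized graph is to count the `QuantizeLinear` and `DequantizeLinear` nodes. This is a sketch with hypothetical helper names (`count_qdq_ops`, `count_qdq_in_model`). The counts are only an indicator, since the runtime may fuse or remove some of these nodes during graph optimization.

```python
from collections import Counter


def count_qdq_ops(op_types):
    """Count QuantizeLinear/DequantizeLinear entries in an iterable of op type names."""
    counts = Counter(op_types)
    return {
        "QuantizeLinear": counts["QuantizeLinear"],
        "DequantizeLinear": counts["DequantizeLinear"],
        "total": sum(counts.values()),
    }


def count_qdq_in_model(model_path):
    """Load an ONNX model and count its Q/DQ nodes (hypothetical helper)."""
    # Imported lazily so count_qdq_ops can be used without the onnx package.
    import onnx

    model = onnx.load(model_path)
    return count_qdq_ops(node.op_type for node in model.graph.node)
```

For example, `count_qdq_in_model("own_yolo_quant_static.onnx")` would report the counts for the quantized model from the original post.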

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.