meituan / YOLOv6

YOLOv6: a single-stage object detection framework dedicated to industrial applications.

PTQ/QAT Unusual Behaviour #558

Closed haritsahm closed 1 year ago

haritsahm commented 2 years ago

Question

After training the model following the tutorial_repopt guide, I observed unusual behaviour when running the PTQ and QAT training steps, similar to what I reported in #535 (issuecomment-1284808881).

Commands

### PTQ
PYTHONWARNINGS="ignore" python tools/train.py --data data/custom-data.yaml --name yolov6s-repopt-custom-data-ptq --conf configs/repopt/yolov6s_opt_qat-custom-data.py --calib --quant --batch 32 --workers 14 --device 0 --check-images --check-labels

### QAT
CUDA_LAUNCH_BLOCKING=1 PYTHONWARNINGS="ignore" python tools/train.py --data data/custom-data.yaml --name yolov6s-repopt-custom-data-qat --conf configs/repopt/yolov6s_opt_qat-custom-data.py --quant --distill --distill_feat --batch 32 --workers 14 --epochs 10 --teacher_model_path runs/train/yolov6s-repopt-custom-data/weights/best_ckpt.pt --device 0 --check-images --check-labels

Configs

ptq = dict(
    num_bits = 8,
    calib_batches = 4,
    # 'max', 'histogram'
    calib_method = 'histogram',
    # 'entropy', 'percentile', 'mse'
    histogram_amax_method='entropy',
    histogram_amax_percentile=99.99,
    calib_output_path='weights/',
    sensitive_layers_skip=False,
    sensitive_layers_list=['detect.stems.0.conv',
                           'detect.stems.1.conv',
                           'detect.stems.2.conv',
                           'detect.cls_convs.0.conv',
                           'detect.cls_convs.1.conv',
                           'detect.cls_convs.2.conv',
                           'detect.reg_convs.0.conv',
                           'detect.reg_convs.1.conv',
                           'detect.reg_convs.2.conv',
                           'detect.cls_preds.0',
                           'detect.cls_preds.1',
                           'detect.cls_preds.2',
                           'detect.reg_preds.0',
                           'detect.reg_preds.1',
                           'detect.reg_preds.2',
                           ],
)

qat = dict(
    calib_pt = 'weights/best_ckpt_calib_histogram.pt',
    sensitive_layers_skip = False,
    sensitive_layers_list=['detect.stems.0.conv',
                           'detect.stems.1.conv',
                           'detect.stems.2.conv',
                           'detect.cls_convs.0.conv',
                           'detect.cls_convs.1.conv',
                           'detect.cls_convs.2.conv',
                           'detect.reg_convs.0.conv',
                           'detect.reg_convs.1.conv',
                           'detect.reg_convs.2.conv',
                           'detect.cls_preds.0',
                           'detect.cls_preds.1',
                           'detect.cls_preds.2',
                           'detect.reg_preds.0',
                           'detect.reg_preds.1',
                           'detect.reg_preds.2',
                           ],
)

# Choose Rep-block by the training Mode, choices=["repvgg", "hyper-search", "repopt"]
training_mode='repopt'
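
For context, the ptq settings above drive the standard pytorch_quantization calibration flow. The following is a minimal sketch of that flow, not YOLOv6's exact implementation (model and data_loader are assumed to be the quantized network and the calibration loader); it is the pattern that emits the "Disable HistogramCalibrator/MaxCalibrator" and "Load calibrated amax" messages seen in the PTQ output below:

import torch
from pytorch_quantization.calib import MaxCalibrator
from pytorch_quantization.nn import TensorQuantizer

def collect_stats(model, data_loader, num_batches=4):
    # Put every quantizer into calibration mode: gather activation/weight
    # statistics without fake-quantizing yet.
    for module in model.modules():
        if isinstance(module, TensorQuantizer):
            if module._calibrator is not None:
                module.disable_quant()
                module.enable_calib()
            else:
                module.disable()
    with torch.no_grad():
        for i, (images, _) in enumerate(data_loader):
            model(images.cuda())  # assumes a GPU, as in the commands above
            if i + 1 >= num_batches:  # corresponds to calib_batches above
                break
    # Leave calibration mode and re-enable fake quantization; this step
    # prints the "Disable HistogramCalibrator/MaxCalibrator" warnings.
    for module in model.modules():
        if isinstance(module, TensorQuantizer):
            if module._calibrator is not None:
                module.enable_quant()
                module.disable_calib()
            else:
                module.enable()

def compute_amax(model, method='entropy', percentile=99.99):
    # load_calib_amax computes and stores amax from the collected
    # statistics; it prints the "Load calibrated amax" warnings below.
    for module in model.modules():
        if isinstance(module, TensorQuantizer) and module._calibrator is not None:
            if isinstance(module._calibrator, MaxCalibrator):
                module.load_calib_amax()
            else:
                module.load_calib_amax(method, percentile=percentile)
    model.cuda()  # per the warning: move loaded amax tensors to the GPU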

Eval Output

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.574
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.810
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.616
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.275
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.576
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.772
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.199
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.588
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.691
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.452
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.728
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.847

The PTQ Output:

W1020 03:41:10.508716 140180999018304 tensor_quantizer.py:173] Disable HistogramCalibrator
W1020 03:41:10.508753 140180999018304 tensor_quantizer.py:173] Disable MaxCalibrator
W1020 03:41:10.508793 140180999018304 tensor_quantizer.py:173] Disable HistogramCalibrator
W1020 03:41:10.508830 140180999018304 tensor_quantizer.py:173] Disable MaxCalibrator
W1020 03:41:10.508869 140180999018304 tensor_quantizer.py:173] Disable HistogramCalibrator
W1020 03:41:10.508906 140180999018304 tensor_quantizer.py:173] Disable MaxCalibrator
W1020 03:41:10.508944 140180999018304 tensor_quantizer.py:173] Disable HistogramCalibrator
W1020 03:41:10.508981 140180999018304 tensor_quantizer.py:173] Disable MaxCalibrator
backbone.stem.conv._input_quantizer     : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:41:11.779125 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
W1020 03:41:11.779225 140180999018304 tensor_quantizer.py:238] Call .cuda() if running on GPU after loading calibrated amax.
backbone.stem.conv._weight_quantizer    : TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:41:11.779388 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([32, 1, 1, 1]).
backbone.ERBlock_2.0.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:41:13.871447 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_2.0.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:41:13.871597 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([64, 1, 1, 1]).
backbone.ERBlock_2.1.conv1.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:41:16.173452 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_2.1.conv1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:41:16.173599 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([64, 1, 1, 1]).
backbone.ERBlock_2.1.block.0.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:41:18.070713 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_2.1.block.0.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:41:18.070865 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([64, 1, 1, 1]).
backbone.ERBlock_3.0.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:41:19.931489 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_3.0.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:41:19.931666 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
backbone.ERBlock_3.1.conv1.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:41:21.874206 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_3.1.conv1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:41:21.874355 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
backbone.ERBlock_3.1.block.0.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:41:24.835430 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_3.1.block.0.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:41:24.835582 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
backbone.ERBlock_3.1.block.1.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:41:27.491993 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_3.1.block.1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:41:27.492149 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
backbone.ERBlock_3.1.block.2.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:41:29.415021 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_3.1.block.2.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:41:29.415175 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
backbone.ERBlock_4.0.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:41:32.896461 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_4.0.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:41:32.896619 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([256, 1, 1, 1]).
backbone.ERBlock_4.1.conv1.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:41:36.254226 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_4.1.conv1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:41:36.254375 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([256, 1, 1, 1]).
backbone.ERBlock_4.1.block.0.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:41:40.448074 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_4.1.block.0.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:41:40.448277 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([256, 1, 1, 1]).
backbone.ERBlock_4.1.block.1.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:41:43.085275 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_4.1.block.1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:41:43.085426 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([256, 1, 1, 1]).
backbone.ERBlock_4.1.block.2.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:41:47.453781 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_4.1.block.2.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:41:47.453934 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([256, 1, 1, 1]).
backbone.ERBlock_4.1.block.3.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:41:51.523550 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_4.1.block.3.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:41:51.523705 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([256, 1, 1, 1]).
backbone.ERBlock_4.1.block.4.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:41:53.952852 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_4.1.block.4.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:41:53.953005 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([256, 1, 1, 1]).
backbone.ERBlock_5.0.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:41:56.132084 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_5.0.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:41:56.132234 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([512, 1, 1, 1]).
backbone.ERBlock_5.1.conv1.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:42:01.603554 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_5.1.conv1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:42:01.603717 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([512, 1, 1, 1]).
backbone.ERBlock_5.1.block.0.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:42:05.106818 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_5.1.block.0.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:42:05.106973 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([512, 1, 1, 1]).
backbone.ERBlock_5.2.cv1.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:42:08.133834 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_5.2.cv1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:42:08.133992 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([256, 1, 1, 1]).
backbone.ERBlock_5.2.cv2.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:42:11.333766 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
backbone.ERBlock_5.2.cv2.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:42:11.333914 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([512, 1, 1, 1]).
backbone.ERBlock_5.2.m._input_quantizer : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:42:14.537888 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.Rep_p4.conv1.conv._input_quantizer : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:42:16.472916 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.Rep_p4.conv1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:42:16.473072 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
neck.Rep_p4.block.0.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:42:18.384744 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.Rep_p4.block.0.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:42:18.384902 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
neck.Rep_p4.block.1.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:42:21.699568 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.Rep_p4.block.1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:42:21.699729 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
neck.Rep_p4.block.2.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:42:29.520033 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.Rep_p4.block.2.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:42:29.520190 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
neck.Rep_p3.conv1.conv._input_quantizer : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:42:34.939626 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.Rep_p3.conv1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:42:34.939779 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([64, 1, 1, 1]).
neck.Rep_p3.block.0.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:42:38.484270 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.Rep_p3.block.0.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:42:38.484425 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([64, 1, 1, 1]).
neck.Rep_p3.block.1.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:42:42.628369 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.Rep_p3.block.1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:42:42.628523 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([64, 1, 1, 1]).
neck.Rep_p3.block.2.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:42:46.647386 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.Rep_p3.block.2.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:42:46.647588 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([64, 1, 1, 1]).
neck.Rep_n3.conv1.conv._input_quantizer : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:42:50.832626 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.Rep_n3.conv1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:42:50.832780 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
neck.Rep_n3.block.0.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:42:54.653739 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.Rep_n3.block.0.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:42:54.653892 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
neck.Rep_n3.block.1.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:42:57.676237 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.Rep_n3.block.1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:42:57.676390 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
neck.Rep_n3.block.2.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:43:01.953913 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.Rep_n3.block.2.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:43:01.954066 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
neck.Rep_n4.conv1.conv._input_quantizer : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:43:06.167874 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.Rep_n4.conv1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:43:06.168026 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([256, 1, 1, 1]).
neck.Rep_n4.block.0.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:43:08.413527 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.Rep_n4.block.0.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:43:08.413681 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([256, 1, 1, 1]).
neck.Rep_n4.block.1.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:43:10.874183 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.Rep_n4.block.1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:43:10.874336 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([256, 1, 1, 1]).
neck.Rep_n4.block.2.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:43:14.220662 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.Rep_n4.block.2.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:43:14.220811 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([256, 1, 1, 1]).
neck.reduce_layer0.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:43:17.156822 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.reduce_layer0.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:43:17.156972 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
neck.upsample0.upsample_transpose._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:43:19.218668 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.upsample0.upsample_transpose._weight_quantizer: TensorQuantizer(8bit fake axis=1 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:43:19.218823 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([1, 128, 1, 1]).
neck.reduce_layer1.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:43:22.207064 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.reduce_layer1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:43:22.207213 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([64, 1, 1, 1]).
neck.upsample1.upsample_transpose._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:43:27.317693 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.upsample1.upsample_transpose._weight_quantizer: TensorQuantizer(8bit fake axis=1 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:43:27.317848 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([1, 64, 1, 1]).
neck.downsample2.conv._input_quantizer  : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:43:32.322772 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.downsample2.conv._weight_quantizer : TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:43:32.322957 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([64, 1, 1, 1]).
neck.downsample1.conv._input_quantizer  : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:43:36.755955 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.downsample1.conv._weight_quantizer : TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:43:36.756104 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
neck.upsample_feat0_quant               : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:43:38.654680 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
neck.upsample_feat1_quant               : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:43:44.019705 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.stems.0.conv._input_quantizer    : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:43:48.980126 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.stems.0.conv._weight_quantizer   : TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:43:48.980282 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([64, 1, 1, 1]).
detect.stems.1.conv._input_quantizer    : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:43:53.389482 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.stems.1.conv._weight_quantizer   : TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:43:53.389636 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
detect.stems.2.conv._input_quantizer    : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:43:56.624089 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.stems.2.conv._weight_quantizer   : TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:43:56.624240 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([256, 1, 1, 1]).
detect.cls_convs.0.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:44:01.852701 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.cls_convs.0.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:44:01.852849 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([64, 1, 1, 1]).
detect.cls_convs.1.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:44:05.816592 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.cls_convs.1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:44:05.816745 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
detect.cls_convs.2.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:44:09.290442 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.cls_convs.2.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:44:09.290595 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([256, 1, 1, 1]).
detect.reg_convs.0.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:44:14.529686 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.reg_convs.0.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:44:14.529838 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([64, 1, 1, 1]).
detect.reg_convs.1.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:44:18.493697 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.reg_convs.1.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:44:18.493847 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([128, 1, 1, 1]).
detect.reg_convs.2.conv._input_quantizer: TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:44:21.991755 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.reg_convs.2.conv._weight_quantizer: TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:44:21.991906 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([256, 1, 1, 1]).
detect.cls_preds.0._input_quantizer     : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:44:26.881401 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.cls_preds.0._weight_quantizer    : TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:44:26.881546 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.cls_preds.1._input_quantizer     : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:44:30.806756 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.cls_preds.1._weight_quantizer    : TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:44:30.806903 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.cls_preds.2._input_quantizer     : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:44:34.323078 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.cls_preds.2._weight_quantizer    : TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:44:34.323236 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.reg_preds.0._input_quantizer     : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:44:36.417203 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.reg_preds.0._weight_quantizer    : TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:44:36.417351 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([4, 1, 1, 1]).
detect.reg_preds.1._input_quantizer     : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:44:40.287573 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.reg_preds.1._weight_quantizer    : TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:44:40.287724 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([4, 1, 1, 1]).
detect.reg_preds.2._input_quantizer     : TensorQuantizer(8bit fake per-tensor amax=dynamic calibrator=HistogramCalibrator scale=1.0 quant)
W1020 03:44:44.057823 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([]).
detect.reg_preds.2._weight_quantizer    : TensorQuantizer(8bit fake axis=0 amax=dynamic calibrator=MaxCalibrator scale=1.0 quant)
W1020 03:44:44.057976 140180999018304 tensor_quantizer.py:237] Load calibrated amax, shape=torch.Size([4, 1, 1, 1]).

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.001
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.001
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.001
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.002
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.011
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.013
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.013
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.036

haritsahm commented 2 years ago

I tested it with partial quantization, and another issue occurred.

Partial Quantization Export

Accumulating evaluation results...
DONE (t=1.51s).
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.574
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.810
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.616
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.275
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.576
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.772
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.199
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.587
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.691
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.452
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.729
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.847
Skip Layer detect.proj_conv
op amax =  5.1035, amax = -1.0000
op amax =  5.1425, amax =  5.1035
amax =  5.1425
op amax =  6.0546, amax = -1.0000
op amax =  3.9350, amax =  6.0546
Not quantable op, skip
op amax =  3.9134, amax = -1.0000
op amax =  3.9041, amax =  3.9134
Not quantable op, skip
op amax = 11.4002, amax = -1.0000
op amax = 12.6850, amax = 11.4002
amax = 12.6850
op amax =  8.2758, amax = -1.0000
op amax =  9.5599, amax =  8.2758
amax =  9.5599
op amax =  3.9648, amax = -1.0000
op amax =  3.9648, amax =  3.9648
amax =  3.9648
op amax =  4.4959, amax = -1.0000
op amax =  4.4959, amax =  4.4959
amax =  4.4959
op amax =  3.9817, amax = -1.0000
op amax =  3.9817, amax =  3.9817
amax =  3.9817
Inferencing model in val datasets.: 100%
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
Loading and preparing results...
DONE (t=1.13s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=14.29s).
Accumulating evaluation results...
DONE (t=1.59s).
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.574
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.810
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.617
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.275
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.577
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.773
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.199
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.588
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.692
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.455
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.730
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.847
(0.8100440948596181, 0.5744896506105379)

The exported model then causes an error in TensorRT:

[10/20/2022-15:01:22] [V] [TRT] QuantizeLinear_25 [QuantizeLinear] inputs: [backbone.ERBlock_2.0.conv_1x1.weight -> (64, 32, 1, 1)[FLOAT]], [611 -> (64)[FLOAT]], [2028 -> (64)[INT8]], 
[10/20/2022-15:01:22] [V] [TRT] Registering layer: backbone.ERBlock_2.0.conv_1x1.weight for ONNX node: backbone.ERBlock_2.0.conv_1x1.weight
[10/20/2022-15:01:22] [E] [TRT] parsers/onnx/ModelImporter.cpp:791: While parsing node number 25 [QuantizeLinear -> "614"]:
[10/20/2022-15:01:22] [E] [TRT] parsers/onnx/ModelImporter.cpp:792: --- Begin node ---
[10/20/2022-15:01:22] [E] [TRT] parsers/onnx/ModelImporter.cpp:793: input: "backbone.ERBlock_2.0.conv_1x1.weight"
input: "611"
input: "2028"
output: "614"
name: "QuantizeLinear_25"
op_type: "QuantizeLinear"
attribute {
  name: "axis"
  i: 0
  type: INT
}

[10/20/2022-15:01:22] [E] [TRT] parsers/onnx/ModelImporter.cpp:794: --- End node ---
[10/20/2022-15:01:22] [E] [TRT] parsers/onnx/ModelImporter.cpp:796: ERROR: parsers/onnx/builtin_op_importers.cpp:1150 In function QuantDequantLinearHelper:
[6] Assertion failed: scaleAllPositive && "Scale coefficients must all be positive"
[10/20/2022-15:01:22] [E] Failed to parse onnx file

TensorRT Commands

trtexec --onnx=best_ckpt_partial_dynamic.onnx --saveEngine=best_ckpt_partial_dynamic.engine --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640 --fp16 --int8 --warmUp=1000 --avgRuns=1000 --workspace=2048 --inputIOFormats=fp16:chw --verbose
lippman1125 commented 2 years ago

@haritsahm If all elements of one channel are zero, then the channel amax is zero and the channel scale is zero. The model can still be exported to ONNX, but the TensorRT build will fail. You can manually set amax to a small number, such as 1e-6, as a workaround.
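
For the mechanics, here is a minimal sketch of that manual fix, assuming pytorch_quantization's TensorQuantizer API (model is the calibrated model; this is an illustration of the workaround, not the repo's own implementation, which is referenced below):

import torch
from pytorch_quantization.nn import TensorQuantizer

def clamp_zero_amax(model, floor=1e-6):
    # Raise any non-positive per-channel amax to a small positive floor so
    # the derived int8 scale (amax / 127) stays strictly positive, which
    # TensorRT's QuantizeLinear importer asserts on.
    for name, module in model.named_modules():
        if not isinstance(module, TensorQuantizer):
            continue
        amax = getattr(module, '_amax', None)  # buffer set after calibration
        if amax is None:
            continue
        bad = int((amax.data <= 0).sum())
        if bad:
            print(f'{name}: clamping {bad} amax value(s) to {floor}')
            amax.data.clamp_(min=floor)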

haritsahm commented 2 years ago

> @haritsahm If all elements of one channel are zero, then the channel amax is zero and the channel scale is zero. The model can still be exported to ONNX, but the TensorRT build will fail. You can manually set amax to a small number, such as 1e-6, as a workaround.

How do I do this? The log only shows -1 for the amax value. Do you have any updates on the PTQ issue from my first post?

lippman1125 commented 2 years ago

@haritsahm Please refer to https://github.com/meituan/YOLOv6/blob/main/tools/qat/qat_export.py; it has a "--scale-fix" option that fixes zero scales.

haritsahm commented 2 years ago

> @haritsahm Please refer to https://github.com/meituan/YOLOv6/blob/main/tools/qat/qat_export.py; it has a "--scale-fix" option that fixes zero scales.

@lippman1125 This fixes the scaling issue when using the partial quantization method. I applied the scale-fix function in partial quantization, but the inference speed is very low. I'm still waiting for a solution to the quantization process issue from my first post.

I was able to train the model using the quantization and distillation process, but without the calibrated weights from the PTQ process. This means I have to retrain the model from scratch using the QAT method.

haritsahm commented 2 years ago

Quick update: I retrained the model and performed PTQ using the same dataset, but with nc=2 to create a fake class. The PTQ and QAT outputs are both normal.

Normal training (21 epochs)

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.107
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.224
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.089
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.080
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.207
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.103
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.049
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.196
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.390
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.247
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.501
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.400

PTQ Output

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.100
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.210
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.084
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.081
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.193
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.105
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.046
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.194
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.389
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.254
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.502
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.392

QAT Output

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.110
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.227
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.093
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.088
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.209
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.114
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.049
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.208
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.410
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.257
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.506
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.442

Does this have something to do with the YOLOv6 quantization pipeline, or is it related to the pytorch_quantization library? @lippman1125