Closed: haritsahm closed this issue 1 year ago
I have tested it with partial quantization, and another issue also occurred.
Accumulating evaluation results...
DONE (t=1.51s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.574
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.810
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.616
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.275
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.576
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.772
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.199
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.587
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.691
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.452
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.729
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.847
Skip Layer detect.proj_conv
op amax = 5.1035, amax = -1.0000
op amax = 5.1425, amax = 5.1035
amax = 5.1425
op amax = 6.0546, amax = -1.0000
op amax = 3.9350, amax = 6.0546
Not quantable op, skip
op amax = 3.9134, amax = -1.0000
op amax = 3.9041, amax = 3.9134
Not quantable op, skip
op amax = 11.4002, amax = -1.0000
op amax = 12.6850, amax = 11.4002
amax = 12.6850
op amax = 8.2758, amax = -1.0000
op amax = 9.5599, amax = 8.2758
amax = 9.5599
op amax = 3.9648, amax = -1.0000
op amax = 3.9648, amax = 3.9648
amax = 3.9648
op amax = 4.4959, amax = -1.0000
op amax = 4.4959, amax = 4.4959
amax = 4.4959
op amax = 3.9817, amax = -1.0000
op amax = 3.9817, amax = 3.9817
amax = 3.9817
Inferencing model in val datasets.: 100%
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
Loading and preparing results...
DONE (t=1.13s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=14.29s).
Accumulating evaluation results...
DONE (t=1.59s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.574
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.810
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.617
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.275
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.577
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.773
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.199
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.588
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.692
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.455
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.730
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.847
(0.8100440948596181, 0.5744896506105379)
The exported model causes an error in TensorRT:
[10/20/2022-15:01:22] [V] [TRT] QuantizeLinear_25 [QuantizeLinear] inputs: [backbone.ERBlock_2.0.conv_1x1.weight -> (64, 32, 1, 1)[FLOAT]], [611 -> (64)[FLOAT]], [2028 -> (64)[INT8]],
[10/20/2022-15:01:22] [V] [TRT] Registering layer: backbone.ERBlock_2.0.conv_1x1.weight for ONNX node: backbone.ERBlock_2.0.conv_1x1.weight
[10/20/2022-15:01:22] [E] [TRT] parsers/onnx/ModelImporter.cpp:791: While parsing node number 25 [QuantizeLinear -> "614"]:
[10/20/2022-15:01:22] [E] [TRT] parsers/onnx/ModelImporter.cpp:792: --- Begin node ---
[10/20/2022-15:01:22] [E] [TRT] parsers/onnx/ModelImporter.cpp:793: input: "backbone.ERBlock_2.0.conv_1x1.weight"
input: "611"
input: "2028"
output: "614"
name: "QuantizeLinear_25"
op_type: "QuantizeLinear"
attribute {
name: "axis"
i: 0
type: INT
}
[10/20/2022-15:01:22] [E] [TRT] parsers/onnx/ModelImporter.cpp:794: --- End node ---
[10/20/2022-15:01:22] [E] [TRT] parsers/onnx/ModelImporter.cpp:796: ERROR: parsers/onnx/builtin_op_importers.cpp:1150 In function QuantDequantLinearHelper:
[6] Assertion failed: scaleAllPositive && "Scale coefficients must all be positive"
[10/20/2022-15:01:22] [E] Failed to parse onnx file
trtexec --onnx=best_ckpt_partial_dynamic.onnx --saveEngine=best_ckpt_partial_dynamic.engine --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640 --fp16 --int8 --warmUp=1000 --avgRuns=1000 --workspace=2048 --inputIOFormats=fp16:chw --verbose
@haritsahm If all elements of a channel are zero, then the channel amax is zero and the channel scale is zero. The model can still be exported to ONNX, but the TensorRT build will fail. As a workaround, you can manually set amax to a small number, such as 1e-6.
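The failing TensorRT assertion ("Scale coefficients must all be positive") follows from how symmetric INT8 scales are derived: scale = amax / 127, so an all-zero weight channel produces a zero scale. A minimal sketch of the workaround described above (the function names and the 1e-6 epsilon are illustrative assumptions, not the repository's actual helper):

```python
# Sketch: derive per-channel INT8 scales from amax values, clamping
# non-positive entries to a small epsilon so that TensorRT's
# "scale coefficients must all be positive" check passes.

def fix_zero_scales(amax_per_channel, eps=1e-6):
    """Replace non-positive per-channel amax values with eps."""
    return [a if a > 0.0 else eps for a in amax_per_channel]

def int8_scales(amax_per_channel):
    """Symmetric INT8 quantization: scale = amax / 127."""
    return [a / 127.0 for a in amax_per_channel]

amax = [5.1425, 0.0, 12.6850]           # channel 1 is all-zero
scales = int8_scales(fix_zero_scales(amax))
assert all(s > 0 for s in scales)        # every scale is now positive
```

In the pytorch_quantization toolkit the amax values live on the `TensorQuantizer` modules, so the same clamp would be applied to those buffers before ONNX export.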
How do I do this? The log only shows -1 for the amax value. Do you have any updates on the PTQ issue from my first post?
@haritsahm Please refer to https://github.com/meituan/YOLOv6/blob/main/tools/qat/qat_export.py; it has a `--scale-fix` option that fixes zero scales.
@lippman1125 This helps fix the scaling issue when using the partial quantization method. I applied the scale-fix function in partial quantization, but the inference speed is very low. I'm still waiting for a solution to the quantization issue.
I was able to train the model using the quantization and distillation process, but without the calibration weights from the PTQ process. This means I have to retrain the model from scratch using QAT.
Quick update: I retrained the model and performed PTQ using the same dataset, but with nc=2 to create a fake class. The PTQ and QAT outputs are both normal:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.107
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.224
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.089
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.080
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.207
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.103
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.049
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.196
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.390
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.247
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.501
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.400
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.100
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.210
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.084
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.081
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.193
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.105
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.046
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.194
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.389
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.254
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.502
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.392
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.110
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.227
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.093
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.088
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.209
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.114
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.049
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.208
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.410
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.257
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.506
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.442
Does it have something to do with the YOLOv6 quantization pipeline, or is it related to the pytorch_quantization library? @lippman1125
Before Asking
[X] I have read the README carefully.
[X] I want to train my custom dataset, and I have read the tutorials for training custom data carefully and organized my dataset correctly. (FYI: We recommend applying the xx_finetune.py config files.)
[X] I have pulled the latest code of the main branch and run it again, and the problem still exists.
Search before asking
Question
After training the model following the guide in tutorial_repopt, I got unusual behaviour when running the PTQ and QAT training steps, similar to what I reported in 535#issuecomment-1284808881.
Notes:
Final numbers of valid images: 64094/ labels: 64094.
Final numbers of valid images: 2693/ labels: 2693.
The outputs of `preds, s_featmaps = self.model(images)` are tensors of NaNs.
Commands
Configs
Eval Output
The PTQ Output:
Additional
No response