hamedgorji opened 5 months ago
Update 1: I changed my config to:
# YOLOv6n-seg model
model = dict(
    type='YOLOv6n',
    pretrained='D:/YOLOv6-seg/assets/pretrained_opt.pt',
    scales='D:/YOLOv6-seg/assets/scale.pt',
    depth_multiple=0.33,
    width_multiple=0.25,
    backbone=dict(
        type='EfficientRep',
        num_repeats=[1, 6, 12, 18, 6],
        out_channels=[64, 128, 256, 512, 1024],
        fuse_P2=True,
        cspsppf=True,
    ),
    neck=dict(
        type='RepBiFPANNeck',
        num_repeats=[12, 12, 12, 12],
        out_channels=[256, 128, 128, 256, 256, 512],
    ),
    head=dict(
        type='EffiDeHead',
        in_channels=[128, 256, 512],
        num_layers=3,
        begin_indices=24,
        npr=256,
        nm=32,
        isseg=True,
        issolo=False,
        anchors=3,
        anchors_init=[[10,13, 19,19, 33,23],
                      [30,61, 59,59, 59,119],
                      [116,90, 185,185, 373,326]],
        out_indices=[17, 20, 23],
        strides=[8, 16, 32],
        atss_warmup_epoch=0,
        iou_type='siou',
        use_dfl=False,  # set to True if you want to further train with distillation
        reg_max=0,  # set to 16 if you want to further train with distillation
        distill_weight={
            'class': 1.0,
            'dfl': 1.0,
        },
    )
)

solver = dict(
    optim='SGD',
    lr_scheduler='Cosine',
    lr0=0.02,
    lrf=0.01,
    momentum=0.937,
    weight_decay=0.001,
    warmup_epochs=3.0,
    warmup_momentum=0.8,
    warmup_bias_lr=0.1
)

data_aug = dict(
    hsv_h=0.015,
    hsv_s=0.7,
    hsv_v=0.4,
    degrees=0.0,
    translate=0.1,
    scale=0.5,
    shear=0.0,
    flipud=0.0,
    fliplr=0.5,
    mosaic=1.0,
    mixup=0.0,
)

ptq = dict(
    num_bits=8,
    calib_batches=4,
    # 'max', 'histogram'
    calib_method='max',
    # 'entropy', 'percentile', 'mse'
    histogram_amax_method='entropy',
    histogram_amax_percentile=99.99,
    calib_output_path='./',
    sensitive_layers_skip=False,
    sensitive_layers_list=[],
)

qat = dict(
    calib_pt='./assets/v6n_calib_max.pt',
    sensitive_layers_skip=False,
    sensitive_layers_list=[],
)

# Choose Rep-block by the training Mode, choices=["repvgg", "hyper-search", "repopt"]
training_mode = 'repopt'
I then ran the following command:
python tools/train.py --data data/data.yaml --output-dir ./runs/train_im256_30636_qat --conf configs/yolov6n_seg_opt_qat.py --quant --distill --distill_feat --batch 32 --epochs 10 --workers 32 --teacher_model_path "D:/YOLOv6-seg/assets/pretrained_opt.pt" --device 0
It loaded the model first, but then failed with this error:
Skip Layer detect.proj_conv
Traceback (most recent call last):
  File "D:\YOLOv6-seg\tools\train.py", line 142, in <module>
    main(args)
  File "D:\YOLOv6-seg\tools\train.py", line 127, in main
    trainer = Trainer(args, cfg, device)
  File "D:\YOLOv6-seg\yolov6\core\engine.py", line 68, in __init__
    self.quant_setup(model, cfg, device)
  File "D:\YOLOv6-seg\yolov6\core\engine.py", line 602, in quant_setup
    model.neck.upsample_enable_quant(cfg.ptq.num_bits, cfg.ptq.calib_method)
  File "C:\Users\Hamed\miniconda3\envs\yolov6\lib\site-packages\torch\nn\modules\module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'RepBiFPANNeck' object has no attribute 'upsample_enable_quant'
I got this error for both PTQ and QAT.
Update 2: I fixed the above error by adding the following function to the RepBiFPANNeck class:
def upsample_enable_quant(self, num_bits, calib_method):
    print("Insert fakequant after upsample")
    from pytorch_quantization import nn as quant_nn
    from pytorch_quantization.tensor_quant import QuantDescriptor
    conv2d_input_default_desc = QuantDescriptor(num_bits=num_bits, calib_method=calib_method)
    self.upsample_feat0_quant = quant_nn.TensorQuantizer(conv2d_input_default_desc)
    self.upsample_feat1_quant = quant_nn.TensorQuantizer(conv2d_input_default_desc)
    self._QUANT = True
But now I get another error regarding the calib max when I try to run PTQ:
python tools/train.py --data data/data.yaml --output-dir ./runs/train_im256_30636_ptq --conf configs/yolov6n_seg_opt_qat.py --quant --calib --batch 16 --workers 0 --device 0
Traceback (most recent call last):
  File "D:\YOLOv6-seg\tools\train.py", line 142, in <module>
    main(args)
  File "D:\YOLOv6-seg\tools\train.py", line 130, in main
    trainer.calibrate(cfg)
  File "D:\YOLOv6-seg\yolov6\core\engine.py", line 592, in calibrate
    ptq_calibrate(self.model, self.train_loader, cfg)
  File "D:\YOLOv6-seg\tools\qat\qat_utils.py", line 61, in ptq_calibrate
    compute_amax(model, method=cfg.ptq.histogram_amax_method, percentile=cfg.ptq.histogram_amax_percentile)
  File "D:\YOLOv6-seg\tools\qat\qat_utils.py", line 47, in compute_amax
    module.load_calib_amax()
  File "C:\Users\Hamed\miniconda3\envs\yolov6\lib\site-packages\pytorch_quantization\nn\modules\tensor_quantizer.py", line 237, in load_calib_amax
    raise RuntimeError(err_msg + " Passing 'strict=False' to `load_calib_amax()` will ignore the error.")
RuntimeError: Calibrator returned None. This usually happens when calibrator hasn't seen any tensor. Passing 'strict=False' to `load_calib_amax()` will ignore the error.
@Chilicyy Any thoughts on this?
Update 3: As mentioned above, during the PTQ process I encountered a new error related to the calibration maximum (calib max). Specifically, the error message indicated that the calibrator returned None, suggesting it had not seen any tensors during calibration.
To diagnose this, I added detailed logging and discovered that the neck.upsample_feat0_quant and neck.upsample_feat1_quant layers were encountering issues:
Error for neck.upsample_feat0_quant: Calibrator returned None. This usually happens when calibrator hasn't seen any tensor. Passing 'strict=False' to `load_calib_amax()` will ignore the error.
Loaded calib_amax for neck.upsample_feat0_quant
Error for neck.upsample_feat1_quant: Calibrator returned None. This usually happens when calibrator hasn't seen any tensor. Passing 'strict=False' to `load_calib_amax()` will ignore the error.
Loaded calib_amax for neck.upsample_feat1_quant
It seems that during the calibration phase, these layers are not receiving the expected data, leading to the calibrator returning None.
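The logging I used was roughly the following (a minimal sketch modeled on compute_amax in tools/qat/qat_utils.py; the strict=False fallback is the one the error message itself suggests):

from pytorch_quantization import calib
from pytorch_quantization import nn as quant_nn

def compute_amax_verbose(model, **kwargs):
    # Walk every TensorQuantizer and try to load its calibrated amax,
    # logging the modules whose calibrator collected nothing.
    for name, module in model.named_modules():
        if not isinstance(module, quant_nn.TensorQuantizer) or module._calibrator is None:
            continue
        # MaxCalibrator.compute_amax() takes no arguments
        args = {} if isinstance(module._calibrator, calib.MaxCalibrator) else kwargs
        try:
            module.load_calib_amax(**args)
        except RuntimeError as e:
            print(f"Error for {name}: {e}")
            # strict=False skips quantizers whose calibrator saw no tensors
            module.load_calib_amax(strict=False, **args)
            print(f"Loaded calib_amax for {name}")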
Maybe the issue is that I added the following function to the RepBiFPANNeck class without modifying the forward pass:
def upsample_enable_quant(self, num_bits, calib_method):
    print("Insert fakequant after upsample")
    # Insert fakequant after upsample op to build TensorRT engine
    from pytorch_quantization import nn as quant_nn
    from pytorch_quantization.tensor_quant import QuantDescriptor
    conv2d_input_default_desc = QuantDescriptor(num_bits=num_bits, calib_method=calib_method)
    self.upsample_feat0_quant = quant_nn.TensorQuantizer(conv2d_input_default_desc)
    self.upsample_feat1_quant = quant_nn.TensorQuantizer(conv2d_input_default_desc)
    # global _QUANT
    self._QUANT = True
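A quick standalone toy confirms this hypothesis (illustrative names; a TensorQuantizer only accumulates calibration statistics when it is actually called in forward):

import torch
from pytorch_quantization import nn as quant_nn
from pytorch_quantization.tensor_quant import QuantDescriptor

desc = QuantDescriptor(num_bits=8, calib_method='max')
used = quant_nn.TensorQuantizer(desc)
unused = quant_nn.TensorQuantizer(desc)

# Put both quantizers in calibration mode
for q in (used, unused):
    q.disable_quant()
    q.enable_calib()

used(torch.randn(1, 64, 32, 32))  # only this quantizer ever sees a tensor

used.load_calib_amax()            # succeeds: the calibrator recorded a max
try:
    unused.load_calib_amax()      # raises: "Calibrator returned None ..."
except RuntimeError as e:
    print(e)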
Any suggestions?
Update 4: I've made some changes to the RepBiFPANNeck forward function, similar to the RepPANNeck one, and the problem has been solved:
def forward(self, input):
    (x3, x2, x1, x0) = input

    fpn_out0 = self.reduce_layer0(x0)
    f_concat_layer0 = self.Bifusion0([fpn_out0, x1, x2])
    # Route the fused feature through the fake-quant node so the
    # calibrator actually sees tensors
    if hasattr(self, '_QUANT') and self._QUANT is True:
        f_concat_layer0 = self.upsample_feat0_quant(f_concat_layer0)
    f_out0 = self.Rep_p4(f_concat_layer0)

    fpn_out1 = self.reduce_layer1(f_out0)
    f_concat_layer1 = self.Bifusion1([fpn_out1, x2, x3])
    if hasattr(self, '_QUANT') and self._QUANT is True:
        f_concat_layer1 = self.upsample_feat1_quant(f_concat_layer1)
    pan_out2 = self.Rep_p3(f_concat_layer1)

    down_feat1 = self.downsample2(pan_out2)
    p_concat_layer1 = torch.cat([down_feat1, fpn_out1], 1)
    pan_out1 = self.Rep_n3(p_concat_layer1)

    down_feat0 = self.downsample1(pan_out1)
    p_concat_layer2 = torch.cat([down_feat0, fpn_out0], 1)
    pan_out0 = self.Rep_n4(p_concat_layer2)

    outputs = [pan_out2, pan_out1, pan_out0]
    return outputs
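With the quantizers wired into the forward pass, calibration finally sees real tensors. As a sanity check after calibration (illustrative; model is the calibrated model), every active quantizer should now report an amax:

from pytorch_quantization import nn as quant_nn

# A None amax here would mean the node is still bypassed in forward()
for name, module in model.named_modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        print(name, '->', getattr(module, '_amax', None))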
@zhiyelee @cfc4n @yeldarby @rainsun
I was finally able to train my model using the QAT approach. However, when I converted it to ONNX using qat_export.py, the model failed to perform any segmentation. This segmentation QAT path has been quite problematic, and I'm puzzled as to why the authors included it when it is not fully tested. I spent about three weeks troubleshooting various issues but still couldn't get it to work.
If it had been mentioned that the segmentation model does not support QAT, I could have explored other options instead of losing three weeks of my time.
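For anyone debugging the same export, a quick smoke test of what the exported graph actually returns (illustrative; assumes onnxruntime is installed, model.onnx stands in for the qat_export.py output, and the input size matches your training resolution):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('model.onnx', providers=['CPUExecutionProvider'])
inp = sess.get_inputs()[0]
x = np.random.rand(1, 3, 256, 256).astype(np.float32)
for out, meta in zip(sess.run(None, {inp.name: x}), sess.get_outputs()):
    print(meta.name, out.shape)  # a seg model should also expose mask outputs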
Description
Hi YOLOv6 Team,
I am currently working on a project that requires Quantization-Aware Training (QAT) for segmentation tasks using YOLOv6. I noticed that configurations like yolov6n_hs, yolov6n_opt, and yolov6n_opt_qat are available for detection but not for segmentation. To achieve QAT for segmentation, should I add the ptq and qat configurations from the detection configs at the end of my config file (as I tried in Update 1 above)?
Could you please guide me on the correct approach to enable QAT for segmentation tasks? Are there any example configurations or guidelines available for integrating QAT with segmentation in YOLOv6?
Thank you.