meituan / YOLOv6

YOLOv6: a single-stage object detection framework dedicated to industrial applications.
GNU General Public License v3.0

Problem when training yolov6n-opt-qat #586

Closed LuoPeng-CV closed 1 year ago

LuoPeng-CV commented 2 years ago

Before Asking

Search before asking

Question

Hello, I get the following error during QAT training:

Skip Layer detect.proj_conv
Insert fakequant after upsample
Traceback (most recent call last):
  File "tools/train.py", line 126, in <module>
    main(args)
  File "tools/train.py", line 111, in main
    trainer = Trainer(args, cfg, device)
  File "/home/novasky/lp/YOLOv6-main/yolov6/core/engine.py", line 56, in __init__
    self.quant_setup(model, cfg, device)
  File "/home/novasky/lp/YOLOv6-main/yolov6/core/engine.py", line 537, in quant_setup
    model.load_state_dict(torch.load(cfg.qat.calib_pt)['model'].float().state_dict())
  File "/home/novasky/anaconda3/envs/novasky/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1497, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
    Missing key(s) in state_dict: "backbone.ERBlock_5.2.cv1.bn.weight", "backbone.ERBlock_5.2.cv1.bn.bias", "backbone.ERBlock_5.2.cv1.bn.running_mean", "backbone.ERBlock_5.2.cv1.bn.running_var", "backbone.ERBlock_5.2.cv2.bn.weight", "backbone.ERBlock_5.2.cv2.bn.bias", "backbone.ERBlock_5.2.cv2.bn.running_mean", "backbone.ERBlock_5.2.cv2.bn.running_var", "neck.reduce_layer0.bn.weight", "neck.reduce_layer0.bn.bias", "neck.reduce_layer0.bn.running_mean", "neck.reduce_layer0.bn.running_var", "neck.reduce_layer1.bn.weight", "neck.reduce_layer1.bn.bias", "neck.reduce_layer1.bn.running_mean", "neck.reduce_layer1.bn.running_var", "neck.downsample2.bn.weight", "neck.downsample2.bn.bias", "neck.downsample2.bn.running_mean", "neck.downsample2.bn.running_var", "neck.downsample1.bn.weight", "neck.downsample1.bn.bias", "neck.downsample1.bn.running_mean", "neck.downsample1.bn.running_var", "detect.stems.0.bn.weight", "detect.stems.0.bn.bias", "detect.stems.0.bn.running_mean", "detect.stems.0.bn.running_var", "detect.stems.1.bn.weight", "detect.stems.1.bn.bias", "detect.stems.1.bn.running_mean", "detect.stems.1.bn.running_var", "detect.stems.2.bn.weight", "detect.stems.2.bn.bias", "detect.stems.2.bn.running_mean", "detect.stems.2.bn.running_var", "detect.cls_convs.0.bn.weight", "detect.cls_convs.0.bn.bias", "detect.cls_convs.0.bn.running_mean", "detect.cls_convs.0.bn.running_var", "detect.cls_convs.1.bn.weight", "detect.cls_convs.1.bn.bias", "detect.cls_convs.1.bn.running_mean", "detect.cls_convs.1.bn.running_var", "detect.cls_convs.2.bn.weight", "detect.cls_convs.2.bn.bias", "detect.cls_convs.2.bn.running_mean", "detect.cls_convs.2.bn.running_var", "detect.reg_convs.0.bn.weight", "detect.reg_convs.0.bn.bias", "detect.reg_convs.0.bn.running_mean", "detect.reg_convs.0.bn.running_var", "detect.reg_convs.1.bn.weight", "detect.reg_convs.1.bn.bias", "detect.reg_convs.1.bn.running_mean", "detect.reg_convs.1.bn.running_var", "detect.reg_convs.2.bn.weight", "detect.reg_convs.2.bn.bias", "detect.reg_convs.2.bn.running_mean", "detect.reg_convs.2.bn.running_var".
    Unexpected key(s) in state_dict: "backbone.ERBlock_5.2.cv1.conv.bias", "backbone.ERBlock_5.2.cv2.conv.bias", "neck.reduce_layer0.conv.bias", "neck.reduce_layer1.conv.bias", "neck.downsample2.conv.bias", "neck.downsample1.conv.bias", "detect.stems.0.conv.bias", "detect.stems.1.conv.bias", "detect.stems.2.conv.bias", "detect.cls_convs.0.conv.bias", "detect.cls_convs.1.conv.bias", "detect.cls_convs.2.conv.bias", "detect.reg_convs.0.conv.bias", "detect.reg_convs.1.conv.bias", "detect.reg_convs.2.conv.bias".
Skip Layer detect.proj_conv

The training command is:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 tools/train.py --data sp500rvf.yaml --output-dir ./runs/opt_train_v6n_qat --conf configs/repopt/yolov6n_opt_qat.py --quant --distill --distill_feat --batch 128 --epochs 10 --workers 8 --teacher_model_path runs/train/yolov6s_repopt/weights/best_ckpt.pt --device 0,1 --name v6n_kd_qat

The calib_pt referenced in yolov6n_opt_qat was generated by sensitivity_analyse.py:

python3 sensitivity_analyse.py --weights ../../runs/train/yolov6n_repopt/weights/best_ckpt.pt --batch-size 32 --batch-number 4 --data-root ~/Dataset/v6dataset/ --img-size 640 --data-yaml ../../sp500rvf.yaml --eval-yaml eval.yaml

Could you give me some suggestions? Thanks.

Additional

No response

ghost commented 2 years ago

I suggest trying the solution in #572. I'd be glad to hear how it goes after you try it.

xingyueye commented 2 years ago

Your network and checkpoint don't match: one has BN layers and the other doesn't. Check where the Conv+BN fusion happened.

Missing key(s) in state_dict: "backbone.ERBlock_5.2.cv1.bn.weight", "backbone.ERBlock_5.2.cv1.bn.bias", "backbone.ERBlock_5.2.cv1.bn.running_mean", "backbone.ERBlock_5.2.cv1.bn.running_var",
Unexpected key(s) in state_dict: "backbone.ERBlock_5.2.cv1.conv.bias",
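
The mismatch described above can be checked before calling load_state_dict. The following is a minimal sketch, not code from the YOLOv6 repo: the helper names `diff_state_dict_keys` and `find_fused_modules` are hypothetical, and the "missing bn.* plus unexpected conv.bias means fused" rule is a heuristic inferred from the error in this thread. It works on plain key names, so in practice you would pass `model.state_dict().keys()` and the keys of the checkpoint's state dict.

```python
def diff_state_dict_keys(model_keys, ckpt_keys):
    """Return (missing, unexpected) key lists, mirroring what load_state_dict reports."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)      # model expects them, checkpoint lacks them
    unexpected = sorted(ckpt_keys - model_keys)   # checkpoint has them, model does not
    return missing, unexpected


def find_fused_modules(missing, unexpected):
    """Heuristic (an assumption): a module whose bn.* keys are missing while its
    conv.bias is unexpected looks like a Conv+BN pair fused in the checkpoint."""
    missing_bn = {k.rsplit(".bn.", 1)[0] for k in missing if ".bn." in k}
    suffix = ".conv.bias"
    extra_bias = {k[: -len(suffix)] for k in unexpected if k.endswith(suffix)}
    return sorted(missing_bn & extra_bias)


# Example using key names taken from the error message in this thread.
prefix = "backbone.ERBlock_5.2.cv1"
model_keys = [
    f"{prefix}.conv.weight",
    f"{prefix}.bn.weight",
    f"{prefix}.bn.bias",
    f"{prefix}.bn.running_mean",
    f"{prefix}.bn.running_var",
]
ckpt_keys = [f"{prefix}.conv.weight", f"{prefix}.conv.bias"]

missing, unexpected = diff_state_dict_keys(model_keys, ckpt_keys)
print(find_fused_modules(missing, unexpected))
```

If the list is non-empty, the checkpoint was most likely saved after fusion while the model being built still has separate BN layers, which is the situation described in the comment above.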

LuoPeng-CV commented 1 year ago

Don't use sensitivity_analyse.py to generate calib_pt; generate it through PTQ during training instead.

barathsku commented 1 year ago

@LuoPeng-CV How do I use ptq.py to build it? sensitivity_analyse.py itself calls the function in ptq.py.

LuoPeng-CV commented 1 year ago

@barathsku Sorry, I was wrong. You can get the calib pt by adding '--quant' and '--calib' when training; that way you won't hit the error above.
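
As a rough illustration of the suggestion above, here is a small sketch that assembles a calibration command. Only the '--quant' and '--calib' flags, the tools/train.py entry point, and the data and config paths come from this thread; the helper `build_calib_command` is hypothetical, and which other training flags need to be kept for calibration is not stated here, so they are left to an optional `extra_args` parameter.

```python
import shlex


def build_calib_command(data_yaml, conf, extra_args=()):
    """Build a train.py command with --quant and --calib added (hypothetical helper)."""
    cmd = [
        "python", "tools/train.py",
        "--data", data_yaml,
        "--conf", conf,
        "--quant", "--calib",   # the two flags named in the comment above
    ]
    cmd.extend(extra_args)      # any further flags are the caller's assumption
    return cmd


cmd = build_calib_command("sp500rvf.yaml", "configs/repopt/yolov6n_opt_qat.py")
print(shlex.join(cmd))  # print as a copy-pasteable shell command
```

The printed command can then be run from the repository root; the resulting calibrated checkpoint is what the calib_pt entry in the QAT config should point to.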