meituan / YOLOv6

YOLOv6: a single-stage object detection framework dedicated to industrial applications.
GNU General Public License v3.0

What kind of pretrained weights should teacher model path be set to for self-distillation? #688

Open Lightmannnn opened 1 year ago

Lightmannnn commented 1 year ago

Before Asking

Search before asking

Question

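I initially trained without self-distillation for 200 epochs and got weights that perform fairly well. Now I want to enable self-distillation. Should I use the weights file that already performs very well on this dataset, or the pretrained weights provided on GitHub?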

Additional

No response

Chilicyy commented 1 year ago

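In the config file, set the pretrained weights path to the COCO pretrained weights provided on GitHub. In the command line that launches self-distillation training, add --distill --teacher_model_path xxx (where xxx is the path to the weights of the model you have already trained). For details, see https://github.com/meituan/YOLOv6/blob/main/docs/Train_coco_data.md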

Lightmannnn commented 1 year ago

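The link won't open, so let me describe my understanding and you can tell me whether it is right: if I use yolov6s_finetune.py, I download the COCO pretrained weights; if I want self-distillation, I use my own trained weights (from the 200-epoch run) as the teacher model's weights.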

Chilicyy commented 1 year ago

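That's right, and the link should open now. The pretrained weights path is specified in the config file, and the teacher model weights path used for distillation is specified in the training launch command.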
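To make the split between the two weight files concrete, here is a minimal sketch. The flag names come from the argument parser posted later in this thread. The `pretrained` key in the config, the `tools/train.py` script name, and all paths are assumptions for illustration, so check them against your own checkout.

```python
import shlex

# Hypothetical fragment of a fine-tuning config such as configs/yolov6s_finetune.py.
# The `pretrained` key name is an assumption; verify it in your config file.
model = dict(
    type="YOLOv6s",
    pretrained="./weights/yolov6s.pt",  # COCO pretrained weights (placeholder path)
)


def build_distill_command(conf_file, data_path, teacher_model_path):
    """Return the argv list for a self-distillation run.

    The teacher weights are passed on the command line, not in the config.
    """
    return [
        "python", "tools/train.py",  # script name assumed, not stated in this thread
        "--conf-file", conf_file,
        "--data-path", data_path,
        "--distill",
        "--teacher_model_path", teacher_model_path,
    ]


cmd = build_distill_command(
    conf_file="./configs/yolov6s_finetune.py",
    data_path="./data/ALLSC_LFLESS.yaml",
    teacher_model_path="./weights/pcbs.pt",  # your own previously trained model
)
print(shlex.join(cmd))
```

The config points at the public COCO checkpoint, while the teacher path points at the model already trained on the custom dataset.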

Lightmannnn commented 1 year ago

OK, that is how I am running it now. It has run for more than 100 epochs, but AP and mAP are still lower than without self-distillation. Is this normal?

Chilicyy commented 1 year ago

Regarding "AP and mAP are still lower than without self-distillation": are you comparing the metrics at the same epoch?

Lightmannnn commented 1 year ago

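Yes, and the gap is very large. Without distillation, AP was already 70% at 100 epochs; with distillation enabled, AP is 20% at 120 epochs. That seems very strange to me, so I came here to ask.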

Chilicyy commented 1 year ago

Are you training the s network? Please post the launch commands for both training runs.

Lightmannnn commented 1 year ago

import argparse


def get_args_parser(add_help=True):
    parser = argparse.ArgumentParser(description='YOLOv6 PyTorch Training', add_help=add_help)
    # Dataset, model config, and basic training settings
    parser.add_argument('--data-path', default='./data/ALLSC_LFLESS.yaml', type=str, help='path of dataset')
    parser.add_argument('--conf-file', default='./configs/yolov6s.py', type=str, help='experiments description file')
    parser.add_argument('--img-size', default=640, type=int, help='train, val image size (pixels)')
    parser.add_argument('--batch-size', default=32, type=int, help='total batch size for all GPUs')
    parser.add_argument('--epochs', default=400, type=int, help='number of total epochs to run')
    parser.add_argument('--workers', default=0, type=int, help='number of data loading workers (default: 8)')
    parser.add_argument('--device', default='0', type=str, help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    # Evaluation and dataset checks
    parser.add_argument('--eval-interval', default=20, type=int, help='evaluate at every interval epochs')
    parser.add_argument('--eval-final-only', action='store_true', help='only evaluate at the final epoch')
    parser.add_argument('--heavy-eval-range', default=50, type=int, help='evaluating every epoch for last such epochs (can be jointly used with --eval-interval)')
    parser.add_argument('--check-images', action='store_true', help='check images when initializing datasets')
    parser.add_argument('--check-labels', action='store_true', help='check label files when initializing datasets')
    # Output, distributed training, and checkpointing
    parser.add_argument('--output-dir', default='./runs/train', type=str, help='path to save outputs')
    parser.add_argument('--name', default='yolo6s_distill', type=str, help='experiment name, saved to output_dir/name')
    parser.add_argument('--dist_url', default='env://', type=str, help='url used to set up distributed training')
    parser.add_argument('--gpu_count', type=int, default=0)
    parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter')
    parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume the most recent training')
    parser.add_argument('--write_trainbatch_tb', action='store_true', help='write train_batch image to tensorboard once an epoch, may slightly slower train speed if open')
    parser.add_argument('--stop_aug_last_n_epoch', default=15, type=int, help='stop strong aug at last n epoch, neg value not stop, default 15')
    parser.add_argument('--save_ckpt_on_last_n_epoch', default=-1, type=int, help='save last n epoch even not best or last, neg value not save')
    # Distillation and quantization switches (store_false means the option is on unless the flag is passed)
    parser.add_argument('--distill', action='store_false', help='distill or not')
    parser.add_argument('--distill_feat', action='store_false', help='distill featmap or not')
    parser.add_argument('--quant', action='store_true', help='quant or not')
    parser.add_argument('--calib', action='store_false', help='run ptq')
    parser.add_argument('--teacher_model_path', type=str, default='./weights/pcbs.pt', help='teacher model path')
    parser.add_argument('--temperature', type=int, default=20, help='distill temperature')
    parser.add_argument('--fuse_ab', action='store_true', help='fuse ab branch in training process or not')
    parser.add_argument('--bs_per_gpu', default=32, type=int, help='batch size per GPU for auto-rescale learning rate, set to 16 for P6 models')
    return parser

Could you please take a look?

Lightmannnn commented 1 year ago

import argparse


def get_args_parser(add_help=True):
    parser = argparse.ArgumentParser(description='YOLOv6 PyTorch Training', add_help=add_help)
    # Dataset, model config, and basic training settings
    parser.add_argument('--data-path', default='./data/ALLSC_LFLESS.yaml', type=str, help='path of dataset')
    parser.add_argument('--conf-file', default='./configs/yolov6s.py', type=str, help='experiments description file')
    parser.add_argument('--img-size', default=640, type=int, help='train, val image size (pixels)')
    parser.add_argument('--batch-size', default=32, type=int, help='total batch size for all GPUs')
    parser.add_argument('--epochs', default=400, type=int, help='number of total epochs to run')
    parser.add_argument('--workers', default=0, type=int, help='number of data loading workers (default: 8)')
    parser.add_argument('--device', default='0', type=str, help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    # Evaluation and dataset checks
    parser.add_argument('--eval-interval', default=20, type=int, help='evaluate at every interval epochs')
    parser.add_argument('--eval-final-only', action='store_true', help='only evaluate at the final epoch')
    parser.add_argument('--heavy-eval-range', default=50, type=int, help='evaluating every epoch for last such epochs (can be jointly used with --eval-interval)')
    parser.add_argument('--check-images', action='store_true', help='check images when initializing datasets')
    parser.add_argument('--check-labels', action='store_true', help='check label files when initializing datasets')
    # Output, distributed training, and checkpointing
    parser.add_argument('--output-dir', default='./runs/train', type=str, help='path to save outputs')
    parser.add_argument('--name', default='yolo6s_distill', type=str, help='experiment name, saved to output_dir/name')
    parser.add_argument('--dist_url', default='env://', type=str, help='url used to set up distributed training')
    parser.add_argument('--gpu_count', type=int, default=0)
    parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter')
    parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume the most recent training')
    parser.add_argument('--write_trainbatch_tb', action='store_true', help='write train_batch image to tensorboard once an epoch, may slightly slower train speed if open')
    parser.add_argument('--stop_aug_last_n_epoch', default=15, type=int, help='stop strong aug at last n epoch, neg value not stop, default 15')
    parser.add_argument('--save_ckpt_on_last_n_epoch', default=-1, type=int, help='save last n epoch even not best or last, neg value not save')
    # Distillation and quantization switches (all off unless the flag is passed)
    parser.add_argument('--distill', action='store_true', help='distill or not')
    parser.add_argument('--distill_feat', action='store_true', help='distill featmap or not')
    parser.add_argument('--quant', action='store_true', help='quant or not')
    parser.add_argument('--calib', action='store_true', help='run ptq')
    parser.add_argument('--teacher_model_path', type=str, default=None, help='teacher model path')
    parser.add_argument('--temperature', type=int, default=20, help='distill temperature')
    parser.add_argument('--fuse_ab', action='store_true', help='fuse ab branch in training process or not')
    parser.add_argument('--bs_per_gpu', default=32, type=int, help='batch size per GPU for auto-rescale learning rate, set to 16 for P6 models')
    return parser

These are the arguments without distillation; the ones above are the arguments with distillation enabled. Thank you.

Lightmannnn commented 1 year ago

cwd_loss is very large: it has dropped from 120 at the start to 28 now.

Chilicyy commented 1 year ago

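For regular models, distillation does not enable distill_feat by default, so cwd_loss should be 0. Please retrain following the training launch commands we provide. https://github.com/meituan/YOLOv6/blob/main/docs/Train_coco_data.md https://github.com/meituan/YOLOv6/blob/main/docs/Train_custom_data.md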
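One possible reading of the non-zero cwd_loss: in the parser posted for the distillation run, --distill and --distill_feat were changed to action='store_false'. In argparse, a store_false option is True unless the flag is passed, so feature distillation would be on without any command-line flag. The sketch below reproduces only that argparse behavior with a two-option parser; the `make_parser` helper is hypothetical and is not part of YOLOv6.

```python
import argparse


def make_parser(action):
    """Build a tiny parser with only the two distillation switches."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--distill", action=action, help="distill or not")
    parser.add_argument("--distill_feat", action=action, help="distill featmap or not")
    return parser


# No flags passed: each action falls back to its implicit default.
store_true_args = make_parser("store_true").parse_args([])
store_false_args = make_parser("store_false").parse_args([])

print(vars(store_true_args))   # {'distill': False, 'distill_feat': False}
print(vars(store_false_args))  # {'distill': True, 'distill_feat': True}

# With store_true, passing only --distill leaves feature distillation off.
cli_args = make_parser("store_true").parse_args(["--distill"])
print(vars(cli_args))          # {'distill': True, 'distill_feat': False}
```

Keeping the original store_true actions and passing --distill on the command line, as suggested above, enables distillation without also switching on distill_feat.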

Lightmannnn commented 1 year ago

OK, I'll try again. Thank you.

whale1008 commented 1 year ago

How did it go when you tried again? My distillation results are still not good.

whale1008 commented 1 year ago

[image] The model in the config is the COCO pretrained model, and the teacher model is the one previously trained for 80 epochs.

Lightmannnn commented 1 year ago

Mine isn't working well either; the accuracy is lower than without knowledge distillation.

Lightmannnn commented 1 year ago

Yes, that configuration is correct, and I set it up the same way, but I really can't see any improvement.

whale1008 commented 1 year ago

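Since the official models are all trained on the COCO dataset, I also tried setting the pretrained model in the config to my already-trained model, i.e. the same model in the config and in teacher_model_path. But the loss kept increasing during training, and the run stopped after about 2 epochs. [image]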

barbecacov commented 1 year ago

Perhaps the config shouldn't specify pretrained weights at all?