Which pretrained weights should teacher model path point to when using self-distillation?

Lightmannnn commented 1 year ago

Before Asking

[X] I have read the README carefully.
[X] I want to train my custom dataset, and I have read the tutorials for training your custom data carefully and organized my dataset correctly. (FYI: We recommend applying the xx_finetune.py config files.)
[X] I have pulled the latest code of the main branch and run it again, and the problem still exists.

Search before asking

[X] I have searched the YOLOv6 issues and found no similar questions.

Question

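I first trained without self-distillation for 200 epochs and got weights that perform quite well. Now I want to enable self-distillation. Should I use the weights that already perform very well on this dataset, or the pretrained weights provided on GitHub?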

Additional

No response

Chilicyy commented 1 year ago

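In the config file, set the pretrained weight path to the COCO-pretrained weights provided on GitHub. In the command that launches self-distillation training, add --distill --teacher_model_path xxx (where xxx is the path to the model weights you have already trained). For details, see https://github.com/meituan/YOLOv6/blob/main/docs/Train_coco_data.md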

Lightmannnn commented 1 year ago

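The linked page would not open, so let me describe my understanding and you can tell me whether it is right: if I use yolov6s_finetune.py, I download the COCO-pretrained weights; if I use self-distillation, I use my own trained model weights (from the 200-epoch run) as the teacher model weights.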

Chilicyy commented 1 year ago

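Yes, and the link should open now. The pretrained weight path is specified in the config file; the teacher model weight path used for distillation is specified in the training launch command.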
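
The split described above (pretrained weights in the config, teacher weights on the command line) can be sketched as a small helper that assembles the launch command. This is only an illustration: the flag names are taken from the argument parser posted later in this thread, while the script location tools/train.py, the dataset YAML, and the checkpoint path are assumptions and should be replaced with your own.

```python
import shlex


def build_distill_command(conf_file, data_path, teacher_model_path,
                          name="yolov6s_distill"):
    """Assemble a training command with self-distillation enabled.

    The pretrained (COCO) weights are NOT passed here; they are set
    inside the config file that conf_file points to.
    """
    return [
        "python", "tools/train.py",            # assumed script location
        "--conf-file", conf_file,
        "--data-path", data_path,
        "--distill",                            # switch distillation on
        "--teacher_model_path", teacher_model_path,
        "--name", name,
    ]


cmd = build_distill_command(
    conf_file="configs/yolov6s_finetune.py",
    data_path="data/my_dataset.yaml",                             # hypothetical
    teacher_model_path="runs/train/my_run/weights/best_ckpt.pt",  # hypothetical
)
print(shlex.join(cmd))
```

Note that --distill_feat is deliberately absent, matching the later remark that feature-map distillation is off by default for regular models.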

Lightmannnn commented 1 year ago

OK, that is exactly how I am running it. It has trained for over 100 epochs, but AP and mAP are still lower than without self-distillation. Is this normal?

Chilicyy commented 1 year ago

When you say AP and mAP are still lower than without self-distillation, are you comparing metrics at the same epoch?

Lightmannnn commented 1 year ago

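Yes, and the gap is very large. Without distillation, AP had already reached 70% at 100 epochs; with distillation enabled, AP is 20% at 120 epochs. That seems strange to me, so I came here to ask.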

Chilicyy commented 1 year ago

Are you training the s network? Please post the launch commands for both training runs.

Lightmannnn commented 1 year ago

    import argparse

    def get_args_parser(add_help=True):
        parser = argparse.ArgumentParser(description='YOLOv6 PyTorch Training', add_help=add_help)
        parser.add_argument('--data-path', default='./data/ALLSC_LFLESS.yaml', type=str, help='path of dataset')
        parser.add_argument('--conf-file', default='./configs/yolov6s.py', type=str, help='experiments description file')
        parser.add_argument('--img-size', default=640, type=int, help='train, val image size (pixels)')
        parser.add_argument('--batch-size', default=32, type=int, help='total batch size for all GPUs')
        parser.add_argument('--epochs', default=400, type=int, help='number of total epochs to run')
        parser.add_argument('--workers', default=0, type=int, help='number of data loading workers (default: 8)')
        parser.add_argument('--device', default='0', type=str, help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
        parser.add_argument('--eval-interval', default=20, type=int, help='evaluate at every interval epochs')
        parser.add_argument('--eval-final-only', action='store_true', help='only evaluate at the final epoch')
        parser.add_argument('--heavy-eval-range', default=50, type=int, help='evaluating every epoch for last such epochs (can be jointly used with --eval-interval)')
        parser.add_argument('--check-images', action='store_true', help='check images when initializing datasets')
        parser.add_argument('--check-labels', action='store_true', help='check label files when initializing datasets')
        parser.add_argument('--output-dir', default='./runs/train', type=str, help='path to save outputs')
        parser.add_argument('--name', default='yolo6s_distill', type=str, help='experiment name, saved to output_dir/name')
        parser.add_argument('--dist_url', default='env://', type=str, help='url used to set up distributed training')
        parser.add_argument('--gpu_count', type=int, default=0)
        parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter')
        parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume the most recent training')
        parser.add_argument('--write_trainbatch_tb', action='store_true', help='write train_batch image to tensorboard once an epoch, may slightly slower train speed if open')
        parser.add_argument('--stop_aug_last_n_epoch', default=15, type=int, help='stop strong aug at last n epoch, neg value not stop, default 15')
        parser.add_argument('--save_ckpt_on_last_n_epoch', default=-1, type=int, help='save last n epoch even not best or last, neg value not save')
        parser.add_argument('--distill', action='store_false', help='distill or not')
        parser.add_argument('--distill_feat', action='store_false', help='distill featmap or not')
        parser.add_argument('--quant', action='store_true', help='quant or not')
        parser.add_argument('--calib', action='store_false', help='run ptq')
        parser.add_argument('--teacher_model_path', type=str, default='./weights/pcbs.pt', help='teacher model path')
        parser.add_argument('--temperature', type=int, default=20, help='distill temperature')
        parser.add_argument('--fuse_ab', action='store_true', help='fuse ab branch in training process or not')
        parser.add_argument('--bs_per_gpu', default=32, type=int, help='batch size per GPU for auto-rescale learning rate, set to 16 for P6 models')
        return parser

Could you please take a look?

Lightmannnn commented 1 year ago

    import argparse

    def get_args_parser(add_help=True):
        parser = argparse.ArgumentParser(description='YOLOv6 PyTorch Training', add_help=add_help)
        parser.add_argument('--data-path', default='./data/ALLSC_LFLESS.yaml', type=str, help='path of dataset')
        parser.add_argument('--conf-file', default='./configs/yolov6s.py', type=str, help='experiments description file')
        parser.add_argument('--img-size', default=640, type=int, help='train, val image size (pixels)')
        parser.add_argument('--batch-size', default=32, type=int, help='total batch size for all GPUs')
        parser.add_argument('--epochs', default=400, type=int, help='number of total epochs to run')
        parser.add_argument('--workers', default=0, type=int, help='number of data loading workers (default: 8)')
        parser.add_argument('--device', default='0', type=str, help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
        parser.add_argument('--eval-interval', default=20, type=int, help='evaluate at every interval epochs')
        parser.add_argument('--eval-final-only', action='store_true', help='only evaluate at the final epoch')
        parser.add_argument('--heavy-eval-range', default=50, type=int, help='evaluating every epoch for last such epochs (can be jointly used with --eval-interval)')
        parser.add_argument('--check-images', action='store_true', help='check images when initializing datasets')
        parser.add_argument('--check-labels', action='store_true', help='check label files when initializing datasets')
        parser.add_argument('--output-dir', default='./runs/train', type=str, help='path to save outputs')
        parser.add_argument('--name', default='yolo6s_distill', type=str, help='experiment name, saved to output_dir/name')
        parser.add_argument('--dist_url', default='env://', type=str, help='url used to set up distributed training')
        parser.add_argument('--gpu_count', type=int, default=0)
        parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter')
        parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume the most recent training')
        parser.add_argument('--write_trainbatch_tb', action='store_true', help='write train_batch image to tensorboard once an epoch, may slightly slower train speed if open')
        parser.add_argument('--stop_aug_last_n_epoch', default=15, type=int, help='stop strong aug at last n epoch, neg value not stop, default 15')
        parser.add_argument('--save_ckpt_on_last_n_epoch', default=-1, type=int, help='save last n epoch even not best or last, neg value not save')
        parser.add_argument('--distill', action='store_true', help='distill or not')
        parser.add_argument('--distill_feat', action='store_true', help='distill featmap or not')
        parser.add_argument('--quant', action='store_true', help='quant or not')
        parser.add_argument('--calib', action='store_true', help='run ptq')
        parser.add_argument('--teacher_model_path', type=str, default=None, help='teacher model path')
        parser.add_argument('--temperature', type=int, default=20, help='distill temperature')
        parser.add_argument('--fuse_ab', action='store_true', help='fuse ab branch in training process or not')
        parser.add_argument('--bs_per_gpu', default=32, type=int, help='batch size per GPU for auto-rescale learning rate, set to 16 for P6 models')
        return parser

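These are the arguments for the run without distillation; my previous comment has the arguments for the run with distillation enabled. Thank you.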

Lightmannnn commented 1 year ago

cwd_loss is very large: it started at about 120 and has dropped to 28 so far.
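
For readers unfamiliar with the term, cwd_loss here presumably denotes a channel-wise distillation term computed on feature maps; the thread does not define it, so that reading is an assumption. The sketch below shows the general idea in plain Python: each channel is turned into a distribution over spatial positions with a temperature softmax, and the student is penalized by its KL divergence from the teacher. It is not YOLOv6's implementation, and the function names and toy feature values are hypothetical.

```python
import math


def spatial_softmax(values, temperature):
    """Softmax over the spatial positions of one channel."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def channel_kl(student_channel, teacher_channel, temperature):
    """KL(teacher || student) for one channel, scaled by temperature**2."""
    p_teacher = spatial_softmax(teacher_channel, temperature)
    p_student = spatial_softmax(student_channel, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return kl * temperature ** 2


def channel_wise_distill_loss(student_feat, teacher_feat, temperature=1.0):
    """Mean per-channel KL; each feature map is a list of flattened channels."""
    losses = [channel_kl(s, t, temperature)
              for s, t in zip(student_feat, teacher_feat)]
    return sum(losses) / len(losses)


# Toy feature maps: 2 channels, 3 spatial positions each (hypothetical values).
teacher = [[0.1, 0.5, -0.3], [1.0, 2.0, 3.0]]
student = [[0.5, 0.1, -0.3], [3.0, 2.0, 1.0]]

print(channel_wise_distill_loss(teacher, teacher, temperature=20.0))  # 0.0 for identical features
print(channel_wise_distill_loss(student, teacher, temperature=20.0))
```

A loss of this kind is zero only when student and teacher features induce the same per-channel distributions, so a large nonzero value indicates that feature-map distillation is active in the run.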

Chilicyy commented 1 year ago

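Distillation for regular models does not enable distill_feat by default, so cwd_loss should be 0. Please refer to the training launch commands we provide and retrain. https://github.com/meituan/YOLOv6/blob/main/docs/Train_coco_data.md https://github.com/meituan/YOLOv6/blob/main/docs/Train_custom_data.md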
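
One detail in the distillation parser posted earlier may be relevant here: it declares --distill_feat with action='store_false', whereas the non-distillation parser uses action='store_true'. With store_false, argparse makes the option default to True, so feature-map distillation would be on even when the flag is never passed. The following minimal sketch demonstrates only that argparse behavior; whether it is the actual cause of the nonzero cwd_loss in this run is an assumption, and make_parser is a hypothetical helper.

```python
import argparse


def make_parser(feat_action):
    """Tiny parser with just the two distillation switches."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--distill', action='store_true', help='distill or not')
    parser.add_argument('--distill_feat', action=feat_action,
                        help='distill featmap or not')
    return parser


# Same command line in both cases: only --distill is passed.
stock = make_parser('store_true').parse_args(['--distill'])
edited = make_parser('store_false').parse_args(['--distill'])

print(stock.distill_feat)   # False: feature distillation stays off
print(edited.distill_feat)  # True: feature distillation is on by default
```

Keeping action='store_true' in the parser and enabling distillation purely through command-line flags avoids this kind of silent default change.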

Lightmannnn commented 1 year ago

OK, I will try again. Thank you.

whale1008 commented 1 year ago

How did it go when you tried again? My distillation results are still unsatisfactory.

whale1008 commented 1 year ago

The model set in my config is the COCO-pretrained model, and the teacher model is one I previously trained for 80 epochs.

Lightmannnn commented 1 year ago

Mine is not great either; the accuracy is lower than without knowledge distillation.

Lightmannnn commented 1 year ago

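Yes, that configuration (COCO-pretrained weights in the config, a previously trained model as the teacher) is correct, and it is what I used as well, but I really cannot see any improvement.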

whale1008 commented 1 year ago

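Since the official models are all trained on the COCO dataset, I also tried pointing the config at my already trained model, so that the config and teacher_model_path use the same weights. In that case the loss kept increasing during training, and the run crashed after about 2 epochs.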

barbecacov commented 1 year ago

Perhaps the config should not specify pretrained weights at all?
