meituan / YOLOv6

YOLOv6: a single-stage object detection framework dedicated to industrial applications.
GNU General Public License v3.0
5.72k stars 1.03k forks

Reproduction of Repopt on yolo-nano with batchsize 256 on 8 gpus ? #449

Closed yanghu819 closed 2 years ago

yanghu819 commented 2 years ago

The reproduction results are far lower than reported.

HS (hyper-parameter search):

```shell
python -m torch.distributed.launch --nproc_per_node 8 tools/train.py \
    --batch 256 --conf configs/repopt/yolov6n_hs.py --data data/coco_run.yaml \
    --device 0,1,2,3,4,5,6,7 --name repopt_n_hs --workers 16
```

RepOpt training:

```shell
python -m torch.distributed.launch --nproc_per_node 8 tools/train.py \
    --batch 256 --conf configs/repopt/yolov6n_opt.py --data data/coco_run.yaml \
    --device 0,1,2,3,4,5,6,7 --name repopt_n_opt
```

I notice that your RepOpt process uses batch size 32 in both stages. Did you try batch 256? With batch 256 my results are as low as:

```
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.348
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.526
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.369
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.172
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.382
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.481
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.300
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.496
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.541
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.342
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.600
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.716
Epoch: 399 | mAP@0.5: 0.5263928481812168 | mAP@0.50:0.95: 0.3479910307824734
```
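One thing worth checking when moving from batch 32 to batch 256 is the learning rate. A common heuristic (not necessarily what YOLOv6 does internally) is the linear scaling rule: scale the base learning rate in proportion to the batch size. A minimal sketch, with illustrative values that are not YOLOv6's actual defaults:

```python
def scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear learning-rate scaling heuristic: when the batch size grows
    by a factor k, grow the learning rate by the same factor k."""
    return base_lr * new_batch / base_batch

# Illustrative numbers only -- check the repo's config for real defaults.
base_lr, base_batch = 0.01, 32
scaled = scale_lr(base_lr, base_batch, 256)
print(scaled)
```

If the hyper-parameters searched by HS were tuned at batch 32, reusing them unchanged at batch 256 could plausibly account for part of the gap.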

xingyueye commented 2 years ago

@yanghu819 Hi, we previously trained a RepOpt version of the yolov6_nano model for 300 epochs and got an mAP@0.50:0.95 of about 35.3, close to the released result (35.6). Note that we use the 'last_ckpt.pt' of HS as the scale model for RepOpt training, and 'stop_aug_last_n_epoch' is 15. BTW, we have not run RepOpt experiments for 400 epochs; some hyper-parameters may need to be adjusted carefully.
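The setup described above might look roughly like the following in the RepOpt training config. The key names below (`scales`, `stop_aug_last_n_epoch`) and the checkpoint path are assumptions for illustration; check the actual `configs/repopt/yolov6n_opt.py` in the repo for the real schema:

```python
# Hypothetical sketch of the relevant RepOpt config fields -- key names
# and paths are illustrative, not copied from the repo.
model = dict(
    training_mode='repopt',  # RepOpt training stage (vs. the HS stage)
    # Use the HS run's last checkpoint as the scale model, as described above.
    scales='./runs/train/repopt_n_hs/weights/last_ckpt.pt',
)

data_aug = dict(
    # Disable strong augmentation for the final 15 epochs.
    stop_aug_last_n_epoch=15,
)
```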