ultralytics / yolov5

YOLOv5 šŸš€ in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

YOLOv5 P6 Models šŸ˜ƒ #2110

Closed glenn-jocher closed 3 years ago

glenn-jocher commented 3 years ago

We've done a bit of experimentation with adding an additional large object output layer P6, following the EfficientDet example of increasing output layers for larger models, except in our case applying it to all models. The current models have P3 (stride 8, small) to P5 (stride 32, large) outputs. The P6 output layer is stride 64 and intended for extra-large objects. It seems to help normal COCO training at --img 640, and we've also gotten good results training at --img 1280.

The architecture changes we've made to add a P6 layer are here. The backbone is extended down to P6, and the PANet head now goes down to P3 (as usual) and back up to P6 instead of stopping at P5. New anchors were also added, evolved at --img 1280.
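To make the stride numbers concrete, here is a small sketch (not official YOLOv5 code) of how each output layer's grid scales with image size. A stride-s layer divides the input into (img/s) x (img/s) cells, so the stride-64 P6 layer produces a coarse grid suited to extra-large objects:

```python
def grid_sizes(img_size, strides=(8, 16, 32, 64)):
    """Return {stride: cells_per_side} for the P3..P6 output layers."""
    return {s: img_size // s for s in strides}

print(grid_sizes(640))   # {8: 80, 16: 40, 32: 20, 64: 10} -> P6 grid is 10x10
print(grid_sizes(1280))  # {8: 160, 16: 80, 32: 40, 64: 20} -> P6 grid is 20x20
```

At --img 640 the P6 grid is only 10x10 cells, which is part of why the P6 models benefit from training and inference at --img 1280.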

(Screenshot, 2021-02-01: P6 architecture changes)

The chart below shows the current YOLOv5l 4.0 model in blue, and the new YOLOv5l6 architecture in green and orange. The green and orange lines correspond to the same architecture trained at --img 640 (green) or --img 1280 (orange). The points plotted are evaluations of each model over a range of image sizes from 384 to 1536 in steps of 128, with results plotted up to the max-mAP point. Code to reproduce: python test.py --task study, following the changes from PR #2099 (https://github.com/ultralytics/yolov5/pull/2099).

(Chart: study-yolov5l, mAP vs. image size)

Conclusion: the P6 models increase performance on COCO in every scenario we tested. They are a bit slower (roughly 10%) and larger (about 50% more parameters), but add only slightly more FLOPS, so training time and CUDA memory requirements are almost the same as for the P5 models. We are doing further studies to see whether these might be suitable replacements for the current models. Let me know if you have any thoughts. For the time being, these models are available for auto-download just like the regular models, i.e. python detect.py --weights yolov5s6.pt.

laurenssam commented 3 years ago

Hello Glenn,

I'm currently working with high-resolution images (8000 x 4000), and I don't want to downsample them too much, since the model might then fail to detect distant objects. So it would be super nice if we could use a model pretrained at high resolution, for example the 1280 models you're talking about. Is it possible to download the weights of those models? I was also curious whether you have higher-resolution models without the additional output layer; I don't think I necessarily need it.

Thanks!

WANGCHAO1996 commented 3 years ago

Hello Glenn, if the targets in my dataset are small, can I output only P2/P3/P4? How should I modify the YAML file? Thank you very much! @glenn-jocher

glenn-jocher commented 3 years ago

@laurenssam yes the P6 models can be manually downloaded here: https://github.com/ultralytics/yolov5/releases/tag/v4.0

Any model in the latest release assets can also be auto-downloaded simply by asking for it in a command, i.e.:

```shell
python detect.py --weights yolov5s6.pt  # auto-download YOLOv5s P6 model
```

glenn-jocher commented 3 years ago

@WANGCHAO1996 yes the YAML files are easy to modify. You can remove (and add) outputs by simply changing the Detect() module inputs here. Inputs 17, 20 and 23 correspond to the P3, P4 and P5 grids. Remember, if you modify the number of output layers you should also modify your anchors correspondingly, or simply delete the anchors and replace them with anchors: 3 (to tell autoanchor to compute 3 anchors for each output layer). https://github.com/ultralytics/yolov5/blob/73a066993051339f6adfe5095a7852a2b9184c16/models/yolov5s.yaml#L47
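As a minimal sketch of what such a change looks like (hypothetical layer indices; the actual numbers depend on how you modify the head):

```yaml
anchors: 3  # autoanchor computes 3 anchors per output layer

head:
  [ # ... feature layers producing P2, P3 and P4 outputs ...
    [ [ 21, 24, 27 ], 1, Detect, [ nc, anchors ] ],  # Detect(P2, P3, P4); indices are illustrative
  ]
```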

laurenssam commented 3 years ago

Thanks, but how do I know the resolution the model was trained on?

glenn-jocher commented 3 years ago

@laurenssam the P6 models were trained at 1280 by default, except for the ones denoted by -640 (to get an apples-to-apples comparison with the current models).

For training on large images, you can see our xView repo here. The general concept is to train on smaller 'chips' at native resolution, and then run inference either at native resolution if you can, or else use a sliding window whose results are stitched together afterwards. https://github.com/ultralytics/xview-yolov3
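The sliding-window idea can be sketched roughly as follows (an illustrative sketch, not the xview repo's actual code): tile the large image into overlapping chips, run detection on each chip, then shift each chip's boxes back into full-image coordinates before a final NMS pass.

```python
def chip_offsets(width, height, chip=640, overlap=0.2):
    """Top-left (x, y) offsets of overlapping chips covering the image."""
    step = int(chip * (1 - overlap))
    xs = list(range(0, max(width - chip, 0) + 1, step)) or [0]
    ys = list(range(0, max(height - chip, 0) + 1, step)) or [0]
    # make sure the right and bottom edges are covered
    if xs[-1] + chip < width:
        xs.append(width - chip)
    if ys[-1] + chip < height:
        ys.append(height - chip)
    return [(x, y) for y in ys for x in xs]

def stitch(detections_per_chip):
    """Map per-chip boxes [x1, y1, x2, y2, conf] to full-image coordinates."""
    boxes = []
    for (ox, oy), dets in detections_per_chip:
        for x1, y1, x2, y2, conf in dets:
            boxes.append([x1 + ox, y1 + oy, x2 + ox, y2 + oy, conf])
    return boxes  # in practice, run NMS here to merge duplicate boxes in overlaps

offsets = chip_offsets(8000, 4000)  # the 8000 x 4000 case discussed above
print(len(offsets), "chips")
```

The overlap fraction is a tunable trade-off: enough that objects near chip borders appear whole in at least one chip, but not so much that inference time balloons.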

WANGCHAO1996 commented 3 years ago

> @WANGCHAO1996 yes the YAML files are easy to modify. You can remove (and add) outputs by simply changing the Detect() module inputs here. Inputs 17, 20 and 23 correspond to P3, P4 and P5 grids. Remember if you modify the number of layers here you should also modify your anchors correspondingly, or simply delete the anchors and replace them with anchors: 3 (to tell autoanchor to compute 3 anchors for each output layer).
> https://github.com/ultralytics/yolov5/blob/73a066993051339f6adfe5095a7852a2b9184c16/models/yolov5s.yaml#L47

```yaml
# parameters
nc: 1  # number of classes
depth_multiple: 1.33  # model depth multiple
width_multiple: 1.25  # layer channel multiple

# anchors
anchors: 3

# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [ [ -1, 1, Focus, [ 64, 3 ] ],  # 0-P1/2
    [ -1, 1, Conv, [ 128, 3, 2 ] ],  # 1-P2/4
    [ -1, 3, C3, [ 128 ] ],
    [ -1, 1, Conv, [ 256, 3, 2 ] ],  # 3-P3/8
    [ -1, 9, C3, [ 256 ] ],
    [ -1, 1, Conv, [ 512, 3, 2 ] ],  # 5-P4/16
    [ -1, 9, C3, [ 512 ] ],
    [ -1, 1, Conv, [ 1024, 3, 2 ] ],  # 7-P5/32
    [ -1, 1, SPP, [ 1024, [ 5, 9, 13 ] ] ],
    [ -1, 3, C3, [ 1024, False ] ],  # 9
  ]

# YOLOv5 head
head:
  [ [ -1, 1, Conv, [ 512, 1, 1 ] ],
    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],
    [ [ -1, 6 ], 1, Concat, [ 1 ] ],  # cat backbone P4
    [ -1, 3, C3, [ 512, False ] ],  # 13

    [ -1, 1, Conv, [ 256, 1, 1 ] ],
    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],
    [ [ -1, 4 ], 1, Concat, [ 1 ] ],  # cat backbone P3
    [ -1, 3, C3, [ 256, False ] ],  # 17 (P3/8-small)

    [ -1, 1, Conv, [ 128, 1, 1 ] ],
    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],
    [ [ -1, 2 ], 1, Concat, [ 1 ] ],  # cat backbone P2
    [ -1, 1, C3, [ 128, False ] ],  # 21 (P2/4-xsmall)

    [ -1, 1, Conv, [ 128, 3, 2 ] ],
    [ [ -1, 18 ], 1, Concat, [ 1 ] ],  # cat head P3
    [ -1, 3, C3, [ 256, False ] ],  # 24 (P3/8-small)

    [ -1, 1, Conv, [ 256, 3, 2 ] ],
    [ [ -1, 14 ], 1, Concat, [ 1 ] ],  # cat head P4
    [ -1, 3, C3, [ 512, False ] ],  # 27 (P4/16-medium)

    [ -1, 1, Conv, [ 512, 3, 2 ] ],
    [ [ -1, 10 ], 1, Concat, [ 1 ] ],  # cat head P5
    [ -1, 3, C3, [ 1024, False ] ],  # 30 (P5/32-large)

    [ [ 21, 24, 27 ], 1, Detect, [ nc, anchors ] ],  # Detect(P2, P3, P4)
  ]
```

Thank you very much! Is that right? @glenn-jocher

glenn-jocher commented 3 years ago

@WANGCHAO1996 I don't know what you are asking and your post is poorly formatted. Use ``` for code sections.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Shaotran commented 3 years ago

Hi Glenn -- two questions for you: I have the x6.pt weights file downloaded, but is there a --cfg yolov5x6.yaml model file to go along with it for training? Or is it still supposed to go with yolov5x.yaml?

You mention that the x6 model adds an additional layer "for extra-large objects," but also that it seems to perform better than the normal x model "under all conditions." Of course, I'll have to experiment, but just to understand your underlying intentions while creating the model -- would x6 still work for large images (1000px+) with "small" objects to detect (~100px+)? Thanks!

glenn-jocher commented 3 years ago

@Shaotran all models contain their yaml files as attributes, so the matching yaml ships with the weights. For example (filename illustrative; in the v4.0 checkpoints the module is stored under the 'model' key):

```python
import torch

ckpt = torch.load('yolov5x6.pt', map_location='cpu')  # checkpoint dict
model = ckpt['model']  # the nn.Module lives under the 'model' key
print(model.yaml)  # architecture definition used at training time
```

The P6 models perform better on COCO than their P5 counterparts.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ThuyHoang9001 commented 3 years ago

@glenn-jocher Hello Glenn, in the case that I want to convert to TensorRT, I wonder what input size (640 or 1280) should be used for the yolov5s6 model?

glenn-jocher commented 3 years ago

@ThuyHoang9001 the P6 models will get the best results at --img 1280, and can also be used all the way down to --img 64, with decreasing levels of accuracy.
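One practical note when exporting a fixed-size engine: because the P6 models have a maximum stride of 64, the input size should be a multiple of 64 (YOLOv5's dataloaders round sizes up automatically via its check_img_size utility). A minimal sketch of that rounding, as a hypothetical standalone helper:

```python
import math

def make_stride_multiple(img_size, stride=64):
    """Round a requested --img size up to a multiple of the model's max stride
    (64 for P6 models, 32 for P5 models)."""
    return int(math.ceil(img_size / stride) * stride)

print(make_stride_multiple(1280))     # 1280, already divisible by 64
print(make_stride_multiple(1000))     # rounds up to 1024
print(make_stride_multiple(100, 32))  # 128 for a P5 model
```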