ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0
50.9k stars, 16.39k forks

Focus #847

Closed ycdhqzhiai closed 4 years ago

ycdhqzhiai commented 4 years ago

❔Question

I'd like to know how much removing the Focus structure affects mAP. Has anyone tried it?

github-actions[bot] commented 4 years ago

Hello @ycdhqzhiai, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.

If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.

For more information please visit https://www.ultralytics.com.

Aktcob commented 4 years ago

+1

glenn-jocher commented 4 years ago

The Focus() module is designed to reduce FLOPs and increase speed, not to increase mAP.

glenn-jocher commented 4 years ago

It is also designed to reduce layer count: one Focus module replaces three YOLOv3/4 layers.

captainst commented 3 years ago

I have a question concerning the Focus layer: while it reduces the computation cost, does the "stride" sampling mess up the coordinates of the ground truth bounding boxes?

glenn-jocher commented 3 years ago

@captainst Focus (and its new v6.0 replacement) stacks spatial information into channel space, so it's possible that 1 pixel's worth of regression information may be lost, though for most use cases regression accuracy will never approach 1 pixel anyway.
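
To make "stacking spatial information into channel space" concrete, here is a minimal sketch of the slicing step only, written in NumPy rather than PyTorch so it runs without a GPU stack. The names `focus_slice` and `focus_unslice` are hypothetical, and the convolution that follows the slicing in the real module is left out. The sketch assumes an even height and width.

```python
import numpy as np


def focus_slice(x):
    """Space-to-depth: (N, C, H, W) -> (N, 4C, H/2, W/2). H and W must be even."""
    return np.concatenate(
        [
            x[..., ::2, ::2],    # even rows, even columns
            x[..., 1::2, ::2],   # odd rows, even columns
            x[..., ::2, 1::2],   # even rows, odd columns
            x[..., 1::2, 1::2],  # odd rows, odd columns
        ],
        axis=1,
    )


def focus_unslice(y):
    """Inverse of focus_slice, showing that no pixel values are discarded."""
    n, c4, h, w = y.shape
    c = c4 // 4
    x = np.empty((n, c, h * 2, w * 2), dtype=y.dtype)
    x[..., ::2, ::2] = y[:, 0 * c:1 * c]
    x[..., 1::2, ::2] = y[:, 1 * c:2 * c]
    x[..., ::2, 1::2] = y[:, 2 * c:3 * c]
    x[..., 1::2, 1::2] = y[:, 3 * c:4 * c]
    return x


# Toy 3-channel 4x4 "image" with distinct values so the rearrangement is visible.
image = np.arange(1 * 3 * 4 * 4).reshape(1, 3, 4, 4)
stacked = focus_slice(image)
print(stacked.shape)
assert np.array_equal(focus_unslice(stacked), image)
```

The round trip shows that the slicing itself only rearranges pixels: each output location holds a 2x2 input neighbourhood spread across channels. Any loss of sub-cell position therefore depends on how the later layers use those channels, which fits the "at most about 1 pixel" estimate above.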

captainst commented 3 years ago

@glenn-jocher Many thanks. Now I appreciate why the Focus layer is applied only once, on the raw input image at the very beginning of the pipeline.

glenn-jocher commented 3 years ago

@captainst Yes, for classification models it might be more usable later on in the backbone, but probably not for detection models without losing mAP@0.5:0.95 (mAP@0.5 would be less affected).
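
A rough way to see why mAP@0.5 would be less affected than mAP@0.5:0.95 is to compute the IoU between a box and a copy of it shifted by one pixel. mAP@0.5:0.95 averages over IoU thresholds from 0.5 to 0.95, so small localization errors count against it at the strict end. The sketch below is illustrative only. The one-pixel diagonal shift and the box sizes are assumptions chosen for the example, not measurements of what Focus does, and `shifted_iou` is a hypothetical helper.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def shifted_iou(size, shift=1.0):
    """IoU between a size x size box and the same box moved diagonally by `shift` pixels."""
    ground_truth = (0.0, 0.0, size, size)
    prediction = (shift, shift, size + shift, size + shift)
    return iou(ground_truth, prediction)


for size in (16, 64, 256):
    value = shifted_iou(size)
    print(f"{size:>3}px box: IoU={value:.3f}  "
          f"passes 0.5: {value >= 0.5}  passes 0.95: {value >= 0.95}")
```

Under these assumptions, every box still clears the 0.5 threshold, while only the largest one clears 0.95. This is the sense in which a one-pixel error mostly shows up in the stricter metric, and mostly for small objects.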