ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Adding connection in architecture between backbone and multi-stage head #6174

Closed Michelvl92 closed 2 years ago

Michelvl92 commented 2 years ago

Question

According to "Improving YOLOv5 with Attention Mechanism for Detecting Boulders from Planetary Images", the authors add up to four connections between the backbone and the neck, as shown in the figure below (ignore the fourth detection output that they also add).

They claim the following: four connections represented by the red lines are added to bring the feature information from the backbone network (152 × 152 pixels, 76 × 76 pixels, 38 × 38 pixels, 19 × 19 pixels) into the feature fusion layers in the neck network. Based on the idea of residual networks, these connections can enhance the backpropagation of gradients, avoid gradient fading, and reduce the loss of the feature information of small objects.
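The four sizes quoted above are consistent with a 608 × 608 input downsampled at strides 4, 8, 16 and 32. The input size is not stated in the quoted passage, so it is an assumption here, and the P2 to P5 names are borrowed from the YAML comments further down. A minimal sketch of the arithmetic:

```python
# Assumption: a square 608 x 608 input, inferred from the quoted feature-map sizes.
INPUT_SIZE = 608
# Strides of the four pyramid levels, named as in the YAML comments (P2/4 ... P5/32).
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}


def feature_map_sizes(input_size, strides):
    """Return the side length of each feature map for the given strides."""
    sizes = {}
    for name, stride in strides.items():
        if input_size % stride != 0:
            raise ValueError(f"{input_size} is not divisible by stride {stride}")
        sizes[name] = input_size // stride
    return sizes


sizes = feature_map_sizes(INPUT_SIZE, STRIDES)
for name, side in sizes.items():
    print(f"{name}: {side} x {side}")
```

Under that assumption the sketch gives 152, 76, 38 and 19, matching the sizes in the quote.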

What are your thoughts on this improvement?

[Image: yolo_custom]

If you think this can be an improvement, is the YAML file below correct? Or should I add a channel reduction (1x1 Conv) before each Concat, or make the output channel counts larger?

Additional

anchors:
  - [10,13, 16,30, 33,23]  # P2/4
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [128, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 2], 1, Concat, [1]],  # cat backbone P2
   [-1, 3, C3, [128, False]],  # 21 (P2/4-tiny)

   [-1, 1, Conv, [128, 3, 2]],
   [[-1, 18, 4], 1, Concat, [1]],  # cat head P3 # cat with layer 4? or should we first reduce channels?
   [-1, 3, C3, [256, False]],  # 24 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14, 6], 1, Concat, [1]],  # cat head P4 # cat with layer 6? or should we first reduce channels?
   [-1, 3, C3, [512, False]],  # 27 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10, 8], 1, Concat, [1]],  # cat head P5 # cat with layer 8? or should we first reduce channels?
   [-1, 3, C3, [1024, False]],  # 30 (P5/32-large)

   [[21, 24, 27, 30], 1, Detect, [nc, anchors]],  # Detect(P2, P3, P4, P5)
  ]

Michelvl92 commented 2 years ago

Furthermore, they also placed a CBL (Conv-BN-LeakyReLU) block before the SPP and a C3 module after it. This looks similar to FPN_conv in #6006. I have seen this architecture in many other papers. Was it previously the default YOLOv5 architecture?

glenn-jocher commented 2 years ago

@Michelvl92 anything with LeakyReLU is old. We started YOLOv5 with this activation in v1.0, then migrated to Hardswish and later to Swish (SiLU).
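For reference, the three activations mentioned here can be written out in plain Python. This is only an illustrative sketch using the standard definitions, not the YOLOv5 implementation; the `negative_slope=0.1` default is an assumed value chosen for the example.

```python
import math


def leaky_relu(x, negative_slope=0.1):
    """LeakyReLU: identity for x >= 0, a small linear slope otherwise.

    negative_slope=0.1 is an assumed value for illustration.
    """
    return x if x >= 0 else negative_slope * x


def hardswish(x):
    """Hardswish: x * ReLU6(x + 3) / 6, a piecewise approximation of Swish."""
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


def swish(x):
    """Swish with beta = 1 (also called SiLU): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


for value in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(value, leaky_relu(value), hardswish(value), swish(value))
```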

Residual connections are pretty basic; we already have several of them, as you can see in the Concat layers. You're free to add more if you'd like. Concat modules will concatenate any layers with matching height and width, regardless of channel count. If you want the connections to carry similar amounts of information, though, they should have similar channel counts.
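To see what this means for the YAML in the question, the sketch below traces the stride and channel count of every layer. It is a simplified stand-in for the real model parser, not YOLOv5 code: the `LAYERS` table and the `trace` helper are hypothetical, it assumes a width multiple of 1.0 (the posted YAML does not include one), and it handles only the module types that appear in this issue.

```python
# Each entry: (from, module, out_channels, conv_stride), transcribed from the posted YAML.
# Assumption: width multiple of 1.0, so channel counts are used as written.
LAYERS = [
    (-1, "Conv", 64, 2), (-1, "Conv", 128, 2), (-1, "C3", 128, 1),      # 0-2
    (-1, "Conv", 256, 2), (-1, "C3", 256, 1),                           # 3-4
    (-1, "Conv", 512, 2), (-1, "C3", 512, 1),                           # 5-6
    (-1, "Conv", 1024, 2), (-1, "C3", 1024, 1), (-1, "SPPF", 1024, 1),  # 7-9
    (-1, "Conv", 512, 1), (-1, "Upsample", None, 1),                    # 10-11
    ([-1, 6], "Concat", None, 1), (-1, "C3", 512, 1),                   # 12-13
    (-1, "Conv", 256, 1), (-1, "Upsample", None, 1),                    # 14-15
    ([-1, 4], "Concat", None, 1), (-1, "C3", 256, 1),                   # 16-17
    (-1, "Conv", 128, 1), (-1, "Upsample", None, 1),                    # 18-19
    ([-1, 2], "Concat", None, 1), (-1, "C3", 128, 1),                   # 20-21
    (-1, "Conv", 128, 2), ([-1, 18, 4], "Concat", None, 1),             # 22-23
    (-1, "C3", 256, 1),                                                 # 24
    (-1, "Conv", 256, 2), ([-1, 14, 6], "Concat", None, 1),             # 25-26
    (-1, "C3", 512, 1),                                                 # 27
    (-1, "Conv", 512, 2), ([-1, 10, 8], "Concat", None, 1),             # 28-29
    (-1, "C3", 1024, 1),                                                # 30
]


def trace(layers, in_channels=3):
    """Return per-layer output channels and strides; fail on mismatched Concat inputs."""
    channels = {-1: in_channels}  # index -1 stands for the input image
    strides = {-1: 1}
    for i, (frm, kind, c_out, conv_stride) in enumerate(layers):
        sources = frm if isinstance(frm, list) else [frm]
        # Negative indices are relative to the current layer, as in the YAML.
        sources = [i + f if f < 0 else f for f in sources]
        source_strides = {strides[j] for j in sources}
        if len(source_strides) != 1:
            raise ValueError(f"layer {i}: input strides differ: {sorted(source_strides)}")
        stride = source_strides.pop()
        if kind == "Concat":
            channels[i] = sum(channels[j] for j in sources)  # channels add up
            strides[i] = stride
        elif kind == "Upsample":
            channels[i] = channels[sources[0]]
            strides[i] = stride // 2  # 2x nearest upsampling halves the stride
        else:  # Conv, C3, SPPF: output channels come from the args
            channels[i] = c_out
            strides[i] = stride * conv_stride
    return channels, strides


channels, strides = trace(LAYERS)
for i, (_, kind, _, _) in enumerate(LAYERS):
    if kind == "Concat":
        print(f"layer {i}: stride {strides[i]}, concatenated channels {channels[i]}")
```

Under these assumptions every Concat receives inputs at a single stride, so the spatial sizes match. The three-input Concat layers 23, 26 and 29 pass 512, 1024 and 2048 channels to the following C3, twice what they would carry without the extra backbone input. Whether to reduce those channels first with a 1x1 Conv is then a question of how much weight each branch should get, not of validity.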

github-actions[bot] commented 2 years ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!