Add ASFF (three fused feature layers) in the Head for V5 (s, m, l, x)

positive666 commented 3 years ago

🚀 Feature

Add ASFF feature-fusion layers to the Head: the feature maps at levels 1 to 3 are each fused into three output feature maps, one per scale, and the fusion weights are adjusted adaptively.

Motivation

Reference: the feature fusion used in yolov3_asff (see the paper).
Add four optional yolov5_asff model structures (as yaml files).
The ASFF method suits the YOLO series well, and from reading the paper I found its reasoning well justified. It can be offered as an alternative structure for V5.

Integrate ASFF into the project. I hope this can be a contribution to the YOLOv5 project.

Pitch

I added ASFFV5 classes at line 310 of https://github.com/positive666/yolov5/blob/master/models/common.py. They add the ASFF layer structure for yolov5 (s, m, l, x) and are integrated into the YOLOv5 code. The structure differs from v3_asff and also adds an RFB block. For example, in yolov5s.yaml:

head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, ASFFV5, [0, 512, 0.5]],  # 24
   [[17, 20, 23], 1, ASFFV5, [1, 256, 0.5]],  # 25
   [[17, 20, 23], 1, ASFFV5, [2, 128, 0.5]],  # 26
   # [[17, 20, 23], 1, Detect, [nc, anchors]],  # original Detect(P3, P4, P5)
   [[26, 25, 24], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
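The layer numbering in this head can be checked with a short sketch. It assumes the backbone occupies layers 0-9, which is consistent with the "# 13", "# 17", "# 20" and "# 23" comments above; the helper names `index_layers` and `detect_sources` are hypothetical and are not part of YOLOv5.

```python
# Assumption: the backbone occupies layers 0-9, so the head starts at 10.
BACKBONE_LAYERS = 10

# (from, module) pairs copied from the head above; layer arguments are omitted.
HEAD = [
    (-1, "Conv"), (-1, "nn.Upsample"), ([-1, 6], "Concat"), (-1, "C3"),
    (-1, "Conv"), (-1, "nn.Upsample"), ([-1, 4], "Concat"), (-1, "C3"),
    (-1, "Conv"), ([-1, 14], "Concat"), (-1, "C3"),
    (-1, "Conv"), ([-1, 10], "Concat"), (-1, "C3"),
    ([17, 20, 23], "ASFFV5"), ([17, 20, 23], "ASFFV5"), ([17, 20, 23], "ASFFV5"),
    ([26, 25, 24], "Detect"),
]


def index_layers(head, offset):
    """Map each absolute layer index to its module name."""
    return {offset + i: module for i, (_, module) in enumerate(head)}


def detect_sources(head, offset):
    """Return (index, module) for every layer feeding the final Detect layer."""
    layers = index_layers(head, offset)
    sources, module = head[-1]
    assert module == "Detect", "the last head entry is expected to be Detect"
    return [(src, layers[src]) for src in sources]


LAYERS = index_layers(HEAD, BACKBONE_LAYERS)
DETECT_INPUTS = detect_sources(HEAD, BACKBONE_LAYERS)

for index, module in DETECT_INPUTS:
    print(index, module)
```

Under that assumption the three ASFFV5 layers land at indices 24, 25 and 26, so Detect reads from the fused outputs rather than from the C3 layers 17, 20 and 23.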

ASFF Interpretability

The paper also explains why the fusion weights are computed from the output features through a convolution: the fusion weights and the features are closely related.
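The idea can be illustrated with a minimal pure-Python sketch. It is a simplification and not the ASFFV5 class from the fork: it assumes the three levels have already been resized to the same height, width and channel count, and it models the weight-producing convolution as one 1x1 convolution per level (a length-C vector). The names `asff_fuse` and `weight_vectors` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def asff_fuse(levels, weight_vectors):
    """Fuse three feature maps with per-pixel adaptive weights.

    levels: three feature maps shaped [H][W][C], already resized to match.
    weight_vectors: three length-C vectors; each acts as a 1x1 convolution
        that turns the feature vector at a pixel into one logit for its level.
    """
    height, width = len(levels[0]), len(levels[0][0])
    channels = len(levels[0][0][0])
    fused = []
    for y in range(height):
        row = []
        for x in range(width):
            # The logits are computed from the features themselves,
            # so the weights adapt to the content at each pixel.
            logits = [
                sum(w * f for w, f in zip(weight_vectors[i], levels[i][y][x]))
                for i in range(3)
            ]
            alpha = softmax(logits)  # three weights that sum to 1
            row.append([
                sum(alpha[i] * levels[i][y][x][c] for i in range(3))
                for c in range(channels)
            ])
        fused.append(row)
    return fused
```

With all-zero weight vectors every level gets weight 1/3 and the result is a plain average. Once the weights are learned, each pixel can favor the level whose features are most useful there.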

COCO

System	test-dev mAP	Time (V100)	Time (2080ti)
YOLOv3 608	33.0	20ms	26ms
YOLOv3 608+ BoFs	37.0	20ms	26ms
YOLOv3 608 (our baseline)	38.8	20ms	26ms
YOLOv3 608+ ASFF	40.6	22ms	30ms
YOLOv3 608+ ASFF*	42.4	22ms	30ms
YOLOv3 800+ ASFF*	43.9	34ms	38ms
YOLOv3 MobileNetV1 416 + BoFs	28.6	-	22 ms
YOLOv3 MobileNetV2 416 (our baseline)	29.0	-	22 ms
YOLOv3 MobileNetV2 416 +ASFF	30.6	-	24 ms

I also plan to add some other tricks, such as Aware-IOU and some transformer ideas. I will run more experiments and make further changes in the future.

cszer commented 3 years ago

Hello, check the issues in the yolov4 repo. The authors of ASFF used all bag of species, and standalone ASFF adds only 0.5 mAP.

positive666 commented 3 years ago

I am very happy to receive your reply. Yes, I have verified similar conclusions on some datasets. I want to integrate this module into V5 for the convenience of later research, and my first step is to add ASFF after the PANet. The output of this ASFFV5 layer differs from the V3 version, and I still need to study and understand it further. I originally wanted to add BiFPN, but I think the extra feature layers and dense connections would increase training time. Thank you for your reply.

glenn-jocher commented 3 years ago

@positive666 thanks for the idea! I see you submitted a PR, I will take a look there.

I experimented with ASFF with YOLOv3 before, but had difficulty implementing it as we used to build our pytorch models from the darknet cfg files, which placed the output layers in very different places in the model.

I think now with all the output layers located in the Detect() layer, an ASFF implementation should be a bit easier to do.

positive666 commented 3 years ago

@glenn-jocher Thank you for your reply. I am now verifying this on COCO. I have another question. For my first change I used a cigarette detection dataset of 5000 images and trained for 300 epochs, and the mAP stayed at 0.7. I did not add any extra training data. I only wanted to check the effect of adding ASFF, and it does not improve the result significantly. One of my thoughts is that even an identical mAP cannot guarantee inference performance in the future. I have now added some lightweight attention modules, which are not yet in the PR. I will continue to run experiments.

glenn-jocher commented 3 years ago

@positive666 I think what you're mentioning is generalization of your results to the wider world. Typically this is why COCO is used as a benchmark, as it overlaps many common use cases. It takes a long time to train though, so if you want to prototype results quickly I would recommend VOC, which still generalizes somewhat, but is much smaller and faster to train. You can train VOC in Colab in less than a day, especially the smaller models:

https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb?hl=en#scrollTo=BSgFCAcMbk1R

# VOC
for b, m in zip([64, 48, 32, 16], ['yolov5s', 'yolov5m', 'yolov5l', 'yolov5x']):  # zip(batch_size, model)
  !python train.py --batch {b} --weights {m}.pt --data voc.yaml --epochs 50 --cache --img 512 --nosave --hyp hyp.finetune.yaml --project VOC --name {m}

cszer commented 3 years ago

I have tried your modification together with my own modifications aimed at small-object detection. I achieved 30.5 small-object mAP@0.5:0.95 (COCO) at 75 GFLOPs (this module adds 20 GFLOPs), but an ablation is needed to verify its impact.

cszer commented 3 years ago

I have done the ablation. This module is useless: it adds only 0.4 to small-object mAP@0.5:0.95 for 20 GFLOPs.
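To put those numbers in proportion, here is a small arithmetic sketch. It assumes the 75 GFLOPs reported earlier already includes the module's 20 GFLOPs, so the model without the module would be at 55 GFLOPs. The variable names are illustrative only.

```python
# Figures quoted in the thread.
total_gflops = 75.0    # model with the ASFF module (assumed to include it)
module_gflops = 20.0   # cost attributed to the ASFF module
map_gain = 0.4         # small-object mAP@0.5:0.95 gain from the ablation

baseline_gflops = total_gflops - module_gflops       # model without the module
relative_overhead = module_gflops / baseline_gflops  # extra compute vs. baseline
gain_per_gflop = map_gain / module_gflops            # mAP points per added GFLOP

print(f"baseline: {baseline_gflops:.0f} GFLOPs")
print(f"compute overhead: {relative_overhead:.1%}")
print(f"mAP gain per added GFLOP: {gain_per_gflop:.3f}")
```

Under that assumption the module costs roughly a third more compute for well under half a point of mAP.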

cszer commented 3 years ago

I think the best target to study now is replacing convolutions with involutions.

glenn-jocher commented 3 years ago

@cszer involutions?

cszer commented 3 years ago

Yes, check this paper https://arxiv.org/abs/2103.06255

glenn-jocher commented 3 years ago

@cszer wow! Just out yesterday. Thanks for the link.

cszer commented 3 years ago

10 telegram channels help me a lot))

glenn-jocher commented 3 years ago

@cszer what 10 telegram channels?

Paper seems interesting, a nice bridge between attention (across channels) and convolutions (across image space). AP increase is slight, but it's also accompanied by slight size and FLOPS reductions. https://github.com/d-li14/involution#object-detection-and-instance-segmentation-on-coco
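A minimal pure-Python sketch of the idea, for readers who have not seen it. This is a simplified illustration and not the code from the linked repository: it uses a single group, zero padding, and a single linear map as the kernel generator. The names `involution` and `gen_weights` are hypothetical.

```python
def involution(feature, gen_weights, k=3):
    """Simplified single-group involution.

    feature: input map shaped [H][W][C].
    gen_weights: k*k vectors of length C; a linear map that turns the
        feature vector at a pixel into that pixel's k x k kernel.
    The kernel differs per pixel but is shared by all channels, which is
    the reverse of a convolution (same kernel everywhere, one per channel).
    """
    height, width = len(feature), len(feature[0])
    channels = len(feature[0][0])
    pad = k // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Generate the kernel from the feature vector at this pixel only.
            kernel = [
                sum(w * f for w, f in zip(gen_weights[i], feature[y][x]))
                for i in range(k * k)
            ]
            pixel = [0.0] * channels
            for i in range(k * k):
                ny, nx = y + i // k - pad, x + i % k - pad
                if 0 <= ny < height and 0 <= nx < width:  # zero padding
                    for c in range(channels):
                        pixel[c] += kernel[i] * feature[ny][nx][c]
            row.append(pixel)
        out.append(row)
    return out
```

Because the kernel is computed from the pixel's own channels, the spatial aggregation depends on content, which is where the resemblance to attention comes from.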

glenn-jocher commented 3 years ago

@cszer I've raised issue #1 on the involutions repo (yay): https://github.com/d-li14/involution/issues/1

The straightforward implementation seems to be to use this involution() module here, replacing the MMDetection Conv modules with the local YOLOv5 Conv() module: https://github.com/d-li14/involution/blob/main/det/mmdet/models/utils/involution_naive.py

glenn-jocher commented 3 years ago

@cszer I've created an Involution PR https://github.com/ultralytics/yolov5/pull/2435 to experiment.

positive666 commented 3 years ago

@glenn-jocher @cszer Hello, I previously trained the v5 small model on VOC and ran some related ablation comparisons, and the AP improvement on the test set is indeed small (CBAM added separately without pretrained weights, evaluated on the VOC2007 test set, using an already trained yolov5):

cbam_v5s: mAP@.5: 0.56, mAP@.5:.95: 0.3, 16.6 GFLOPs (without loading weights)

asff_v5s: mAP@.5: 0.56, mAP@.5:.95: 0.38, 20 GFLOPs

However, I feel that my own experiments on V5s are not sufficient, and these simple experiments cannot explain why the attention mechanism fails to help. I have been busy recently, but I will continue the verification. So far I have added ASFF and CBAM and run one simple ablation. This attempt prompted some exploration and thinking. I have started to look at some of the difficulties of anchor-based detection: how positive samples are introduced, and how classification and regression are both independent and mutually interfering. My thinking concerns the weak correlation between classification and regression in detection. I plan to use these attention mechanisms to improve the classification and regression losses, for example with Aware-IOU. Thank you for your feedback.

glenn-jocher commented 3 years ago

@positive666 those mAPs seem pretty low. The baseline VOC training script (below) will train YOLOv5s to about 0.85 mAP@0.5 (and YOLOv5x to about 0.92 mAP@0.5):

https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb?hl=en#scrollTo=BSgFCAcMbk1R

# VOC
for b, m in zip([64, 48, 32, 16], ['yolov5s', 'yolov5m', 'yolov5l', 'yolov5x']):  # zip(batch_size, model)
  !python train.py --batch {b} --weights {m}.pt --data voc.yaml --epochs 50 --cache --img 512 --nosave --hyp hyp.finetune.yaml --project VOC --name {m}

glenn-jocher commented 3 years ago

@positive666 BTW, you can see these VOC training logs here: https://wandb.ai/glenn-jocher/VOC

developer0hye commented 3 years ago

@positive666 @glenn-jocher

How about the attention layer proposed in ECANet?

Someone has already checked its performance with yolov3-tiny.

Look at these results.

positive666 commented 3 years ago

In general, attention modules give an almost negligible improvement over the YOLOv5 baseline on public datasets. You can try it. There is indeed ECA code in my fork, but I did not register it or try it. I tried CBAM and COORD; the latter may behave a little more normally, but there is no improvement. My personal view is that YOLOv5's backbone has already been trained to generalize well. You can also train it yourself! Good luck.

phunix9 commented 3 years ago

@positive666 Hello, thank you for your contribution. I have a question after adding the CBAM layer according to your code. When I start training, the loss becomes NaN after a number of epochs (for example 10 or 100). However, when I use yolov5s.yaml without the CBAM layer, training succeeds. Do you know the reason? Thanks!

farajist commented 3 years ago

@phunix9 did you find a solution to the NaN loss issue?
