tinyvision / DAMO-YOLO

DAMO-YOLO: a fast and accurate object detection method with some new techs, including NAS backbones, efficient RepGFPN, ZeroHead, AlignedOTA, and distillation enhancement.
Apache License 2.0

I think the pictures in your paper are not rigorous in several places #91

Closed. xingguang12 closed this issue 1 year ago

xingguang12 commented 1 year ago

Search before asking

Description

After carefully reading the code of DAMO-YOLO, I think there are several problems with your figure. [screenshot: Snipaste_2023-03-12_20-09-14]

  1. The number of Fusion Blocks is 5.
  2. The shortcut connection is not drawn in your diagram. [screenshot: Snipaste_2023-03-12_20-06-35]
  3. There should be a 1×1 convolutional layer after the final Concat operation to adjust the number of channels (see the sketch after this list). [screenshot: Snipaste_2023-03-12_20-02-09]
  4. The "xN" notation in your figure can easily mislead readers.
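
To make point 3 concrete, here is a minimal, hypothetical PyTorch sketch of the tail of a fusion block: the branch outputs are concatenated and a 1×1 conv then adjusts the channel count. The module and argument names are made up for illustration and are not taken from the DAMO-YOLO codebase.

```python
import torch
import torch.nn as nn


class FusionBlockTail(nn.Module):
    """Hypothetical sketch: concatenate branch outputs, then use a 1x1 conv
    (+ BN + activation) to project the concatenated channels to out_channels."""

    def __init__(self, branch_channels, out_channels):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(sum(branch_channels), out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, branches):
        # branches: list of feature maps with identical spatial size
        return self.reduce(torch.cat(branches, dim=1))


# usage: three branches of 64 channels each, projected to 128 output channels
tail = FusionBlockTail([64, 64, 64], 128)
feats = [torch.randn(1, 64, 40, 40) for _ in range(3)]
out = tail(feats)  # shape: (1, 128, 40, 40)
```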

Use case

No response

Additional

No response

Are you willing to submit a PR?

xingguang12 commented 1 year ago

I drew a sketch. Please take a closer look and check whether there are any problems. [sketch image]

xingguang12 commented 1 year ago

I have another question: why did you change the activation function in RepConv from ReLU to SiLU? Is there any benefit? Please tell me the details, thank you very much.

jyqi commented 1 year ago

> I drew a sketch. Please take a closer look and check whether there are any problems. [sketch image]

Sorry for the misunderstanding caused by the figure in our paper. The reason we removed the extra node is that it accepts only a single input edge. We verified the accuracy of the model when 1) using a Fusion Block for feature transformation at that node and 2) directly assigning its parent node as the extra node, and found that the accuracy difference is small. Therefore, to speed up the model, we adopted the second scheme, but the diagram has not been updated accordingly.

The sketch you drew is correct and comprehensive, and it will help us revise our figure. Thanks a lot.

jyqi commented 1 year ago

> I have another question: why did you change the activation function in RepConv from ReLU to SiLU? Is there any benefit? Please tell me the details, thank you very much.

For the activation function in RepConv, we use ReLU in the Tiny and Small models and SiLU in the Medium model. We use SiLU in the M model because, in relative terms, the accuracy gain it brings is larger than the increase in inference time.
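
For readers unfamiliar with RepConv, the sketch below shows a minimal RepVGG-style block with a configurable activation, which is the part that differs between the Tiny/Small (ReLU) and Medium (SiLU) settings mentioned above. It is a simplified illustration, not the repository's actual RepConv implementation.

```python
import torch.nn as nn


class RepConvSketch(nn.Module):
    """Simplified training-time RepVGG-style block: a 3x3 branch and a 1x1
    branch are summed, then passed through a configurable activation."""

    def __init__(self, channels, act=nn.ReLU):
        super().__init__()
        self.dense = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.pointwise = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = act()

    def forward(self, x):
        # At inference the two branches can be reparameterized into a single
        # 3x3 conv, so the activation is the main runtime difference.
        return self.act(self.dense(x) + self.pointwise(x))


tiny_small_block = RepConvSketch(64, act=nn.ReLU)   # Tiny / Small models
medium_block = RepConvSketch(64, act=nn.SiLU)       # Medium model
```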

xingguang12 commented 1 year ago

> > I drew a sketch. Please take a closer look and check whether there are any problems. [sketch image]
>
> Sorry for the misunderstanding caused by the figure in our paper. The reason we removed the extra node is that it accepts only a single input edge. We verified the accuracy of the model when 1) using a Fusion Block for feature transformation at that node and 2) directly assigning its parent node as the extra node, and found that the accuracy difference is small. Therefore, to speed up the model, we adopted the second scheme, but the diagram has not been updated accordingly.
>
> The sketch you drew is correct and comprehensive, and it will help us revise our figure. Thanks a lot.

Thank you for your recognition of my work

xingguang12 commented 1 year ago

> > I have another question: why did you change the activation function in RepConv from ReLU to SiLU? Is there any benefit? Please tell me the details, thank you very much.
>
> For the activation function in RepConv, we use ReLU in the Tiny and Small models and SiLU in the Medium model. We use SiLU in the M model because, in relative terms, the accuracy gain it brings is larger than the increase in inference time.

Thanks a lot. For a model of the same size, what is the impact on accuracy of replacing ReLU with SiLU (on your dataset)? I have been trying to modify DAMO-YOLO recently, but I cannot get a GPU on the computing platform these days, so I have not started the experiments yet. Do you have any good modification suggestions?

jyqi commented 1 year ago

> > > I have another question: why did you change the activation function in RepConv from ReLU to SiLU? Is there any benefit? Please tell me the details, thank you very much.
> >
> > For the activation function in RepConv, we use ReLU in the Tiny and Small models and SiLU in the Medium model. We use SiLU in the M model because, in relative terms, the accuracy gain it brings is larger than the increase in inference time.
>
> Thanks a lot. For a model of the same size, what is the impact on accuracy of replacing ReLU with SiLU (on your dataset)? I have been trying to modify DAMO-YOLO recently, but I cannot get a GPU on the computing platform these days, so I have not started the experiments yet. Do you have any good modification suggestions?

On the Medium model, replacing ReLU with SiLU achieves an mAP gain of around 1.0%. For modifying DAMO-YOLO, you could 1) customize your own backbone with our TinyNAS (https://github.com/alibaba/lightweight-neural-architecture-search/blob/main/scripts/damo-yolo/Tutorial_NAS_for_DAMO-YOLO_cn.md), 2) use a Transformer structure, 3) replace ZeroHead with a more recent, advanced head, 4) use dataset auto-augmentation, and so on.
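
As a quick way to try the ReLU-to-SiLU swap mentioned above without editing model definitions, the helper below recursively replaces activation modules in an existing nn.Module. It is a generic PyTorch utility sketch, not part of the DAMO-YOLO codebase, and a model modified this way would still need to be retrained or fine-tuned.

```python
import torch.nn as nn


def swap_relu_for_silu(module: nn.Module) -> nn.Module:
    """Recursively replace every nn.ReLU in `module` with nn.SiLU."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.SiLU(inplace=True))
        else:
            swap_relu_for_silu(child)
    return module


# usage sketch: `model` is any PyTorch model you have built, e.g. a detector variant
# model = swap_relu_for_silu(model)
```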