xidongwu / AutoTrainOnce


Applying to YOLO series networks #1

Open declanwucv opened 1 month ago

declanwucv commented 1 month ago

Hi, dear author. Thanks for your impressive work. After some research, I am convinced this work is the SOTA, and I am about to try it on the Ultralytics YOLO series detectors.

However, I am a bit confused about how to implement it. Could you give me some tips?

gaosh commented 1 month ago

Do you want to use the pruned model in YOLO, or do you want to prune YOLO using our method?

declanwucv commented 1 month ago

Hi, @gaosh. Sorry for my confusing description. I want to prune YOLO using ATO.

I notice that some virtual_gate modules are inserted into the model. What is the principle for deciding where to insert them?

Besides the virtual gate, is there anything else I should take care of?

gaosh commented 1 month ago

For example, if you have a block with conv-->norm-->act-->conv-->norm-->act and want to prune the middle dimension, you need to insert the virtual_gate after the first act function, so that it becomes conv-->norm-->act-->virtual_gate-->conv-->norm-->act. You also need to calculate the FLOPs of the corresponding layers; this is provided in our implementation, and an example is this function: https://github.com/xidongwu/AutoTrainOnce/blob/main/utils.py#L136
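For readers following along, here is a minimal pure-Python sketch of the FLOPs bookkeeping for that example block. It is not the repository's utils.py function: it assumes both convs are stride 1 with "same" padding, ignores norm/activation/bias costs, and treats a channel as kept when its gate value is non-zero. `conv_flops` and `block_flops` are hypothetical names.

```python
from typing import Sequence


def conv_flops(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulate count of a dense k x k convolution (bias ignored)."""
    return c_in * c_out * k * k * h_out * w_out


def block_flops(gate: Sequence[float], c_in: int, c_out: int,
                k: int, h: int, w: int) -> int:
    """FLOPs of conv-->norm-->act-->virtual_gate-->conv-->norm-->act.

    Only the middle width is prunable. The gate shrinks both the output
    channels of the first conv and the input channels of the second conv.
    Assumption: stride 1 and 'same' padding, so spatial size stays h x w.
    """
    mid = sum(1 for g in gate if g != 0)          # surviving middle channels
    first = conv_flops(c_in, mid, k, h, w)        # conv1: c_in -> mid
    second = conv_flops(mid, c_out, k, h, w)      # conv2: mid -> c_out
    return first + second


full = block_flops([1] * 64, c_in=64, c_out=64, k=3, h=32, w=32)
half = block_flops([1] * 32 + [0] * 32, c_in=64, c_out=64, k=3, h=32, w=32)
print(full, half, half / full)
```

Pruning half of the middle channels halves both convs' cost here, which is why the gate position determines which layers' FLOPs must be recomputed. In training, the gate would be soft, so a differentiable estimate (e.g. summing gate values instead of counting non-zeros) may be preferable.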

declanwucv commented 1 month ago

Thanks for your reply.

I notice that in both examples (mobilev2 and resnet), the virtual_gate was inserted "in" the block rather than "between" blocks. Why?

gaosh commented 4 weeks ago

This depends on which dimensions you want to prune. If you want to prune the inner dimensions of a block, you need to insert the gate inside the block. If you want to prune the outer dimensions between blocks, you can insert the gate outside. However, note that the model dimensions between blocks are connected by residual connections; as a result, the outer dimensions may need to share the same width and pruning positions.
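To make the residual constraint concrete, here is a small pure-Python sketch (an illustration, not ATO's code). It assumes a hypothetical heuristic in which per-channel importance scores of all residual-tied layers are summed and the same top-k channel indices are kept for every one of them; `shared_keep_indices`, `residual_add`, and `select` are made-up helper names.

```python
from typing import List, Sequence


def shared_keep_indices(scores_per_layer: Sequence[Sequence[float]],
                        keep: int) -> List[int]:
    """Pick one set of outer channels for every layer tied by residual adds.

    Hypothetical heuristic: sum each channel's importance over all tied
    layers, then keep the top-`keep` channels. Every tied layer must use
    these same indices, otherwise x + f(x) would add mismatched channels.
    """
    width = len(scores_per_layer[0])
    if any(len(s) != width for s in scores_per_layer):
        raise ValueError("residual-tied layers must have the same width")
    total = [sum(s[c] for s in scores_per_layer) for c in range(width)]
    ranked = sorted(range(width), key=lambda c: total[c], reverse=True)
    return sorted(ranked[:keep])


def residual_add(x: Sequence[float], fx: Sequence[float]) -> List[float]:
    """Element-wise add, standing in for a residual connection."""
    return [a + b for a, b in zip(x, fx)]


def select(v: Sequence[float], idx: Sequence[int]) -> List[float]:
    """Keep only the channels listed in idx."""
    return [v[i] for i in idx]


x = [1.0, 2.0, 3.0, 4.0]
fx = [0.5, 0.1, 0.9, 0.2]
idx = shared_keep_indices([[0.9, 0.1, 0.8, 0.3], [0.7, 0.2, 0.6, 0.4]], keep=2)
# With one shared index set, pruning before or after the add is equivalent,
# so the channels can be physically removed from both branches.
print(idx, select(residual_add(x, fx), idx)
      == residual_add(select(x, idx), select(fx, idx)))
```

Summing scores is only one way to reconcile tied layers; using a single shared gate for the whole residual chain, as the comment above suggests, achieves the same "same width and pruning positions" property directly.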

declanwucv commented 2 weeks ago

Hi, @gaosh. Currently I am working on the project_wegit part, which uses the group proximal algorithm. Unfortunately, I found it could be extremely complicated because of the multi-path connections in YOLO detectors.

If I understand correctly, project_wegit is used for sparse training, so that the hypernet is free to mask out some channels. If so, can I use another norm-based technique, such as the L1 norm?

gaosh commented 2 weeks ago

Hello, the goal of project_wegit is to set the weight groups corresponding to the mask to zero. You may use any additional projection or normalization techniques as needed. Note, however, that even if you use normalization-based techniques, you should perform selective normalization: the weight groups to be normalized should be decided by the hypernet masks, and only the selected weight groups are normalized.

An alternative way to avoid the complicated connections in YOLO is to leave the model dimension unchanged and only modify the number of channels within blocks such as C2f, C2fCIB, and SPPF, taking YOLOv10 (https://github.com/THU-MIG/yolov10/blob/main/ultralytics/cfg/models/v10/yolov10n.yaml) as an example.
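The "selective" idea can be sketched in plain Python. This is not the repository's project_wegit: the hard variant below simply zeroes masked-out groups (the stated goal), while the soft variant swaps in a standard group-lasso proximal step (group soft-thresholding) as one example of a norm-based technique. In both, only groups whose hypernet mask is 0 are touched. The names, the grouping (one group per channel), and `lam` are assumptions for illustration.

```python
import math
from typing import List, Sequence


def l2(group: Sequence[float]) -> float:
    """Euclidean norm of one weight group."""
    return math.sqrt(sum(w * w for w in group))


def selective_group_prox(groups: Sequence[Sequence[float]], mask: Sequence[int],
                         lam: float) -> List[List[float]]:
    """Group-lasso proximal step applied only to groups the hypernet masked out.

    groups[i] holds all weights tied to channel i (e.g. its output filter and
    the matching input slice of the next layer). mask[i] == 0 means the
    hypernet wants channel i removed. Kept groups are returned unchanged.
    """
    out = []
    for g, m in zip(groups, mask):
        if m:
            out.append(list(g))                    # not selected: untouched
            continue
        norm = l2(g)
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * w for w in g])         # shrink toward zero
    return out


def selective_group_project(groups: Sequence[Sequence[float]],
                            mask: Sequence[int]) -> List[List[float]]:
    """Hard variant: set masked-out groups exactly to zero."""
    return [list(g) if m else [0.0] * len(g) for g, m in zip(groups, mask)]


groups = [[3.0, 4.0], [0.3, 0.4], [1.0, 0.0]]
mask = [1, 0, 0]
print(selective_group_prox(groups, mask, lam=0.25))
print(selective_group_project(groups, mask))
```

The first group keeps its weights even though its norm is the largest, because the decision comes from the mask, not from the norm. Swapping the L2 group norm for an L1 norm would give element-wise soft-thresholding instead, with the same selective structure.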
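As an example of gating only inside a block, here is a sketch for SPPF. It assumes the Ultralytics SPPF layout (cv1: c1 -> c_ with c_ = c1 // 2, three channel-preserving max-pools, concat of cv1's output and the three pooled maps into 4 * c_ channels, then cv2 -> c2). A gate on the c_ hidden channels then removes the same indices from each of the 4 concatenated branches, while the block's output width c2 stays untouched. `sppf_cv2_input_keep` is a hypothetical helper.

```python
from typing import List, Sequence


def sppf_cv2_input_keep(mid_keep: Sequence[int], branches: int = 4) -> List[int]:
    """Map kept hidden channels of SPPF to the kept input channels of cv2.

    Assumed layout: y0 = cv1(x); y1, y2, y3 = successive max-pools of y0;
    cv2 consumes concat([y0, y1, y2, y3]) along channels. Max-pooling keeps
    channel order, so hidden channel i appears at offset b * c_ + i in
    branch b.
    """
    c_ = len(mid_keep)
    kept = [i for i, m in enumerate(mid_keep) if m]
    return [b * c_ + i for b in range(branches) for i in kept]


gate = [1, 0, 1, 1]          # hypothetical hypernet output for c_ = 4
print(sppf_cv2_input_keep(gate))
```

Because the dependency stays inside the block, no other layer in the network needs to know about this gate, which is what makes this route simpler than pruning the model dimension.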

Qicaoji7 commented 1 week ago

@declanwucv Hey! 😊 I'm also working on pruning the YOLO series and would love to check out your current work! Excited to connect!

declanwucv commented 1 week ago

Hi @Qicaoji7, I have not yet been able to fully apply ATO to Ultralytics YOLO. Here is what I have done so far:

  1. Use DepGraph or Microsoft's OTO tool to analyze the layer dependencies of YOLO. This gives you groups of conv layers that must be pruned simultaneously.
  2. Insert gate functions into those groups.
  3. There is no need to change the hypernet.
  4. Implement a function that calculates the parameter count based on the hypernet output.

Hope the above helps. Although I only used ATO's gate function, the pruned model still shows fairly good performance, surpassing L2 pruning and L1/BN pruning with sparse training.
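The four steps above can be sketched on a toy graph in plain Python. This is an illustration under stated assumptions, not DepGraph/OTO output and not ATO's code: the dependency pairs are written by hand in place of a real analysis, each group gets one shared gate vector (steps 1 and 2), the hypernet is left abstract (step 3), and the parameter count counts bias-free conv weights only (step 4). All names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def find(parent: Dict[str, str], a: str) -> str:
    """Union-find root lookup with path halving."""
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a


def build_groups(layers: Sequence[str],
                 tied_pairs: Sequence[Tuple[str, str]]) -> Dict[str, str]:
    """Step 1: union layers whose output channels must be pruned together.

    tied_pairs stands in for what a dependency analysis would report.
    Returns a mapping layer name -> group id.
    """
    parent = {name: name for name in layers}
    for a, b in tied_pairs:
        parent[find(parent, a)] = find(parent, b)
    return {name: find(parent, name) for name in layers}


def count_params(spec: Dict[str, Tuple[Optional[str], int, int]],
                 group_of: Dict[str, str],
                 gates: Dict[str, Sequence[int]]) -> int:
    """Step 4: conv weight count implied by the (binarized) hypernet gates.

    spec maps conv name -> (producer name or None, input width if the conv
    reads the network input, kernel size). Output width = non-zero gates of
    the conv's own group; input width = kept width of its producer's group.
    """
    total = 0
    for name, (producer, c_in_input, k) in spec.items():
        c_out = sum(1 for g in gates[group_of[name]] if g)
        if producer is None:
            c_in = c_in_input
        else:
            c_in = sum(1 for g in gates[group_of[producer]] if g)
        total += c_in * c_out * k * k
    return total


layers = ["conv1", "conv2", "conv3"]
# conv1 and conv2 outputs are summed by a residual add, so they are tied.
group_of = build_groups(layers, [("conv1", "conv2")])
spec = {
    "conv1": (None, 3, 3),      # reads the 3-channel image
    "conv2": ("conv1", 0, 3),
    "conv3": ("conv2", 0, 1),   # reads the residual sum (conv1/conv2 group)
}
gates = {
    group_of["conv1"]: [1, 1, 0, 1, 0, 1, 1, 0],   # one shared gate per group
    group_of["conv3"]: [1, 1, 1, 1],
}
print(count_params(spec, group_of, gates))  # 135 + 225 + 20 = 380
```

In a real YOLO model the groups would come from the dependency tool and the counting would follow each conv's actual producers (including concatenations, which offset channel indices rather than tie them).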