Applying to YOLO series networks

declanwucv commented 2 weeks ago

Hi, dear author. Thanks for your impressive work. After some research, I'm convinced this work is the SOTA, and I'm about to try it on Ultralytics YOLO series detectors.

However, I am a bit confused about how to implement it. Can you give me some tips?

gaosh commented 2 weeks ago

Do you want to apply our pruned model to YOLO, or do you want to prune YOLO using our method?

declanwucv commented 2 weeks ago

Hi, @gaosh. Sorry for my confusing description. I want to prune YOLO using ATO.

I notice that some virtual_gate modules are inserted into the model. What is the principle behind where they are inserted?

Besides the virtual gate, is there anything else I should take care of?

gaosh commented 2 weeks ago

For example, if you have a block with conv-->norm-->act-->conv-->norm-->act and you want to prune the middle dimension, you need to insert the virtual_gate after the first act function, so the block becomes conv-->norm-->act-->virtual_gate-->conv-->norm-->act. You also need to calculate the FLOPs of the corresponding layers. This is provided in our implementation; an example is this function: https://github.com/xidongwu/AutoTrainOnce/blob/main/utils.py#L136
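As a rough illustration of the two steps above, the pure-Python sketch below gates the middle channels and counts the resulting cost. `virtual_gate`, `conv_macs` and `block_macs` are hypothetical helpers written for this example, not the functions in the repository's `utils.py`. The count is in multiply-accumulates for dense convolutions and assumes stride 1 with same padding, ignoring norm and activation costs.

```python
def virtual_gate(feature_map, mask):
    """Scale each channel of a (C, H*W) feature map by its 0/1 mask entry.

    Hypothetical stand-in for a virtual gate: a zeroed channel contributes
    nothing to the next conv, so it can later be removed physically.
    """
    return [[v * m for v in channel] for channel, m in zip(feature_map, mask)]


def conv_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulates of a dense k x k convolution (bias ignored)."""
    return c_in * c_out * k * k * h_out * w_out


def block_macs(c_in, mask, c_out, k1, k2, h, w):
    """MACs of conv->norm->act->virtual_gate->conv->norm->act.

    The gate sits on the middle dimension, so the number of kept channels
    shrinks the output of the first conv AND the input of the second one.
    Assumes stride 1 and same padding; norm/act costs are ignored.
    """
    kept = sum(mask)
    return conv_macs(c_in, kept, k1, h, w) + conv_macs(kept, c_out, k2, h, w)


full = block_macs(64, [1] * 128, 64, 3, 3, 8, 8)      # all 128 middle channels
half = block_macs(64, [1, 0] * 64, 64, 3, 3, 8, 8)    # gate keeps 64 of them
print(full, half)  # 9437184 4718592
```

Because the gate controls both the output channels of the first conv and the input channels of the second, halving the kept channels halves this block's cost.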

declanwucv commented 2 weeks ago

Thanks for your reply.

I notice that in both examples (MobileNetV2 and ResNet), the virtual_gate was inserted inside the blocks rather than between them. Why?

gaosh commented 2 weeks ago

This depends on which dimensions you want to prune. If you want to prune the inner dimensions of a block, you need to insert the gate inside the block. If you want to prune the outer dimensions between blocks, you can insert the gate outside. Note, however, that the dimensions between blocks are tied together by residual connections; as a result, the outer dimensions may need to share the same width and the same pruning positions.
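The constraint on outer dimensions shows up even in a toy example. In y = x + f(x), channel i of x is added to channel i of f(x), so every block on the same residual stream has to keep the same channel indices. The sketch below picks one shared mask by summing per-block channel scores; the score inputs, the summation rule and the helper names are assumptions made for illustration, not ATO's actual rule.

```python
def shared_residual_mask(scores_per_block, keep):
    """Pick ONE outer-channel mask for all blocks joined by residual adds.

    scores_per_block: hypothetical per-channel importance scores, one list
    per block on the same residual stream. Summing them is an assumed
    aggregation rule; any rule works as long as it yields a single mask.
    """
    num_channels = len(scores_per_block[0])
    totals = [sum(s[c] for s in scores_per_block) for c in range(num_channels)]
    ranked = sorted(range(num_channels), key=lambda c: totals[c], reverse=True)
    kept = set(ranked[:keep])
    return [1 if c in kept else 0 for c in range(num_channels)]


def masked_residual_add(x, fx, mask):
    """y = x + f(x) with the same mask applied to both branches."""
    return [a * m + b * m for a, b, m in zip(x, fx, mask)]


mask = shared_residual_mask([[0.9, 0.1, 0.4, 0.2], [0.3, 0.2, 0.7, 0.1]], keep=2)
print(mask)  # [1, 0, 1, 0]
print(masked_residual_add([1.0] * 4, [0.5] * 4, mask))  # [1.5, 0.0, 1.5, 0.0]
```

If the two branches used different masks, a channel zeroed in one branch would be revived by the other, so the width could not actually be reduced.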

declanwucv commented 3 days ago

Hi, @gaosh. Currently I am working on the project_wegit part, which uses the group proximal algorithm. Unfortunately, I found it can be extremely complicated because of the multi-path connections in YOLO detectors.

If I understand correctly, project_wegit is used for sparse training, so that the hypernet can freely mask out some channels. If so, can I use another norm-based technique such as the L1 norm?

gaosh commented 4 hours ago

Hello, the goal of project_wegit is to set the weight groups corresponding to the mask to zero. You may use any additional projection or normalization techniques as needed. Also note that, even if you use normalization-based techniques, you should perform selective normalization: the weight groups to normalize should be decided by the hypernet masks, and only the selected weight groups are normalized.
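A minimal sketch of selective projection, under stated assumptions: each weight group is a flat list (for example, one output channel's filter), and the hypernet mask is given as one 0/1 entry per group. Groups the mask keeps are left alone; switched-off groups are either set to zero (the hard projection described above) or shrunk by one proximal step, using group L2 or, as asked about, element-wise L1. `selective_project`, `group_l2_prox` and `l1_prox` are hypothetical names; this is not the repository's project_wegit code.

```python
import math


def group_l2_prox(group, lam):
    """Proximal step for lam * ||w||_2 on one weight group (group lasso)."""
    norm = math.sqrt(sum(v * v for v in group))
    if norm <= lam:
        return [0.0] * len(group)
    return [v * (1.0 - lam / norm) for v in group]


def l1_prox(group, lam):
    """Element-wise soft-thresholding, the proximal step for lam * ||w||_1."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in group]


def selective_project(groups, mask, lam=None, prox=group_l2_prox):
    """Touch only the weight groups the hypernet mask switched off.

    mask[i] == 1: group kept by the hypernet, left unchanged.
    mask[i] == 0: hard-projected to zero if lam is None, otherwise
    shrunk by one proximal step (group L2 or L1, chosen via `prox`).
    """
    out = []
    for group, keep in zip(groups, mask):
        if keep:
            out.append(list(group))
        elif lam is None:
            out.append([0.0] * len(group))
        else:
            out.append(prox(group, lam))
    return out


groups = [[3.0, 4.0], [0.3, 0.4], [6.0, 8.0]]
mask = [1, 0, 0]  # hypernet keeps group 0, switches off groups 1 and 2
print(selective_project(groups, mask))                         # hard zeroing
print(selective_project(groups, mask, lam=1.0))                # group L2 shrinkage
print(selective_project(groups, mask, lam=1.0, prox=l1_prox))  # L1 shrinkage
```

Note that group 0 is never shrunk in any mode: the norm penalty applies only to the groups the hypernet selected for removal, which is the "selective" part.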

Taking YOLOv10 (https://github.com/THU-MIG/yolov10/blob/main/ultralytics/cfg/models/v10/yolov10n.yaml) as an example, an alternative way to avoid YOLO's complicated connections is to leave the model dimensions unchanged and only modify the number of channels within blocks such as C2f, C2fCIB, and SPPF.
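One way to organize that alternative is to plan pruning only for the hidden widths of those blocks while freezing every block's input and output widths, so residual and concat paths never change shape. The layer list below is a hypothetical, flattened description with illustrative widths, not values read from yolov10n.yaml, and `plan_inner_pruning` is a made-up helper; in practice the virtual_gate would be placed on the hidden path inside each selected block.

```python
PRUNABLE_BLOCKS = {"C2f", "C2fCIB", "SPPF"}


def plan_inner_pruning(layers, keep_ratio):
    """Plan pruning of hidden widths only; every block keeps its c_in/c_out.

    layers: (name, kind, c_in, c_out, hidden) tuples, a hypothetical
    flattened view of a YOLO model; hidden is None for plain layers.
    """
    plan = []
    for name, kind, c_in, c_out, hidden in layers:
        if kind in PRUNABLE_BLOCKS and hidden:
            kept = max(1, round(hidden * keep_ratio))
            plan.append({"name": name, "c_in": c_in, "c_out": c_out,
                         "hidden": hidden, "hidden_kept": kept})
    return plan


# Illustrative widths only (assumed), not taken from any real config.
layers = [
    ("stem", "Conv", 3, 16, None),
    ("c2f_a", "C2f", 32, 32, 16),
    ("down", "Conv", 32, 64, None),
    ("sppf", "SPPF", 256, 256, 128),
]
for entry in plan_inner_pruning(layers, keep_ratio=0.5):
    print(entry)
```

Plain Conv layers are skipped entirely, so the outer widths that the residual and concat connections depend on stay exactly as in the original model.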

Repository: xidongwu / AutoTrainOnce, issue #1