ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

about ghost module #3234

Closed Henry0528 closed 3 years ago

Henry0528 commented 3 years ago

โ”Question

I've recently modified the YOLOv5s model with ghost modules: I used GhostConv to replace Conv and replaced the Bottleneck in C3 with GhostBottleneck. I trained the new model successfully and got only a very small drop in mAP, and the new model costs about 8 GFLOPs, half of the YOLOv5s model (16 GFLOPs). But when I run test.py, I find that the test time is almost the same as the YOLOv5s model, sometimes even longer.
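For context, here is a minimal sketch of a ghost convolution in PyTorch, following the GhostNet idea of generating half the feature maps with a cheap depthwise "ghost" branch. The class below is illustrative; YOLOv5's actual GhostConv lives in models/common.py and may differ in details such as the activation function:

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution: half the outputs come from a regular conv,
    the other half from a cheap depthwise conv applied to the first half."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_hidden = c_out // 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_hidden, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_hidden), nn.SiLU())
        self.cheap = nn.Sequential(  # depthwise 5x5 "ghost" branch
            nn.Conv2d(c_hidden, c_hidden, 5, 1, 2, groups=c_hidden, bias=False),
            nn.BatchNorm2d(c_hidden), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

# Parameter comparison against a plain dense 3x3 conv of the same shape
full = nn.Conv2d(128, 256, 3, 2, 1, bias=False)
ghost = GhostConv(128, 256, 3, 2)
n_full = sum(p.numel() for p in full.parameters())
n_ghost = sum(p.numel() for p in ghost.parameters())
print(n_full, n_ghost)  # the ghost variant uses roughly half the parameters
```

At these channel counts the dense conv has 294,912 weights while the ghost version has 151,168 parameters including batch norms, which matches the roughly-halved layer sizes reported later in this thread.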

github-actions[bot] commented 3 years ago

👋 Hello @Henry0528, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 3 years ago

@Henry0528 oh interesting! A table with the quantitative results would be cool.

Yes, as you've seen, GFLOPs only loosely correlates with speed. The correlation will vary by platform, backend, drivers, etc.
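A toy illustration of this point (a hypothetical micro-benchmark, not a measurement from this thread): a depthwise (grouped) conv has ~256x fewer FLOPs than a dense conv of the same shape, yet the measured wall-clock gap is usually far smaller, because grouped convs are memory-bound and map poorly to the dense GEMM kernels that backends optimize for:

```python
import time
import torch
import torch.nn as nn

def conv_flops(conv, h, w):
    """Multiply-accumulate count for a Conv2d at output size h x w."""
    k = conv.kernel_size[0] * conv.kernel_size[1]
    return conv.out_channels * (conv.in_channels // conv.groups) * k * h * w

full = nn.Conv2d(256, 256, 3, padding=1)                   # dense conv
depthwise = nn.Conv2d(256, 256, 3, padding=1, groups=256)  # grouped conv

x = torch.randn(1, 256, 40, 40)
print(conv_flops(full, 40, 40), conv_flops(depthwise, 40, 40))

def bench(m, n=50):
    """Average forward-pass latency over n runs after one warm-up call."""
    with torch.no_grad():
        m(x)
        t0 = time.perf_counter()
        for _ in range(n):
            m(x)
    return (time.perf_counter() - t0) / n

# Despite the 256x FLOP gap, the timing gap is typically much smaller.
print(f"full {bench(full)*1e3:.2f} ms, depthwise {bench(depthwise)*1e3:.2f} ms")
```

The exact latency ratio depends entirely on hardware and backend, which is the point being made here.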

Henry0528 commented 3 years ago

@glenn-jocher Thank you for your comment. I've made a table of my results. The test speed (inference only, not including NMS time) on GPU is almost the same, while testing on CPU shows some difference. I guess the ghost module may not be well supported on GPU. I also find that NMS time differs between CPU and GPU, and running NMS on GPU is even more time-consuming. (results table image attached)

glenn-jocher commented 3 years ago

@Henry0528 hmm, your updates seem to work well!

The params are halved, which would help reduce package sizes for iOS/Android apps etc. without losing any speed or accuracy. The FLOPs are also halved, though the speed impact doesn't quite reflect the same reduction; CPU shows a ~10% improvement.

I think the next step would be to test on larger models, i.e. probably by training a YOLOv5l6 model on COCO for 300 epochs to verify the improvements translate to larger models. If you want to share your yaml I could get a run started on one of our machines.

Henry0528 commented 3 years ago

@glenn-jocher I'm a college student doing my graduation project, which uses YOLOv5 for detection in underwater sonar images. I'm new to YOLO, so I'm not sure whether my change is reasonable. I would be grateful if you could do some further testing on my work. models.zip

glenn-jocher commented 3 years ago

@Henry0528 thanks, I'll take a look!

glenn-jocher commented 3 years ago

@Henry0528 I get the following stats comparison below for YOLOv5s (baseline) and your changes (YOLOv5s-ghost). These numbers don't line up with your comment above https://github.com/ultralytics/yolov5/issues/3234#issuecomment-844614891 though. Can you double check the model you sent and your table values?

python models/yolo.py --cfg yolov5s.yaml
Model Summary: 283 layers, 7276605 parameters, 7276605 gradients, 17.1 GFLOPS

python models/yolo.py --cfg yolov5s-ghost.yaml
Model Summary: 323 layers, 5870101 parameters, 5870101 gradients, 14.1 GFLOPS
Henry0528 commented 3 years ago

@glenn-jocher you should also change common.py like this, and then you should get the same result as me. (screenshots of the common.py change attached)

glenn-jocher commented 3 years ago

@Henry0528 thanks! After your change my numbers line up with your table. OK, I'll try to train YOLOv5s and YOLOv5m versions of the model to compare to the baselines.

python models/yolo.py --cfg yolov5s-ghost.yaml
Model Summary: 479 layers, 3706861 parameters, 3706861 gradients, 8.2 GFLOPS
15050188022 commented 3 years ago


I have tried the ghost module just as you did on my custom dataset, but the mAP drops a lot. Any ideas to discuss?

Henry0528 commented 3 years ago

@15050188022 Leave a contact method and let's chat.

15050188022 commented 3 years ago

> @15050188022 Leave a contact method and let's chat.

QQ 1570053804

glenn-jocher commented 3 years ago

@15050188022 @Henry0528 I started some YOLOv5s Ghost trainings in this public W&B project: https://wandb.ai/glenn-jocher/ghost

I've got two ghost trainings, and I'll add the baseline YOLOv5s model soon.

Henry0528 commented 3 years ago

@glenn-jocher I've seen the training results; maybe the drop in mAP is acceptable?

glenn-jocher commented 3 years ago

@Henry0528 @15050188022 our Ghost study is finished in https://wandb.ai/glenn-jocher/ghost. I trained the default YOLOv5s (master branch) plus yolov5s.yaml and yolov5s-ghost.yaml with the L134 change mentioned in https://github.com/ultralytics/yolov5/issues/3234#issuecomment-846032730 in the ghost branch. Training results are below. The accuracy is a bit lower, but the params and FLOPs are reduced by almost half, so it seems a worthwhile compromise for some situations, like mobile app deployments that require small package sizes. Training memory and speed were about the same between all 3.

(training results screenshot, 2021-05-28)
Henry0528 commented 3 years ago

@glenn-jocher could you also check the test time of each model? Though the FLOPs are halved, the time cost shows little difference on my own dataset.

glenn-jocher commented 3 years ago

@Henry0528 @15050188022 YOLOv5s-ghost comparison here using

python test.py --weights yolov5s-ghost2.pt --data coco.yaml --img 640 --iou 0.65 
YOLOv5 ๐Ÿš€ v5.0-116-gbb13123 torch 1.8.1+cu101 CUDA:0 (Tesla T4, 15109.75MB)
| Model | size (pixels) | mAPval 0.5:0.95 | mAPtest 0.5:0.95 | mAPval 0.5 | Speed T4 (ms) | params (M) | FLOPs 640 (B) |
|---|---|---|---|---|---|---|---|
| YOLOv5s | 640 | 37.0 | - | 56.4 | 4.7 | 7.3 | 17.0 |
| YOLOv5s-ghost1 | 640 | 35.2 | - | 54.0 | 4.8 | 5.1 | 11.2 |
| YOLOv5s-ghost2 | 640 | 35.6 | - | 54.1 | 4.9 | 3.9 | 8.8 |
Henry0528 commented 3 years ago

@glenn-jocher the results are strange. With lower params and FLOPs, the time cost is longer.

glenn-jocher commented 3 years ago

@Henry0528 no, it's not strange. What we see in the results is mainly the effect of depthwise separable convolutions, popularized by the EfficientNet/EfficientDet architectures, which show the same trend of lower params/FLOPs but slower inference when compared to fully-convolutional (no groups) YOLO and ResNet architectures.
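To make the parameter arithmetic concrete (an illustrative sketch, not code from this repo): a depthwise separable block replaces one dense 3x3 conv with a per-channel 3x3 depthwise conv plus a 1x1 pointwise conv, cutting parameters by roughly 8.7x at these channel counts, even though each of the two smaller convs launches its own kernel and touches memory separately:

```python
import torch.nn as nn

c_in, c_out, k = 256, 256, 3

# Standard conv: every output channel sees every input channel.
standard = nn.Conv2d(c_in, c_out, k, padding=1, bias=False)

# Depthwise separable: per-channel 3x3 depthwise + 1x1 pointwise.
separable = nn.Sequential(
    nn.Conv2d(c_in, c_in, k, padding=1, groups=c_in, bias=False),
    nn.Conv2d(c_in, c_out, 1, bias=False),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))  # 589824 vs 67840 parameters
```

The dense conv has c_in * c_out * k² = 589,824 weights; the separable pair has c_in * k² + c_in * c_out = 67,840, which is where the large FLOP/parameter savings come from despite the modest latency gains.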

I think the results are interesting though; like I said, everyone has different priorities, so these results may appeal to organizations prioritizing smaller packages.

Henry0528 commented 3 years ago

@glenn-jocher thank you very much for your answer. My problem is now solved.

glenn-jocher commented 3 years ago

@Henry0528 yeah no problem! Of course keep in mind that the results will vary by backend, so CPU inference, ONNX, android, iOS inference, etc. may see different levels of improvement than CUDA.

glenn-jocher commented 3 years ago

@Henry0528 I was thinking about this some more. Most of the model parameters are in the P5 and P6 layers at the largest strides.

Perhaps we could employ ghost modules only in the largest output layer, thereby achieving the most size reduction while minimally impacting the accuracy.

The ghost modules applied to P3/8 modules in particular are achieving next to no size reduction since those parameters are already so few. For example we could apply ghost modules to stages 7, 8, 9 in the backbone and 23 in the head (last output layer). This might reduce our model size by maybe 20-30% with (hopefully) little to no accuracy/speed hits.

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  1    156928  models.common.C3                        [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  1    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]        
  9                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 24      [17, 20, 23]  1    229245  Detect                                  [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 283 layers, 7276605 parameters, 7276605 gradients, 17.1 GFLOPs
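Putting numbers on this suggestion (my own arithmetic from the summary above, assuming ghost modules roughly halve a stage's parameters, as they did for the fully-ghosted model going from 7.28M to 3.71M): stages 7, 8, 9 and 23 hold about 58% of YOLOv5s's parameters, so ghosting only those stages would save roughly 29%, consistent with the 20-30% estimate:

```python
# Parameters per stage, taken from the model summary printed above.
stage_params = {7: 1_180_672, 8: 656_896, 9: 1_182_720, 23: 1_182_720}
total = 7_276_605  # total YOLOv5s parameters

heavy = sum(stage_params.values())
share = heavy / total
# Assumption: ghost modules roughly halve a stage's parameter count.
est_reduction = 0.5 * share
print(f"stages 7/8/9/23 hold {share:.0%} of params; "
      f"ghosting them saves ~{est_reduction:.0%}")
```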
jaskiratsingh2000 commented 3 years ago

@glenn-jocher How can I access the YAML file for YOLOv5-ghost? I was not able to find it in this repo. Please let me know. Thanks!

glenn-jocher commented 3 years ago

@jaskiratsingh2000 the yamls are in the zip attached to https://github.com/ultralytics/yolov5/issues/3234#issuecomment-845140740

jaskiratsingh2000 commented 3 years ago

Okay, I got it. I just tried running yolov5s-ghost, and as you mentioned in a previous comment, it is one of the smallest YOLO versions you have. I checked it on an RPi and the total time taken was 3986.6 ms, which is one of the highest among yolov5 and yolov5-tiny. If it is the smallest version, how is it taking more time? @glenn-jocher

glenn-jocher commented 3 years ago

@jaskiratsingh2000 profiling results are indicated in https://github.com/ultralytics/yolov5/issues/3234#issuecomment-850312297

jaskiratsingh2000 commented 3 years ago

@glenn-jocher even there it shows more time taken than the normal yolov5 and yolov5-tiny. Can you check once more?

glenn-jocher commented 3 years ago

@jaskiratsingh2000 I created the metrics in https://github.com/ultralytics/yolov5/issues/3234#issuecomment-850312297; I don't need to check anything, they are correct.

jaskiratsingh2000 commented 3 years ago

@glenn-jocher I am not making this up; these are the results I get whenever I run yolov5s-ghost.yaml:

YOLOv5 ๐Ÿš€ v5.0-110-gae04192 torch 1.7.0a0+e85d494 CPU

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     10144  models.experimental.GhostConv           [32, 64, 3, 2]                
  2                -1  1      9656  models.common.C3                        [64, 64, 1]                   
  3                -1  1     38720  models.experimental.GhostConv           [64, 128, 3, 2]               
  4                -1  1     43600  models.common.C3                        [128, 128, 3]                 
  5                -1  1    151168  models.experimental.GhostConv           [128, 256, 3, 2]              
  6                -1  1    165024  models.common.C3                        [256, 256, 3]                 
  7                -1  1    597248  models.experimental.GhostConv           [256, 512, 3, 2]              
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]        
  9                -1  1    564672  models.common.C3                        [512, 512, 1, False]          
 10                -1  1     69248  models.experimental.GhostConv           [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    208608  models.common.C3                        [512, 256, 1, False]          
 14                -1  1     18240  models.experimental.GhostConv           [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     53104  models.common.C3                        [256, 128, 1, False]          
 18                -1  1     75584  models.experimental.GhostConv           [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    143072  models.common.C3                        [256, 256, 1, False]          
 21                -1  1    298624  models.experimental.GhostConv           [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1    564672  models.common.C3                        [512, 512, 1, False]          
 24      [17, 20, 23]  1     35061  Detect                                  [8, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 479 layers, 3706861 parameters, 3706861 gradients

 time (ms)     GFLOPS     params  module
    118.30       0.00       3520  models.common.Focus
    250.34       0.00      10144  models.experimental.GhostConv
    274.37       0.00       9656  models.common.C3
    279.72       0.00      38720  models.experimental.GhostConv
    820.88       0.00      43600  models.common.C3
    314.01       0.00     151168  models.experimental.GhostConv
   1421.55       0.00     165024  models.common.C3
    338.42       0.00     597248  models.experimental.GhostConv
    129.29       0.00     656896  models.common.SPP
    708.59       0.00     564672  models.common.C3
    304.88       0.00      69248  models.experimental.GhostConv
      1.11       0.00          0  torch.nn.modules.upsampling.Upsample
      3.00       0.00          0  models.common.Concat
    532.37       0.00     208608  models.common.C3
    283.08       0.00      18240  models.experimental.GhostConv
      1.33       0.00          0  torch.nn.modules.upsampling.Upsample
      3.89       0.00          0  models.common.Concat
    484.57       0.00      53104  models.common.C3
    285.69       0.00      75584  models.experimental.GhostConv
      0.66       0.00          0  models.common.Concat
    619.82       0.00     143072  models.common.C3
    303.81       0.00     298624  models.experimental.GhostConv
      0.59       0.00          0  models.common.Concat
    716.95       0.00     564672  models.common.C3
     36.44       0.00      35061  Detect
8233.7ms total

That is why I keep saying it. @glenn-jocher please take a look, or let me know if I am just missing something.

jaskiratsingh2000 commented 3 years ago

If yolov5s-ghost is the smallest version, then it should take the least time of all of these, but here it is taking more. @glenn-jocher

jaskiratsingh2000 commented 3 years ago

I have also changed the line in common.py that was mentioned above, but the results are still the same for me. @glenn-jocher please let me know about this; I am kind of stuck now.

github-actions[bot] commented 3 years ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

glenn-jocher commented 3 years ago

Good news 😃! Your original issue may now be fixed ✅ in PR #4412. This PR adds a new yolov5s-ghost.yaml file to data/hub models to allow anyone to get started training Ghost models. You can start training this with:

python train.py --cfg yolov5s-ghost.yaml --weights yolov5s.pt

To receive this update:

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!