ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Pruning : Reducing number of kernels, thus reducing the number of parameters DOES NOT improve inference speed! #6598

Closed LeoSouquet closed 2 years ago

LeoSouquet commented 2 years ago

Search before asking

Question

Hi Guys,

I am working to implement a pruning technique inspired by SlimYOLOv3 (https://arxiv.org/abs/1907.11093). The objective is to reduce the number of kernels per layer, thus reducing the number of parameters.

However, in a preliminary experiment, I took the yolov3.yaml file and divided all kernel counts by two. This brought the number of parameters down from:

Original YOLOv3: Model Summary: 261 layers, 61508200 parameters, 0 gradients, 154.7 GFLOPs
Slim YOLOv3: Model Summary: 261 layers, 15394648 parameters, 0 gradients, 38.9 GFLOPs

I fine-tuned the pruned model (from COCO weights) to bring the mAP back up, and then tested the inference speed.

However, the inference speed remains the same as with the regular yolov3.yaml. I don't understand this at all, as the number of parameters and GFLOPs have been drastically reduced.

Any idea why that is?
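For reference, here is roughly how I measure the forward-pass time (a minimal sketch; the hub yolov5s load below is just a stand-in for my pruned checkpoint, and a CUDA device is assumed):

```python
import time
import torch

# Stand-in model; in my case this is the pruned YOLOv3 checkpoint
model = torch.hub.load('ultralytics/yolov5', 'yolov5s').to('cuda').eval()
x = torch.zeros(1, 3, 416, 416, device='cuda')  # dummy input at my inference size

with torch.no_grad():
    for _ in range(10):       # warm-up so kernel launches/allocations don't skew timing
        model(x)
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
print(f'{(time.time() - t0) / 100 * 1000:.1f} ms per forward pass')
```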

Additional

Here is my pruned yolov3.yaml

```yaml
# Parameters
nc: 1  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple

# anchors
anchors:

# darknet53 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [16, 3, 1]],  # 0
   [-1, 1, Conv, [32, 3, 2]],  # 1-P1/2
   [-1, 1, Bottleneck, [32, False]],
   [-1, 1, Conv, [64, 3, 2]],  # 3-P2/4
   [-1, 2, Bottleneck, [64, False]],
   [-1, 1, Conv, [128, 3, 2]],  # 5-P3/8
   [-1, 8, Bottleneck, [128, False]],
   [-1, 1, Conv, [256, 3, 2]],  # 7-P4/16
   [-1, 8, Bottleneck, [256, False]],
   [-1, 1, Conv, [512, 3, 2]],  # 9-P5/32
   [-1, 4, Bottleneck, [512, False]],  # 10
  ]

# YOLOv3 head
head:
  [[-1, 1, Bottleneck, [512, False]],
   [-1, 1, Conv, [256, [1, 1]]],
   [-1, 1, Conv, [512, 3, 1]],
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [512, 3, 1]],  # 15 (P5/32-large)

   [-2, 1, Conv, [128, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 8], 1, Concat, [1]],  # cat backbone P4
   [-1, 1, Bottleneck, [256, False]],
   [-1, 1, Bottleneck, [256, False]],
   [-1, 1, Conv, [126, 1, 1]],
   [-1, 1, Conv, [256, 3, 1]],  # 22 (P4/16-medium)

   [-2, 1, Conv, [64, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P3
   [-1, 1, Bottleneck, [128, False]],
   [-1, 2, Bottleneck, [128, False]],  # 27 (P3/8-small)

   [[27, 22, 15], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
```

glenn-jocher commented 2 years ago

@LeoCyclope I don't provide feedback for code customizations or user research, but in general reducing the YOLOv5 compound scaling constants naturally produces speed improvements which are quantified in our README results, i.e.:

(Screenshot of the YOLOv5 README speed/accuracy results table, 2022-02-10)
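As a sketch of how that scaling is expressed in the model YAMLs, channels can be halved globally via width_multiple instead of editing every layer (parse_model multiplies each layer's channel count by width_multiple and rounds it to a multiple of 8):

```yaml
# Sketch: scale channels globally rather than editing each layer's args
nc: 80               # number of classes
depth_multiple: 1.0  # model depth multiple (scales the number of Bottleneck repeats)
width_multiple: 0.5  # layer channel multiple (~half the kernels in every Conv)
```
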
LeoSouquet commented 2 years ago

Thanks for your reply 👍 I understand your point.

Quick question: I tried, with a regular YOLOv3 provided by you, to run an evaluation (using val.py) and I get:

Batch size 32: Speed: 0.1ms pre-process, 1.7ms inference, 3.9ms NMS per image at shape (32, 3, 416, 416)
Batch size 1: Speed: 0.2ms pre-process, 8.9ms inference, 1.2ms NMS per image at shape (1, 3, 416, 416)

This means inference is more than 5 times faster per image with a batch of 32 than with a batch of 1. From your experience, do those numbers make sense?
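For context, the two runs were roughly the following (the weights and data paths here are placeholders):

```shell
# batch of 32
python val.py --weights yolov3.pt --data coco.yaml --img 416 --batch-size 32

# batch of 1
python val.py --weights yolov3.pt --data coco.yaml --img 416 --batch-size 1
```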

Thanks a lot in advance,

Léo

glenn-jocher commented 2 years ago

@LeoCyclope 👋 Hello! Thanks for asking about inference speed issues. YOLOv5 🚀 can be run on CPU (i.e. --device cpu, slow) or GPU if available (i.e. --device 0, faster). You can determine your inference device by viewing the YOLOv5 console output:

detect.py inference

python detect.py --weights yolov5s.pt --img 640 --conf 0.25 --source data/images/

YOLOv5 PyTorch Hub inference

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Images
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()  # or .show(), .save()
# Speed: 631.5ms pre-process, 19.2ms inference, 1.6ms NMS per image at shape (2, 3, 640, 640)

Increase Speeds

If you would like to increase your inference speed, some options are sketched below.
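A few commonly used options, as a sketch (the weights, image sizes, and export formats below are illustrative, not a complete list):

```shell
# Run inference at a smaller image size
python detect.py --weights yolov5s.pt --img 320 --source data/images/

# FP16 half-precision inference on CUDA
python detect.py --weights yolov5s.pt --img 640 --half --source data/images/

# Export to an optimized runtime (e.g. ONNX or TensorRT), then run the exported model
python export.py --weights yolov5s.pt --include onnx engine --device 0
python detect.py --weights yolov5s.engine --img 640 --source data/images/
```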

Good luck 🍀 and let us know if you have any other questions!

github-actions[bot] commented 2 years ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!