Closed — VyasVedant closed this issue 2 years ago
👋 Hello! Thanks for asking about inference speed issues. PyTorch Hub speeds will vary by hardware, software, model, inference settings, etc. Our default example in Colab with a V100 looks like this:
YOLOv5 🚀 can be run on CPU (i.e. --device cpu, slow) or GPU if available (i.e. --device 0, faster). You can determine your inference device by viewing the YOLOv5 console output:
```shell
python detect.py --weights yolov5s.pt --img 640 --conf 0.25 --source data/images/
```
```python
import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Images
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()  # or .show(), .save()
# Speed: 631.5ms pre-process, 19.2ms inference, 1.6ms NMS per image at shape (2, 3, 640, 640)
```
If you would like to increase your inference speed, some options are:
- Reduce --img-size, i.e. 1280 -> 640 -> 320
- Use half precision FP16 inference with python detect.py --half and python val.py --half
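The image-size option can also be applied per call through the PyTorch Hub API. A minimal sketch, assuming the `size` keyword of the Hub model's forward call (as used in YOLOv5's AutoShape interface) and the same example image as above:

```python
import torch

# Load the small model; smaller models are faster than larger ones.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Reduce the inference image size, e.g. 640 -> 320, trading accuracy for speed.
results = model('https://ultralytics.com/images/zidane.jpg', size=320)
results.print()
```

FP16 half precision only helps on CUDA devices; on CPU it generally gives no speedup.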
Good luck 🍀 and let us know if you have any other questions!
@VyasVedant you can also consider using DeepSparse to significantly speed up inference. I wrote a tutorial on my blog on how I got 180+ FPS inference on a CPU using only 4 cores.
Hope you'll find it useful 👇
https://dicksonneoh.com/portfolio/supercharging_yolov5_180_fps_cpu/
I also wrote a tutorial on OpenVINO 👇
https://dicksonneoh.com/portfolio/how_to_10x_your_od_model_and_deploy_50fps_cpu/
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
I suggest you take a look at SparseML. It provides tools to quantize and prune YOLOv5 with minimal prediction degradation. After that, you can deploy your model and run it with the DeepSparse engine, an engine they developed to run sparse models on CPU. This can make your YOLOv5 run quite fast, with speeds comparable to GPUs.
You can speed up YOLOv5 roughly 2x on CPU by quantizing the model using onnxruntime. This works on a Raspberry Pi 4 as well. You can check out this blog on how to achieve that.
@dnth I would like to deploy a yolov8-seg model using DeepSparse for faster inference. Is it possible, or is it not supported yet? Thank you in advance for your answer.
Hello @apanand14,
While DeepSparse can significantly speed up inference on CPU, it is important to note that it is designed specifically for running sparse models. To my knowledge, DeepSparse does not currently support deployment of the yolov8-seg model.
If you are looking for ways to optimize inference speed with YOLOv5, I suggest considering some of the other techniques mentioned in this thread, such as using quantization techniques with onnxruntime or exploring the use of OpenVINO.
Thank you for your question and if you have any further issues or questions, feel free to ask.
Glenn
Search before asking
Question
Can anyone suggest how I can speed up the program? Since I am using a CPU and no GPU, it takes a lot of time. Are there any other modules that can help speed up the code @glenn-jocher?
Additional
No response