Closed: Pedro-Leitek closed this issue 2 years ago.
👋 Hello @Pedro-Leitek, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.
For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.
Python>=3.7.0 with all requirements.txt dependencies installed, including PyTorch>=1.7. To get started:
```bash
git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install
```
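If it helps, here is a quick sanity check that the environment meets these minimums (a minimal sketch, assuming `python` points at the environment you just installed into):

```bash
python --version                                     # expect Python 3.7.0 or newer
python -c "import torch; print(torch.__version__)"   # expect PyTorch 1.7 or newer
```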
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.
There are various factors at play, such as the number of classes, the number of images, and the YOLOv5 model size you are using. But with the model you are training, if it currently takes 2.5 days on a GTX 1660, I do not think it is possible to get under an hour with a GPU from Azure. I am assuming here you have a (very) large dataset.
I think you should start with the Standard_NC6s_v3 with a Tesla V100 GPU and see how it goes. You can probably crank up the batch size quite a bit with the 16 GB of VRAM, but if you increase it too much it will take longer for your model to converge. Then there is the new ND A100 v4 series with A100 GPUs, which is now in preview; you can easily sign up for that. I use those, and in my case the model finishes in 8-12 hours. The Standard_NC6s_v3 with the V100 is very serviceable but has some drawbacks, like an outdated CPU.
I would try the Standard_NC6s_v3 with the V100 GPU first, see if you actually hit a wall with the batch size, and then request access to the new A100 VMs.
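For reference, a starting command on the V100 might look like the sketch below; custom.yaml and yolov5s.pt are placeholders for your own dataset config and starting weights, and --batch should be tuned to the available VRAM:

```bash
# Rough starting point on a 16 GB V100 (placeholder dataset/weights); lower --batch on CUDA out-of-memory errors
python train.py --img 640 --batch 64 --epochs 300 --data custom.yaml --weights yolov5s.pt
```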
@Pedro-Leitek 👋 Hello! Thanks for asking about training speed issues. YOLOv5 🚀 can be trained on CPU (slowest), single-GPU, or multi-GPU (fastest). If you would like to increase your training speed some options are:
- Increase --batch-size
- Reduce --img-size
- Train with multi-GPU at a larger --batch-size
- Train on cached data with python train.py --cache (RAM caching) or --cache disk (disk caching), as in the sketch below
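As a rough illustration of the caching options (custom.yaml is a placeholder for your dataset config):

```bash
# Cache preprocessed images in RAM (fastest, needs enough system memory)
python train.py --img 640 --data custom.yaml --weights yolov5s.pt --cache
# Or cache them on disk instead if RAM is limited
python train.py --img 640 --data custom.yaml --weights yolov5s.pt --cache disk
```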
Good luck 🍀 and let us know if you have any other questions!
A 1660 shouldn't be that slow; I can certainly do a good amount of training on a 3060 (which is faster, but not by orders of magnitude) overnight at high resolution. I would suggest making sure your CUDA drivers are up to date and reinstalling your environment with the latest PyTorch version, following the instructions on their website. You can rent a cheap GPU at vast.ai or a similar service if you are not dealing with sensitive data or code, or train for free on Colab/Kaggle if you keep the tab active.
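A minimal way to check that the driver is current and that PyTorch actually sees the GPU (assuming the NVIDIA tools are on the PATH):

```bash
nvidia-smi                                                          # reports driver version and supported CUDA version
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"  # should print True on a working install
```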
I find I get similar training times with a local 3060 to the GPU you get with the Pro (not Pro+) version of Colab. It suits me to let it run overnight or while I am doing something else.
The good news is that with the crypto meltdown, the price of GPUs seems to be dropping.
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcome!
Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
@barney2074 Absolutely, GPUs like the 3060 offer a significant performance boost, and leveraging platforms such as Colab or Kaggle can also help accelerate training. Keeping an eye on GPU prices during market fluctuations is a practical approach.
Thank you for sharing your insights, and if you have any more questions or need further assistance, feel free to ask!
Search before asking
Question
Hi,
Weeks ago I trained YOLOv5 on a custom dataset. The thing is, it took two and a half days to train 100 epochs at batch size 32 with image size 160. I used my GTX 1660 Super to train it. If I try a bigger image size, the GPU gives an out-of-memory error, and more epochs would extend the training time further. So, is there any graphics card available through Microsoft Azure that would let me train my model in under 1 hour (300 epochs, batch 64, image size 640)?
Thanks
Additional
No response