Closed: Pedro-Leitek closed this issue 2 years ago.
👋 Hello @Pedro-Leitek, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.
For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.
Python>=3.7.0 with all requirements.txt dependencies installed, including PyTorch>=1.7. To get started:
```bash
git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install
```
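If it helps, here is a quick sanity check that the environment meets these minimums (a minimal sketch, assuming `python` points at the environment you just installed into):

```bash
python --version                                     # expect Python 3.7.0 or newer
python -c "import torch; print(torch.__version__)"   # expect PyTorch 1.7 or newer
```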
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.
There are various factors at play, such as the number of classes, the number of images, and the YOLOv5 model size you are using. But with the model you are training, if it currently takes 2.5 days on a GTX 1660, I do not think it is possible to get under an hour with a GPU from Azure. I am assuming here you have a (very) large dataset.
I think you should start with the Standard_NC6s_v3 with a Tesla V100 GPU and see how it goes. You can probably crank up the batch size quite a bit with the 16 GB of VRAM, but if you increase it too much it will take longer for your model to converge. Then there is the new ND A100 v4 series with A100 GPUs, which is now in preview; you can easily sign up for that. I use those, and in my case the model finishes in 8-12 hours. The Standard_NC6s_v3 with the V100 is very serviceable but has some drawbacks, like an outdated CPU.
I would try the Standard_NC6s_v3 with the V100 GPU first, see if you actually hit a wall with the batch size, and then request access to the new A100 VMs.
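For reference, a starting command on the V100 might look like the sketch below; custom.yaml and yolov5s.pt are placeholders for your own dataset config and starting weights, and --batch should be tuned to the available VRAM:

```bash
# Rough starting point on a 16 GB V100 (placeholder dataset/weights); lower --batch on CUDA out-of-memory errors
python train.py --img 640 --batch 64 --epochs 300 --data custom.yaml --weights yolov5s.pt
```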
@Pedro-Leitek 👋 Hello! Thanks for asking about training speed issues. YOLOv5 🚀 can be trained on CPU (slowest), single-GPU, or multi-GPU (fastest). If you would like to increase your training speed some options are:
- Increase --batch-size
- Reduce --img-size
- Train with multi-GPU at a larger --batch-size
- Train on cached data with python train.py --cache (RAM caching) or --cache disk (disk caching), as in the sketch below
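As a rough illustration of the caching options (custom.yaml is a placeholder for your dataset config):

```bash
# Cache preprocessed images in RAM (fastest, needs enough system memory)
python train.py --img 640 --data custom.yaml --weights yolov5s.pt --cache
# Or cache them on disk instead if RAM is limited
python train.py --img 640 --data custom.yaml --weights yolov5s.pt --cache disk
```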
Good luck 🍀 and let us know if you have any other questions!
A 1660 shouldn't be that slow; I can certainly do a good amount of training on a 3060 (which is faster, but not by orders of magnitude) overnight at high resolution. I would suggest making sure your CUDA drivers are up to date and reinstalling your environment with the latest PyTorch version, following the instructions on their website. You can rent a cheap GPU at vast.ai or a similar service if you are not dealing with sensitive data or code, or train for free on Colab/Kaggle if you keep the tab active.
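A minimal way to check that the driver is current and that PyTorch actually sees the GPU (assuming the NVIDIA tools are on the PATH):

```bash
nvidia-smi                                                          # reports driver version and supported CUDA version
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"  # should print True on a working install
```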
I find I get similar training times with a local 3060 to the GPU you get with the Pro (not Pro+) version of Colab. It suits me to let it run overnight or while I am doing something else.
The good news is that with the crypto meltdown, the price of GPUs seems to be dropping.
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcome!
Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
@barney2074 Absolutely, GPUs like the 3060 offer a significant performance boost, and leveraging platforms such as Colab or Kaggle can also help accelerate training. Keeping an eye on GPU prices during market fluctuations is a practical approach.
Thank you for sharing your insights, and if you have any more questions or need further assistance, feel free to ask!
Search before asking
Question
Hi,
Weeks ago I trained YOLOv5 on a custom dataset. The thing is, it took two and a half days to train 100 epochs at batch size 32 with image size 160. I used my GTX 1660 Super to train it. If I try a bigger image size, the GPU gives an out-of-memory error, and more epochs would extend the training time further. So, is there any graphics card available through Microsoft Azure that would let me train my model in under 1 hour (300 epochs, batch 64, image size 640)?
Thanks
Additional
No response