Closed mengban closed 3 years ago
Hello @mengban, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.
If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:
For more information please visit https://www.ultralytics.com.
@mengban GPU utilisation should be about 90% when running nvidia-smi. You may have environment problems. I would recommend the Docker Image as an easy way to reproduce our environment while exploiting your hardware.
Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment, clone the latest repo (code changes daily), and pip install -r requirements.txt again. We also highly recommend using one of our verified environments below.
Requirements
Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:
$ pip install -r requirements.txt
Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
- Google Colab Notebook with free GPU:
- Kaggle Notebook with free GPU: https://www.kaggle.com/models/ultralytics/yolov5
- Google Cloud Deep Learning VM. See GCP Quickstart Guide
- Docker Image https://hub.docker.com/r/ultralytics/yolov5. See Docker Quickstart Guide
Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.
Thanks for your reply. I reinstalled the packages with pip install -r requirements.txt, and my problem still exists.
I also found that the 8 CPU workers (the number of workers) run at nearly 100%, so I think the slowdown may be caused by my dataset. The images in my dataset are about 3000 * 4000 pixels, some even 6000 * 4000, and a single image contains roughly 100 or more boxes. So I think the CPU can't feed data to the GPU in time, which slows down the whole training process. What do you think?
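One way to check this hypothesis is to time how long the training loop waits for each batch versus how long it spends in the training step. The sketch below is a minimal, framework-agnostic illustration and is not part of YOLOv5: profile_loop, slow_loader and fast_step are hypothetical names, and the sleep merely stands in for decoding very large images. With a real PyTorch model, the step would also need to call torch.cuda.synchronize() before the timer is read, because CUDA kernels run asynchronously.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def profile_loop(batches: Iterable, step: Callable[[object], None]) -> Tuple[float, float]:
    """Return (seconds spent waiting for data, seconds spent in step)."""
    wait = 0.0
    compute = 0.0
    iterator = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(iterator)  # time blocked on the data pipeline
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)  # time spent in the training step
        t2 = time.perf_counter()
        wait += t1 - t0
        compute += t2 - t1
    return wait, compute


def slow_loader(n: int, delay: float) -> Iterator[int]:
    """Hypothetical stand-in for a loader that decodes very large images."""
    for i in range(n):
        time.sleep(delay)
        yield i


def fast_step(batch: object) -> None:
    """Hypothetical stand-in for a quick training step."""
    sum(range(1000))


wait, compute = profile_loop(slow_loader(5, 0.05), fast_step)
print(f"waiting for data: {wait:.3f}s, training step: {compute:.3f}s")
```

If the waiting time dominates, as it does in this toy run, the loop is input-bound and the GPU is likely idle while the CPU prepares the next batch.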
@mengban both CPU and GPU utilization should be 90-100%. --workers 8 is the default; you're free to vary it as you see fit.
As I said, try the Docker image.
Docker usage link: https://docs.ultralytics.com/yolov5/environments/docker_image_quickstart_tutorial/
sudo docker run --ipc=host --gpus all -it -v "$(pwd)"/yourDirectory:/usr/src/yourDirectory ultralytics/yolov5:latest
Replace 'yourDirectory' with the directory you want to use inside the YOLOv5 Docker container.
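For anyone scripting this, the substitution can be sketched in Python. This is only an illustration of how the mount argument is assembled from the command above: yolov5_docker_command is a hypothetical helper, not something shipped with YOLOv5, and it only builds the argument list without running Docker.

```python
import shlex
from pathlib import Path
from typing import List


def yolov5_docker_command(host_dir: str, image: str = "ultralytics/yolov5:latest") -> List[str]:
    """Build the docker run command from the comment above for a given host directory."""
    host = Path(host_dir).resolve()  # absolute path, like "$(pwd)"/yourDirectory
    mount = f"{host}:/usr/src/{host.name}"  # host path : path inside the container
    return ["sudo", "docker", "run", "--ipc=host", "--gpus", "all", "-it", "-v", mount, image]


cmd = yolov5_docker_command("datasets")  # "datasets" is a placeholder directory name
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal, or the list can be passed to subprocess.run if the script should launch the container itself.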
thanks, bro. I'll have a try.
+1. In the Docker container, the yolov5 directory is located at /usr/src/app.
So where do you see your GPU-Util? I don't see it when training.
@SiyangXie Use either of these commands in the terminal:
nvidia-smi
watch nvidia-smi
@SiyangXie @dongjuns yes, the nvidia-smi command is the best way to monitor GPU stats.
A new option for monitoring GPU utilization is W&B logging, which plots your utilization, temperature, and CUDA memory over your full training run. Here are stats for a COCO128 YOLOv5x training with a V100 on Colab Pro. We are putting together tutorials this week for our recent W&B integration.
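If you want to log these numbers from a script rather than watching the terminal, nvidia-smi can also print them as CSV. The following is a rough sketch, not an official tool: parse_gpu_stats and read_gpu_stats are hypothetical helper names, and the parser assumes both fields come back as plain integers (some GPUs report a field as unsupported, which this sketch does not handle).

```python
import shutil
import subprocess
from typing import List, Optional, Tuple

# Ask nvidia-smi for GPU utilization (%) and used memory (MiB) as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(output: str) -> List[Tuple[int, int]]:
    """Parse one 'utilization, memory' line per GPU into integer pairs."""
    stats = []
    for line in output.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats


def read_gpu_stats() -> Optional[List[Tuple[int, int]]]:
    """Run nvidia-smi once; return None if it is not installed."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)


# Made-up sample resembling the symptom in this issue: 0% utilization, about 20 GB in use.
sample = "0, 20480\n"
print(parse_gpu_stats(sample))
```

Calling read_gpu_stats() in a loop with a short sleep gives roughly what watch nvidia-smi shows, in a form that can be written to a log file.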
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I have the same problem. Have you solved it?
❔Question
Training is very, very slow, and the GPU-Util reported by nvidia-smi is always 0, although GPU memory usage is about 20 GB+. Is this normal?
Additional context
Here is my env:
- yolov5 version: 83deec
- Python: 3.8
- CUDA: 10.1
- cudnn: 7.6.3
- PyTorch: 1.6.0
- GPU: Tesla V100, 32 GB memory version
I am training yolov5m with 20k+ images, and the GPU usage is always 0.
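When reporting an environment like this, it can help to confirm the versions from inside the same Python interpreter that runs training. The snippet below is a minimal sketch under that assumption: environment_report is a hypothetical helper, not part of YOLOv5, and it falls back to empty values when PyTorch is not installed.

```python
import platform
from typing import Dict, Optional


def environment_report() -> Dict[str, Optional[str]]:
    """Collect Python, PyTorch, CUDA, cuDNN and GPU details as PyTorch sees them."""
    report: Dict[str, Optional[str]] = {
        "python": platform.python_version(),
        "torch": None,
        "cuda_available": None,
        "cuda": None,
        "cudnn": None,
        "gpu": None,
    }
    try:
        import torch
    except ImportError:
        return report  # PyTorch is not installed in this environment
    report["torch"] = str(torch.__version__)
    available = torch.cuda.is_available()
    report["cuda_available"] = str(available)
    report["cuda"] = torch.version.cuda  # None for CPU-only builds
    if available:
        report["cudnn"] = str(torch.backends.cudnn.version())
        report["gpu"] = torch.cuda.get_device_name(0)
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

If cuda_available prints False, PyTorch cannot see the GPU from this environment, regardless of what nvidia-smi shows.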