ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

CUDA error: device-side assert triggered #1263

Closed: mcjqwer closed this issue 4 years ago

mcjqwer commented 4 years ago

❔Question

Hello! Today I trained a YOLO model for head detection using a classmate's dataset, and I ran into the error in the title. Some images in the dataset contain a large number of person heads, typically 30 to 40, and a few contain more than 1000. After searching online, my guess is that the error comes from some kind of data overflow. What do you think?

Forgive my poor English! Here is the detailed information:

Traceback (most recent call last):
  File "/gpfs/users_home/201961206025/yolov5-master/train.py", line 394, in <module>
    train(hyp)
  File "/gpfs/users_home/201961206025/yolov5-master/train.py", line 299, in train
    dataloader=testloader)
  File "/gpfs/users_home/201961206025/yolov5-master/test.py", line 95, in test
    loss += compute_loss([x.float() for x in train_out], targets, model)[1][:3]  # GIoU, obj, cls
  File "/gpfs/users_home/201961206025/yolov5-master/utils/utils.py", line 462, in compute_loss
    tobj[b, a, gj, gi] = (1.0 - model.gr) + model.gr * giou.detach().clamp(0).type(tobj.dtype)  # giou ratio
RuntimeError: CUDA error: device-side assert triggered

Additional context

github-actions[bot] commented 4 years ago

Hello @mcjqwer, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook Open In Colab, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.

For more information please visit https://www.ultralytics.com.

mcjqwer commented 4 years ago

What makes me angry is that the training code runs normally on my own computer (2060s, CPU environment), but it does not run on the school's high-performance computer (Tesla V100 32G, GPU environment).
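One general debugging step applies to the CPU-versus-GPU difference described above. CUDA kernels are launched asynchronously, so the Python line named in a `device-side assert triggered` traceback is not necessarily the operation that actually failed. The sketch below is a minimal example, not something suggested in this thread: it sets the `CUDA_LAUNCH_BLOCKING` environment variable, which CUDA and PyTorch do support, so that kernel launches become synchronous and the traceback points at the failing operation. The helper name `enable_blocking_cuda_launches` is hypothetical.

```python
import os


def enable_blocking_cuda_launches():
    """Ask CUDA to run kernels synchronously so errors surface at the failing op.

    Call this before CUDA is initialised; the simplest way is to call it at the
    very top of the training script, before `import torch`.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_LAUNCH_BLOCKING"]


enable_blocking_cuda_launches()
# import torch  # import torch only after the variable is set (assumed workflow)
```

The same effect can be had from the shell by prefixing the training command with `CUDA_LAUNCH_BLOCKING=1`. Training will run more slowly in this mode, so it is meant for diagnosis only.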

glenn-jocher commented 4 years ago

It appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment, clone the latest repo (code changes daily), and pip install -r requirements.txt again. We also highly recommend using one of our verified environments below.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.

dongjuns commented 4 years ago

@mcjqwer Use the Docker image! It is simple and easy, and it avoids environment problems.

mcjqwer commented 4 years ago

I switched to a dataset that I had trained on successfully before, and it still trains normally now. So I do not think this is an environment problem; the only thing I changed was the dataset. Could the large number of head labels in my images be causing a data overflow? Some images have hundreds or even thousands of heads.

glenn-jocher commented 4 years ago

@mcjqwer there is no limit on the number of labels per image. If you think there is a bug then please submit a bug report with code to reproduce. I'll add our default bug response here:

Please note that most technical problems are due to:

If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!
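Since the failure appears only with the new dataset and the traceback ends on an indexing assignment (`tobj[b, a, gj, gi] = ...`), one thing worth ruling out is a label row that produces an out-of-range index. The thread does not confirm this as the cause; the sketch below only checks the labels against the YOLOv5 text format (one row per object: `class x_center y_center width height`, with box values normalized to 0..1). The function name `find_bad_labels` and its arguments are hypothetical.

```python
from pathlib import Path


def find_bad_labels(label_dir, num_classes):
    """Return (file name, line number, reason) for rows that break the label format."""
    problems = []
    for path in sorted(Path(label_dir).glob("*.txt")):
        lines = path.read_text().splitlines()
        for line_no, line in enumerate(lines, start=1):
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            if len(parts) != 5:
                problems.append((path.name, line_no, "expected 5 columns"))
                continue
            try:
                cls = int(parts[0])
                box = [float(v) for v in parts[1:]]
            except ValueError:
                problems.append((path.name, line_no, "non-numeric value"))
                continue
            # Class index must be a valid index into the class list.
            if not 0 <= cls < num_classes:
                problems.append((path.name, line_no, f"class {cls} out of range"))
            # Normalized box values must lie in [0, 1]; NaN also fails this test.
            if any(not 0.0 <= v <= 1.0 for v in box):
                problems.append((path.name, line_no, "box value outside [0, 1]"))
    return problems
```

For a single-class head dataset this would be called as `find_bad_labels("path/to/labels", num_classes=1)`, where the path is a placeholder. An empty result means the labels pass these basic checks, in which case the cause lies elsewhere.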
