CUDA error: device-side assert triggered

mcjqwer commented 4 years ago

❔Question

hello! Today i train the yolo model based on the detection of head and i use the data of my classmate . However, i come across the error as the title describe. Some pictures in the dataset have a large amount of preson head range from 30 to 40, even some picture have 1000+ heads. So, i search the internet and i guess the error results from the data overflow, what do yo think?????
Forgive my poor English!!!!! this is the detail infromation.( Traceback (most recent call last): File "/gpfs/users_home/201961206025/yolov5-master/train.py", line 394, in train(hyp) File "/gpfs/users_home/201961206025/yolov5-master/train.py", line 299, in train dataloader=testloader) File "/gpfs/users_home/201961206025/yolov5-master/test.py", line 95, in test loss += compute_loss([x.float() for x in train_out], targets, model)[1][:3] # GIoU, obj, cls File "/gpfs/users_home/201961206025/yolov5-master/utils/utils.py", line 462, in compute_loss tobj[b, a, gj, gi] = (1.0 - model.gr) + model.gr * giou.detach().clamp(0).type(tobj.dtype) # giou ratio RuntimeError: CUDA error: device-side assert triggered )

Additional context

github-actions[bot] commented 4 years ago

Hello @mcjqwer, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook , Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

Cloud-based AI systems operating on hundreds of HD video streams in realtime.
Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

mcjqwer commented 4 years ago

What make me angry is the train code can run normally in my own computer(2060s)(CPU environment), it cannot run in the school High performance computer(Tesla V100 32G )(GPU env)

glenn-jocher commented 4 years ago

It appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment, clone the latest repo (code changes daily), and pip install -r requirements.txt again. We also highly recommend using one of our verified environments below.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab Notebook with free GPU:
Kaggle Notebook with free GPU: https://www.kaggle.com/models/ultralytics/yolov5
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Docker Image https://hub.docker.com/r/ultralytics/yolov5. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.

dongjuns commented 4 years ago

@mcjqwer Use the docker image! It is simple and easy, even not environmental problems.

mcjqwer commented 4 years ago

It appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment, clone the latest repo (code changes daily), and pip install -r requirements.txt again. We also highly recommend using one of our verified environments below.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:
$ pip install -r requirements.txt
Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab Notebook with free GPU:

Kaggle Notebook with free GPU: https://www.kaggle.com/models/ultralytics/yolov5

Google Cloud Deep Learning VM. See GCP Quickstart Guide

Docker Image https://hub.docker.com/r/ultralytics/yolov5. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.

mcjqwer commented 4 years ago

It appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment, clone the latest repo (code changes daily), and pip install -r requirements.txt again. We also highly recommend using one of our verified environments below.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:
$ pip install -r requirements.txt
Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab Notebook with free GPU:

Kaggle Notebook with free GPU: https://www.kaggle.com/models/ultralytics/yolov5

Google Cloud Deep Learning VM. See GCP Quickstart Guide

Docker Image https://hub.docker.com/r/ultralytics/yolov5. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.

I changed a data set that I could train normally before, and found it still works normally now. Therefore, I think it is not an environmental problem, I just changed a data set. Could it be that there are a lot of head tags in my head picture causing the data overflow? Some pictures have hundreds or even thousands of heads.

glenn-jocher commented 4 years ago

@mcjqwer there is no limit on the number of labels per image. If you think there is a bug then please submit a bug report with code to reproduce. I'll add our default bug response here:

Please note that most technical problems are due to:

Your changes to the default repository. If your issue is not reproducible in a new git clone version of this repository we can not debug it. Before going further run this code and ensure your issue persists:
```
sudo rm -rf yolov5  # remove existing
git clone https://github.com/ultralytics/yolov5 && cd yolov5 # clone latest
python detect.py  # verify detection
# CODE TO REPRODUCE YOUR ISSUE HERE
```
Your custom data. If your issue is not reproducible with COCO or COCO128 data we can not debug it. Visit our Custom Training Tutorial for guidelines on training your custom data. Examine train_batch0.jpg and test_batch0.jpg for a sanity check of training and testing data.
Your environment. If your issue is not reproducible in one of the verified environments below we can not debug it. If you are running YOLOv5 locally, ensure your environment meets all of the requirements.txt dependencies specified below.

If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab Notebook with free GPU:
Kaggle Notebook with free GPU: https://www.kaggle.com/models/ultralytics/yolov5
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Docker Image https://hub.docker.com/r/ultralytics/yolov5. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.

ultralytics / yolov5