ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0
50.21k stars 16.21k forks

Not running on the Xavier AGX #5625

Closed: ozlem-atiz closed this issue 2 years ago

ozlem-atiz commented 2 years ago

Search before asking

YOLOv5 Component

No response

Bug

I am trying to train on a Jetson Xavier AGX. I installed all the requirements and the setup went fine, but when I start the tutorial it gives a syntax error, and I can't figure out what the problem is. I created a char.yaml file (contents below) and set the train and val paths to where my images are. Can you help me, please? @glenn-jocher @PetrDvoracek @adrianholovaty @jacklinquan [Screenshot from 2021-11-12 11-42-59]

Environment

OS: Ubuntu 18.04, Python: 3.9, Computer: NVIDIA Jetson Xavier AGX

Minimal Reproducible Example

char.yaml:

train: /data/images/train
val: /data/images/val

# number of classes
nc: 1

# class names
names: ['plate']
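A quick sanity check for a data config like this one can catch path and class-count mistakes before training starts. The helper below, check_data_config, is hypothetical (not part of YOLOv5); it assumes the YAML has already been loaded into a dict, for example with PyYAML's yaml.safe_load, and it only checks the keys used in char.yaml. Absolute paths such as /data/images/train start at the filesystem root rather than the yolov5 folder, so the sketch also checks that they exist:

```python
import os


def check_data_config(cfg: dict) -> list:
    """Return human-readable problems found in a YOLOv5-style data config dict."""
    problems = []
    # Keys that char.yaml defines
    for key in ("train", "val", "nc", "names"):
        if key not in cfg:
            problems.append(f"missing key: {key}")
    names = cfg.get("names", [])
    # nc must match the number of class names
    if "nc" in cfg and cfg["nc"] != len(names):
        problems.append(f"nc={cfg['nc']} but {len(names)} names given")
    # Absolute paths are resolved from "/", not from the yolov5 checkout
    for split in ("train", "val"):
        path = cfg.get(split)
        if isinstance(path, str) and not os.path.exists(path):
            problems.append(f"{split} path not found: {path}")
    return problems


if __name__ == "__main__":
    # Same values as char.yaml above
    cfg = {"train": "/data/images/train", "val": "/data/images/val", "nc": 1, "names": ["plate"]}
    for problem in check_data_config(cfg):
        print(problem)
```

An empty list means the keys are consistent and both paths exist; it says nothing about whether the label files themselves are valid.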

Additional

No response

Are you willing to submit a PR?

glenn-jocher commented 2 years ago

@ozlem-atiz your char.yaml file is all correct; there are no issues there. You likely have environment issues. We don't have Xavier devices ourselves, but in general I would recommend running in an Xavier-compatible Docker image like this one: https://ngc.nvidia.com/catalog/containers/nvidia:l4t-pytorch

ozlem-atiz commented 2 years ago

Hi @glenn-jocher. I changed char.yaml, and now I get an "Illegal instruction (core dumped)" error. I don't know why. I also ran export OPENBLAS_CORETYPE=ARMV8 and still get the same error. Why is this? [Screenshot from 2021-11-17 09-39-38] Changed .yaml file:

train: /home/omer/Desktop/yolov5/data/images/train
val: /home/omer/Desktop/yolov5/data/images/val

# number of classes
nc: 1

# class names
names: ['plate']
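OPENBLAS_CORETYPE=ARMV8 is a workaround commonly reported for "Illegal instruction (core dumped)" crashes from NumPy/OpenBLAS on Jetson boards, but it only takes effect if it is set before OpenBLAS is loaded. In practice that means exporting it in the same terminal session that launches train.py (or in ~/.bashrc); exporting it in a different terminal has no effect. A minimal in-process sketch, assuming the crash really comes from OpenBLAS (not confirmed in this thread); force_openblas_coretype is a hypothetical helper:

```python
import os
import sys


def force_openblas_coretype(coretype: str = "ARMV8") -> str:
    """Set OPENBLAS_CORETYPE for this process and return the value in effect.

    OpenBLAS reads the variable when its library is loaded, so this only helps
    if it runs before numpy is imported, directly or via another package.
    """
    loaded = [m for m in ("numpy", "torch") if m in sys.modules]
    if loaded:
        raise RuntimeError(f"{loaded} already imported; set OPENBLAS_CORETYPE before importing them")
    # setdefault keeps any value that was already exported in the shell
    os.environ.setdefault("OPENBLAS_CORETYPE", coretype)
    return os.environ["OPENBLAS_CORETYPE"]


if __name__ == "__main__":
    print("OPENBLAS_CORETYPE =", force_openblas_coretype())
    # import numpy / torch / YOLOv5 modules only after this point
```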

glenn-jocher commented 2 years ago

@ozlem-atiz it appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment, clone the latest repo (code changes daily), and pip install -r requirements.txt again. We also highly recommend using one of our verified environments below.

Requirements

Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt
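The minimum versions above can be checked from inside the environment that will actually run train.py. A minimal sketch; environment_report and meets_minimum are hypothetical helpers, not YOLOv5 functions. Versions are compared as integer tuples because plain string comparison gets them wrong ("1.10.0" sorts before "1.7"):

```python
import itertools
import platform


def version_tuple(version: str) -> tuple:
    """Turn '1.10.0+cu102' or '1.8.0a0' into (1, 10, 0) / (1, 8, 0)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        # keep only the leading digits of each component
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    # Tuples compare element by element, so (1, 10, 0) >= (1, 7) is True
    return version_tuple(installed) >= version_tuple(minimum)


def environment_report() -> dict:
    info = {"python": platform.python_version(), "machine": platform.machine()}
    try:
        import torch  # may be missing or broken on a fresh Jetson install
        info["torch"] = str(torch.__version__)
        info["cuda"] = torch.cuda.is_available()
    except (ImportError, OSError):
        info["torch"] = None
    return info


if __name__ == "__main__":
    info = environment_report()
    print(info)  # on Jetson/Xavier boards 'machine' is expected to be 'aarch64'
    print("Python >= 3.6.0:", meets_minimum(info["python"], "3.6.0"))
    if info["torch"]:
        print("PyTorch >= 1.7:", meets_minimum(info["torch"], "1.7"))
```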

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

ozlem-atiz commented 2 years ago

I changed the PyTorch version, installed all the requirements as you said, and downloaded YOLOv5 again. Now I am getting a different, memory-related error. I looked into it, created a swapfile, and allocated additional memory, but the error persists. I followed this guide: https://www.digitalocean.com/community/tutorials/how-to-add-swap-space-on-ubuntu-16-04

[Screenshot from 2021-11-18 11-20-15]

glenn-jocher commented 2 years ago

@ozlem-atiz interesting. A different user recently reported a similar error and said that reducing NUM_THREADS here resolved the issue for them. How many CPUs does your device have? Can you try reducing it and see if that resolves your issue?

https://github.com/ultralytics/yolov5/blob/562191f5756273aca54225903f5933f7683daade/utils/datasets.py#L38
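The linked line caps the number of dataset-loading threads with NUM_THREADS = min(8, os.cpu_count()). A sketch of the same cap as a function, so several values can be tried without editing utils/datasets.py each time; num_threads is a hypothetical helper, and unlike the one-liner it also guards against os.cpu_count() returning None:

```python
import os


def num_threads(cap: int = 8) -> int:
    """At most `cap` threads, at most the CPU count, and never fewer than 1."""
    # os.cpu_count() can return None when the count is undetermined
    return max(1, min(cap, os.cpu_count() or 1))


if __name__ == "__main__":
    print("cpu_count:", os.cpu_count())
    for cap in (8, 4, 1):
        print(f"cap={cap} -> NUM_THREADS={num_threads(cap)}")
```

Separately, train.py's --workers argument sets the number of DataLoader worker processes and may also be worth lowering on a memory-constrained board.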

glenn-jocher commented 2 years ago

@ozlem-atiz also that 'System throttled due to Over-current' message doesn't look very good. Maybe a hardware swap or upgrade is a good idea.

ozlem-atiz commented 2 years ago

[Screenshot from 2021-11-18 14-23-33]

The images and labels in the train folder are read, but nothing loads once it gets to the epochs. I changed the line you mentioned from 8 to 4: @glenn-jocher

NUM_THREADS = min(4, os.cpu_count())  # number of multiprocessing threads

glenn-jocher commented 2 years ago

@ozlem-atiz does it work at 4? If not keep reducing.

ozlem-atiz commented 2 years ago

It didn't work. I reduced it all the way to 1 and always get the same result: OSError: [Errno 12] Cannot allocate memory @glenn-jocher
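Errno 12 is ENOMEM: the operating system refused a memory request (for example when forking data-loading worker processes), so Python raises it as OSError rather than MemoryError. One generic pattern is to retry with a smaller batch. A minimal sketch, assuming a hypothetical run(batch_size) callable that wraps your training call; if the device genuinely lacks RAM, this only postpones the failure:

```python
import errno


def is_out_of_memory(exc: BaseException) -> bool:
    """True for MemoryError and for OSError with errno ENOMEM (12 on Linux)."""
    return isinstance(exc, MemoryError) or (
        isinstance(exc, OSError) and exc.errno == errno.ENOMEM
    )


def run_with_backoff(run, batch_size: int, min_batch: int = 1):
    """Call run(batch_size), halving batch_size after each out-of-memory error.

    `run` is a hypothetical callable; any other exception is re-raised unchanged.
    """
    while True:
        try:
            return run(batch_size)
        except (MemoryError, OSError) as exc:
            if not is_out_of_memory(exc) or batch_size <= min_batch:
                raise
            batch_size = max(min_batch, batch_size // 2)
            print(f"out of memory, retrying with batch_size={batch_size}")


if __name__ == "__main__":
    def fake_run(bs):
        # Pretend anything above 4 images per batch exhausts RAM
        if bs > 4:
            raise OSError(errno.ENOMEM, "Cannot allocate memory")
        return f"trained with batch_size={bs}"

    print(run_with_backoff(fake_run, 16))
```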

glenn-jocher commented 2 years ago

@ozlem-atiz you need more RAM, your system is simply out of memory.

ozlem-atiz commented 2 years ago

I am using a Jetson Nano as my GPU computer, and this YOLOv5 project is the only project on it. How do I upgrade the RAM? I guess I would need to send it in for service, since I don't know how to upgrade RAM myself. I created a swapfile, but the system didn't use it.
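Whether a swapfile is actually in use can be read from /proc/meminfo (Linux only). If SwapTotal is 0, the swapfile was created but never enabled, for example because swapon was not run or the /etc/fstab entry is missing. A minimal parsing sketch; parse_meminfo and swap_summary are hypothetical helpers:

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_summary(info: dict) -> str:
    total = info.get("SwapTotal", 0)
    free = info.get("SwapFree", 0)
    if total == 0:
        return "no swap is active (check `swapon --show` and /etc/fstab)"
    return f"swap active: {total // 1024} MiB total, {(total - free) // 1024} MiB used"


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print(f"MemAvailable: {info.get('MemAvailable', 0) // 1024} MiB")
    print(swap_summary(info))
```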

glenn-jocher commented 2 years ago

@ozlem-atiz I think what's going on is that you're trying to use a Jetson/Xavier to train models, which I've never heard of. I think these are strictly deployment destinations for trained models. First you train in the cloud or with a desktop GPU, then you export and deploy.
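The train-elsewhere, deploy-on-device split can be sketched as two commands: export.py on the training machine (Colab or a desktop GPU) and detect.py on the Jetson. This is a sketch only: both commands are assumed to run from a yolov5 checkout, best.pt is a hypothetical checkpoint name, and only the --weights, --include and --source arguments are used:

```python
import shlex


def export_cmd(weights: str, formats=("onnx",)) -> list:
    """Run on the training machine: convert a trained .pt checkpoint."""
    return ["python", "export.py", "--weights", weights, "--include", *formats]


def detect_cmd(weights: str, source: str) -> list:
    """Run on the Jetson/Xavier: inference only, no training."""
    return ["python", "detect.py", "--weights", weights, "--source", source]


if __name__ == "__main__":
    # best.pt is a hypothetical checkpoint downloaded from Colab
    print(shlex.join(export_cmd("best.pt")))
    print(shlex.join(detect_cmd("best.pt", "0")))  # source 0 = first webcam
```

detect.py can run the .pt checkpoint directly; the exported file is for whatever runtime you deploy with. The lists can also be passed to subprocess.run with cwd set to the yolov5 folder.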

ozlem-atiz commented 2 years ago

@glenn-jocher Mr. Jocher, I'm actually training on Colab, but I want my system to work really well, so I want it to train non-stop. I want to run training from the terminal on computers with a GPU, and while the system is running I want it to keep training itself, morning and evening. Colab only gives limited time, so I want to train on the Xavier for long periods instead :D

SodrSnne commented 2 years ago

I'm a Jetson user. I think it can only serve as an inference device and is not suitable for training; you can try detect.py on it.

github-actions[bot] commented 2 years ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Access additional YOLOv5 🚀 resources:

Access additional Ultralytics ⚡ resources:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

zhangv12 commented 2 years ago

Hi, I have also run into deployment problems on a Jetson Nano with 4GB of memory, and on a Xavier. My problem is this: the input source for inference is a web camera (http://xxx.xxx.xxxx). At first the detection program runs fine, but after a few minutes (or a few hours on the Xavier) the video window goes black and the program detects nothing. I suspect a memory problem, but I still have no solution. Do you have any suggestions? Thank you!

glenn-jocher commented 2 years ago

@zhangv12 YOLOv5 uses cv2 for streaming and video sources, so I would raise an issue directly on the OpenCV repository regarding sustained streaming over time, as this is not within our direct control.

Also, if you suspect your hardware is the cause, try to reproduce the problem on other hardware, e.g. Colab, to isolate the source.

Lastly, note that your stream source or your own network connection may be at fault.
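While the root cause is being isolated, a common pattern for network streams that stop delivering frames is to detect failed reads and reopen the capture. A minimal sketch, assuming the black window corresponds to cv2.VideoCapture.read() returning ok=False (not confirmed here; a stalled stream may instead block inside read()). read_stream is a hypothetical helper, not part of YOLOv5's dataloaders, and open_capture is expected to behave like cv2.VideoCapture (read() returning (ok, frame), plus release()):

```python
import time


def read_stream(open_capture, url, max_reopens=5, retry_delay=1.0):
    """Yield frames from `url`, reopening the capture whenever a read fails."""
    reopens = 0
    cap = open_capture(url)
    try:
        while True:
            ok, frame = cap.read()
            if ok:
                reopens = 0  # stream is healthy again, reset the retry budget
                yield frame
                continue
            if reopens >= max_reopens:
                return  # give up: the source or network is likely down
            cap.release()
            time.sleep(retry_delay)
            cap = open_capture(url)
            reopens += 1
    finally:
        cap.release()
```

With OpenCV installed it would be driven as `for frame in read_stream(cv2.VideoCapture, url): ...`. Reopening only helps if the source itself recovers, so it does not rule out the hardware or network causes mentioned above.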
