ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0
49.99k stars 16.16k forks source link

new builds don't run on jetson agx #4003

Closed tfpb closed 3 years ago

tfpb commented 3 years ago

Hello, you probably know that Jetson systems aren't the best in terms of updates from NVIDIA.

This is the device: a Jetson AGX with Python 3.6.9, tensorboard 2.5.0, tensorflow 2.5.0+nv21.6, torch 1.8.0 and torchvision 0.9.

I can run the April release of YOLOv5 (version 5) with GPU, but the current GitHub version doesn't start.

python3 train.py --img 640 --cfg yolov5s.yaml --hyp hyp.scratch.yaml --batch 32 --epochs 50 --data my.yaml --weights yolov5s.pt --name mymodel

Starting training for 100 epochs...

     Epoch   gpu_mem       box       obj       cls     total    labels  img_size
  0%|          | 0/1184 [00:05<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 660, in <module>
    main(opt)
  File "train.py", line 558, in main
    train(opt.hyp, opt, device)
  File "train.py", line 345, in train
    scaler.scale(loss).backward()
  File "/home/jetsonagx1/.local/lib/python3.6/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/jetsonagx1/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

The current build works on a Windows computer, but it would be great if it could run on Jetson devices. I tested PyTorch 1.7, 1.8 and 1.9 and always got the same cuDNN message.

Thanks, regards

github-actions[bot] commented 3 years ago

👋 Hello @tfpb, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 3 years ago

@tfpb for Jetson inference you may want to see Jetson submissions to the YOLOv5 Export Competition:

Compete and Win

We are super excited about our first-ever Ultralytics YOLOv5 🚀 EXPORT Competition with $10,000 in cash prizes!

tfpb commented 3 years ago

Hello Glenn, thanks for the links, I didn't know about the competition.

The thing with the bug is that my models run on a PC without a problem; only on the Jetson with Python 3.6.9 does the current version fail, while the April v5 release works fine.

I searched for the error message and found some topics talking about 2D convolutions... Or did you drop support for Python 3.6? (The GitHub bot says >= 3.8, your GitHub pages say >= 3.6.)

If you have some ideas I would test them for you. I would also update Python, but it's glued into JetPack 4.5....

thanks :) best regards

glenn-jocher commented 3 years ago

Current master is compatible with python >= 3.6, but we need to update the bot message! I'll add a TODO for this.

glenn-jocher commented 3 years ago

TODO update python requirement repo-wide to 3.6

glenn-jocher commented 3 years ago

@tfpb ok, all tutorials, responses and READMEs are now updated to reflect our new relaxed python >= 3.6.0 requirements!
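For anyone who wants to confirm that their interpreter meets the relaxed requirement before launching training, a minimal sketch of such a check follows. The helper name `python_is_supported` and the constant `MIN_PYTHON` are made up for illustration and are not part of the YOLOv5 codebase; only the 3.6.0 floor comes from the comment above.

```python
import sys

# Minimum version quoted in the thread ("python >= 3.6.0"); the constant name is hypothetical.
MIN_PYTHON = (3, 6, 0)


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) is >= `minimum`."""
    current = tuple(version) if version is not None else tuple(sys.version_info[:3])
    # Tuples compare element by element, so (3, 6, 9) >= (3, 6, 0) is True.
    return current >= tuple(minimum)


if __name__ == "__main__":
    running = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {running} supported: {python_is_supported()}")
```

On the Jetson described in this issue (Python 3.6.9) the check passes, which fits the reading that the Python version itself was not what blocked training.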

github-actions[bot] commented 3 years ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

leemengwei commented 3 years ago

@tfpb Hi, simply reducing your batch size should make it work!
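The batch-size suggestion can be sketched as a simple back-off loop. This assumes, as is commonly reported but not confirmed anywhere in this thread, that the cuDNN message appears when the GPU runs short of memory for the convolution workspace. The function `find_working_batch_size` and the stand-in `fake_step` are hypothetical; a real run would replace `fake_step` with one forward/backward pass at the given batch size.

```python
from typing import Callable


def find_working_batch_size(try_step: Callable[[int], None],
                            start: int = 32,
                            minimum: int = 1) -> int:
    """Halve the batch size until `try_step` stops raising RuntimeError."""
    minimum = max(1, minimum)  # guard against a non-positive floor
    batch = start
    while batch >= minimum:
        try:
            try_step(batch)  # hypothetical hook: one training step at this batch size
            return batch
        except RuntimeError as err:
            print(f"batch {batch} failed: {err}")
            batch //= 2
    raise RuntimeError(f"no batch size between {minimum} and {start} worked")


def fake_step(batch: int) -> None:
    # Stand-in for a real training step: pretends the device cannot handle more than 8 images.
    if batch > 8:
        raise RuntimeError("Unable to find a valid cuDNN algorithm to run convolution")


if __name__ == "__main__":
    print(find_working_batch_size(fake_step, start=32))
```

In practice the same idea amounts to rerunning the original command with a smaller `--batch` value, for example 16 or 8 instead of 32, until the first iteration completes.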