ultralytics / yolov5

YOLOv5 πŸš€ in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Cannot export model with cuda device on Jetson TX2 #5502

Closed JNaranjo-Alcazar closed 2 years ago

JNaranjo-Alcazar commented 2 years ago

YOLOv5 Component

Export

Bug

Fusing layers... 
Model Summary: 213 layers, 7225885 parameters, 0 gradients

PyTorch: starting from yolov5s.pt (14.7 MB)
2021-11-04 11:55:46.365279: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2

TensorFlow saved_model: starting export with tensorflow 2.5.0...

                 from  n    params  module                                  arguments                     

TensorFlow saved_model: export failure: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

TensorFlow Lite: starting export with tensorflow 2.5.0...

TensorFlow Lite: export failure: 'NoneType' object has no attribute 'call'
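
The saved_model failure is the standard PyTorch error raised when a tensor still resident on the GPU is passed to NumPy. A minimal sketch of the underlying pattern and its fix (the tensor here is hypothetical, purely for illustration):

import torch

t = torch.zeros(3, device="cuda:0")  # tensor in GPU memory
# t.numpy()  # raises: can't convert cuda:0 device type tensor to numpy
arr = t.cpu().numpy()  # copy to host memory first, then convert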

Environment

YOLOv5 πŸš€ v6.0-23-ga18b0c3 torch 1.9.0 CUDA:0 (NVIDIA Tegra X2, 7850.375MB)
OS: Ubuntu 18.04 on Jetson TX2
Python 3.6.9

Minimal Reproducible Example

python3 export.py --weights yolov5s.pt --include tflite --device 0

Additional

When converting models to TFLite, inference on the Jetson is slower. I think that is because I do not export with a CUDA device. When I try to, I get the error pasted above.

glenn-jocher commented 2 years ago

@JNaranjo-Alcazar TFLite export should be done on CPU:

!python export.py --weights yolov5s.pt --include tflite
!python export.py --weights yolov5s.pt --include tflite --int8

TFLite models are intended for Android and EdgeTPU backends; they cannot exploit CUDA devices, and on CPU they will be slower than plain PyTorch models.
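
For reference, a minimal sketch of running the exported model on CPU with TensorFlow's TFLite interpreter (the filename yolov5s-fp16.tflite is an assumption based on the default export name; adjust to your output):

import numpy as np
import tensorflow as tf

# Load the exported model into the TFLite interpreter (CPU execution)
interpreter = tf.lite.Interpreter(model_path="yolov5s-fp16.tflite")  # assumed filename
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Dummy image matching the model's expected input shape and dtype
img = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], img)
interpreter.invoke()
pred = interpreter.get_tensor(out["index"])  # raw detections tensor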

JNaranjo-Alcazar commented 2 years ago

Thanks for the quick reply @glenn-jocher. Just to make it clear: is the fastest GPU inference on the Jetson achieved with the pb model? Does it not make sense to run TFLite model inference on the Jetson (using the GPU)?

glenn-jocher commented 2 years ago

@JNaranjo-Alcazar Well, I've never used a Jetson myself, but I don't believe TFLite has CUDA capability, or perhaps I'm just not aware of it.

In general, the simplest CUDA inference will be with PyTorch, and the fastest is likely TensorRT.
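
For comparison, a minimal sketch of plain PyTorch CUDA inference via torch.hub, following the standard YOLOv5 usage (the image URL is just an example):

import torch

# Load YOLOv5s from the hub; on a CUDA-capable device inference runs on the GPU
model = torch.hub.load("ultralytics/yolov5", "yolov5s")
results = model("https://ultralytics.com/images/zidane.jpg")  # example image
results.print()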

github-actions[bot] commented 2 years ago

πŸ‘‹ Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 πŸš€ and Vision AI ⭐!