YOLO does not use the GPU while converting (exporting) a model to `tflite`

a-sajjad72 commented 1 month ago

Search before asking

[X] I have searched the YOLOv8 issues and found no similar bug report.

YOLOv8 Component

Export

Bug

YOLO uses TensorFlow 2.15.0 to convert (export) a model to TFLite. That version is out of support and can't use the GPU, so the export takes a long time and sometimes my free runtime runs out before it finishes. I don't know how to specify which TF package should be used. Currently only TF 2.17 rc1 supports the GPU. Have a look here; if you need any more info, please let me know. Thank you.
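
To check which TensorFlow build a runtime actually has and whether it sees a GPU, a small diagnostic along these lines may help. This is a sketch, not part of Ultralytics: the helper name `tf_gpu_report` is made up for illustration, and it relies only on the public `tf.__version__` and `tf.config.list_physical_devices` calls. It falls back gracefully when TensorFlow is not installed.

```python
def tf_gpu_report() -> dict:
    """Return the installed TensorFlow version and the GPUs it can see.

    Hypothetical helper for illustration; not an Ultralytics API.
    """
    try:
        import tensorflow as tf  # imported lazily so the function works without TF
    except ImportError:
        return {"installed": False, "version": None, "gpus": []}

    # list_physical_devices("GPU") returns an empty list when TF sees no GPU
    gpus = [device.name for device in tf.config.list_physical_devices("GPU")]
    return {"installed": True, "version": tf.__version__, "gpus": gpus}


if __name__ == "__main__":
    print(tf_gpu_report())
```

An empty `gpus` list only says that this TensorFlow build does not see a GPU; it does not by itself say whether the export step would use one.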

Environment

Environment: Google Colab Notebook

Output of the export command

Ultralytics YOLOv8.2.61 🚀 Python-3.10.12 torch-2.3.1+cu121 CPU (Intel Xeon 2.20GHz)
Model summary (fused): 218 layers, 25,843,234 parameters, 0 gradients, 78.7 GFLOPs

PyTorch: starting from '/content/runs/detect/yolov8m_16b_1024imgsz_100e/weights/best.pt' with input shape (1, 3, 320, 320) BCHW and output shape(s) (1, 10, 2100) (49.7 MB)
requirements: Ultralytics requirements ['sng4onnx>=1.0.1', 'onnx_graphsurgeon>=0.3.26', 'onnx>=1.12.0', 'onnx2tf>1.17.5,<=1.22.3', 'onnxslim>=0.1.31', 'tflite_support', 'onnxruntime'] not found, attempting AutoUpdate...
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com/
Collecting sng4onnx>=1.0.1
  Downloading sng4onnx-1.0.4-py3-none-any.whl (5.9 kB)
Collecting onnx_graphsurgeon>=0.3.26
  Downloading onnx_graphsurgeon-0.5.2-py2.py3-none-any.whl (56 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.4/56.4 kB 3.6 MB/s eta 0:00:00
Collecting onnx>=1.12.0
  Downloading onnx-1.16.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.9/15.9 MB 86.2 MB/s eta 0:00:00
Collecting onnx2tf<=1.22.3,>1.17.5
  Downloading onnx2tf-1.22.3-py3-none-any.whl (435 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 435.0/435.0 kB 214.6 MB/s eta 0:00:00
Collecting onnxslim>=0.1.31
  Downloading onnxslim-0.1.32-py3-none-any.whl (130 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 130.5/130.5 kB 221.0 MB/s eta 0:00:00
Collecting tflite_support
  Downloading tflite_support-0.4.4-cp310-cp310-manylinux2014_x86_64.whl (60.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.8/60.8 MB 104.0 MB/s eta 0:00:00
Collecting onnxruntime
  Downloading onnxruntime-1.18.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.8/6.8 MB 139.8 MB/s eta 0:00:00
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from onnx_graphsurgeon>=0.3.26) (1.25.2)
Requirement already satisfied: protobuf>=3.20.2 in /usr/local/lib/python3.10/dist-packages (from onnx>=1.12.0) (3.20.3)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from onnxslim>=0.1.31) (1.13.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from onnxslim>=0.1.31) (24.1)
Requirement already satisfied: absl-py>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from tflite_support) (1.4.0)
Requirement already satisfied: flatbuffers>=2.0 in /usr/local/lib/python3.10/dist-packages (from tflite_support) (24.3.25)
Collecting sounddevice>=0.4.4 (from tflite_support)
  Downloading sounddevice-0.4.7-py3-none-any.whl (32 kB)
Collecting pybind11>=2.6.0 (from tflite_support)
  Downloading pybind11-2.13.1-py3-none-any.whl (238 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 238.8/238.8 kB 194.6 MB/s eta 0:00:00
Collecting coloredlogs (from onnxruntime)
  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 46.0/46.0 kB 235.2 MB/s eta 0:00:00
Requirement already satisfied: CFFI>=1.0 in /usr/local/lib/python3.10/dist-packages (from sounddevice>=0.4.4->tflite_support) (1.16.0)
Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime)
  Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 86.8/86.8 kB 265.8 MB/s eta 0:00:00
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy->onnxslim>=0.1.31) (1.3.0)
Requirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from CFFI>=1.0->sounddevice>=0.4.4->tflite_support) (2.22)
Installing collected packages: sng4onnx, pybind11, onnx2tf, onnx, humanfriendly, sounddevice, onnxslim, onnx_graphsurgeon, coloredlogs, tflite_support, onnxruntime
Successfully installed coloredlogs-15.0.1 humanfriendly-10.0 onnx-1.16.1 onnx2tf-1.22.3 onnx_graphsurgeon-0.5.2 onnxruntime-1.18.1 onnxslim-0.1.32 pybind11-2.13.1 sng4onnx-1.0.4 sounddevice-0.4.7 tflite_support-0.4.4

requirements: AutoUpdate success ✅ 16.5s, installed 7 packages: ['sng4onnx>=1.0.1', 'onnx_graphsurgeon>=0.3.26', 'onnx>=1.12.0', 'onnx2tf>1.17.5,<=1.22.3', 'onnxslim>=0.1.31', 'tflite_support', 'onnxruntime']
requirements: ⚠️ Restart runtime or rerun command for updates to take effect

TensorFlow SavedModel: starting export with tensorflow 2.15.0...
Downloading https://github.com/ultralytics/assets/releases/download/v8.2.0/calibration_image_sample_data_20x128x128x3_float32.npy.zip to 'calibration_image_sample_data_20x128x128x3_float32.npy.zip'...
100% 1.11M/1.11M [00:00<00:00, 25.9MB/s]
Unzipping calibration_image_sample_data_20x128x128x3_float32.npy.zip to /content/calibration_image_sample_data_20x128x128x3_float32.npy...: 100% 1/1 [00:00<00:00, 32.95file/s]

ONNX: starting export with onnx 1.16.1 opset 17...
ONNX: slimming with onnxslim 0.1.32...
ONNX: export success ✅ 3.4s, saved as '/content/runs/detect/yolov8m_16b_1024imgsz_100e/weights/best.onnx' (98.7 MB)
TensorFlow SavedModel: collecting INT8 calibration images from 'data=dataset.yaml'
Downloading https://ultralytics.com/assets/Arial.ttf to '/root/.config/Ultralytics/Arial.ttf'...
100% 755k/755k [00:00<00:00, 22.7MB/s]
Scanning /content/datasets/fc_logo_detect_dataset_yolov8/valid/labels... 88 images, 0 backgrounds, 0 corrupt: 100% 88/88 [00:00<00:00, 677.81it/s]
WARNING ⚠️ /content/datasets/fc_logo_detect_dataset_yolov8/valid/images/FCB-World-Fassade_02_6f989e5d986f64bbdce34d771a730ba8.jpg: corrupt JPEG restored and saved
New cache created: /content/datasets/fc_logo_detect_dataset_yolov8/valid/labels.cache
TensorFlow SavedModel: WARNING ⚠️ >300 images recommended for INT8 calibration, found 88 images.
TensorFlow SavedModel: starting TFLite export with onnx2tf 1.22.3...

Automatic generation of each OP name started ========================================
Automatic generation of each OP name complete!

Model loaded ========================================================================

Model conversion started ============================================================
saved_model output started ==========================================================
saved_model output complete!
Summary on the non-converted ops:
---------------------------------
 * Accepted dialects: tfl, builtin, func
 * Non-Converted Ops: 196, Total Ops 517, % non-converted = 37.91 %
 * 196 ARITH ops

- arith.constant:  196 occurrences  (f32: 171, i32: 25)

  (f32: 14)
  (f32: 18)
  (f32: 84)
  (f32: 78)
  (f32: 3)
  (f32: 79)
  (f32: 7)
  (f32: 6)
  (f32: 2)
  (f32: 1)
  (f32: 20)
  (f32: 2)
  (f32: 4)
Float32 tflite output complete!
Summary on the non-converted ops:
---------------------------------
 * Accepted dialects: tfl, builtin, func
 * Non-Converted Ops: 196, Total Ops 688, % non-converted = 28.49 %
 * 196 ARITH ops

- arith.constant:  196 occurrences  (f16: 171, i32: 25)

  (f32: 14)
  (f32: 18)
  (f32: 84)
  (f32: 171)
  (f32: 78)
  (f32: 3)
  (f32: 79)
  (f32: 7)
  (f32: 6)
  (f32: 2)
  (f32: 1)
  (f32: 20)
  (f32: 2)
  (f32: 4)
Float16 tflite output complete!
Summary on the non-converted ops:
---------------------------------
 * Accepted dialects: tfl, builtin, func
 * Non-Converted Ops: 113, Total Ops 517, % non-converted = 21.86 %
 * 113 ARITH ops

- arith.constant:  113 occurrences  (f32: 88, i32: 25)

  (f32: 14)
  (f32: 18)
  (f32: 84)
  (f32: 78)
  (f32: 3)
  (f32: 79)
  (f32: 7)
  (uq_8: 83)
  (f32: 6)
  (f32: 2)
  (f32: 1)
  (f32: 20)
  (f32: 2)
  (f32: 4)
Dynamic Range Quantization tflite output complete!
Input signature information for quantization
signature_name: serving_default
input_name.0: images shape: (1, 320, 320, 3) dtype: <dtype: 'float32'>
Summary on the non-converted ops:
---------------------------------
 * Accepted dialects: tfl, builtin, func
 * Non-Converted Ops: 196, Total Ops 517, % non-converted = 37.91 %
 * 196 ARITH ops

- arith.constant:  196 occurrences  (f32: 171, i32: 25)

  (f32: 14)
  (f32: 18)
  (f32: 84)
  (f32: 78)
  (f32: 3)
  (f32: 79)
  (f32: 7)
  (f32: 6)
  (f32: 2)
  (f32: 1)
  (f32: 20)
  (f32: 2)
  (f32: 4)
fully_quantize: 0, inference_type: 6, input_inference_type: FLOAT32, output_inference_type: FLOAT32
INT8 Quantization tflite output complete!
Summary on the non-converted ops:
---------------------------------
 * Accepted dialects: tfl, builtin, func
 * Non-Converted Ops: 196, Total Ops 517, % non-converted = 37.91 %
 * 196 ARITH ops

- arith.constant:  196 occurrences  (f32: 171, i32: 25)

  (f32: 14)
  (f32: 18)
  (f32: 84)
  (f32: 78)
  (f32: 3)
  (f32: 79)
  (f32: 7)
  (f32: 6)
  (f32: 2)
  (f32: 1)
  (f32: 20)
  (f32: 2)
  (f32: 4)
fully_quantize: 0, inference_type: 6, input_inference_type: INT8, output_inference_type: INT8
Full INT8 Quantization tflite output complete!
Summary on the non-converted ops:
---------------------------------
 * Accepted dialects: tfl, builtin, func
 * Non-Converted Ops: 196, Total Ops 517, % non-converted = 37.91 %
 * 196 ARITH ops

- arith.constant:  196 occurrences  (f32: 171, i32: 25)

  (f32: 14)
  (f32: 18)
  (f32: 84)
  (f32: 78)
  (f32: 3)
  (f32: 79)
  (f32: 7)
  (f32: 6)
  (f32: 2)
  (f32: 1)
  (f32: 20)
  (f32: 2)
  (f32: 4)
INT8 Quantization with int16 activations tflite output complete!
Summary on the non-converted ops:
---------------------------------
 * Accepted dialects: tfl, builtin, func
 * Non-Converted Ops: 196, Total Ops 517, % non-converted = 37.91 %
 * 196 ARITH ops

- arith.constant:  196 occurrences  (f32: 171, i32: 25)

  (f32: 14)
  (f32: 18)
  (f32: 84)
  (f32: 78)
  (f32: 3)
  (f32: 79)
  (f32: 7)
  (f32: 6)
  (f32: 2)
  (f32: 1)
  (f32: 20)
  (f32: 2)
  (f32: 4)

Minimal Reproducible Example

ultralytics installation:
%pip install ultralytics

export command:
!yolo export format=tflite model=/path/to/your/best.pt imgsz=320 int8

Additional

No response

Are you willing to submit a PR?

[ ] Yes I'd like to help by submitting a PR!

github-actions[bot] commented 1 month ago

👋 Hello @a-sajjad72, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics

Environments

YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Notebooks with free GPU:
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 1 month ago

TFLite is not supposed to use GPU on export, this is not a bug.

a-sajjad72 commented 1 month ago

> TFLite is not supposed to use GPU on export, this is not a bug.

Sorry @glenn-jocher, I didn't get you. Do you mean the export runs without GPU acceleration whether or not TF detects a GPU?

Burhan-Q commented 1 month ago

Correct, when exporting for TFLite, you should use device="cpu" to ensure the correct device is used.
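
A minimal sketch of what that could look like from Python, assuming the `ultralytics` package is installed. The helper names `build_export_kwargs` and `export_tflite` are hypothetical and only wrap the `YOLO(...).export(...)` call; the weights path is the placeholder from the issue, and `dataset.yaml` is the calibration dataset named in the log above.

```python
from pathlib import Path
from typing import Optional


def build_export_kwargs(imgsz: int = 320, int8: bool = True, data: Optional[str] = None) -> dict:
    """Collect keyword arguments for YOLO.export(), pinning the device to CPU."""
    kwargs = {"format": "tflite", "imgsz": imgsz, "int8": int8, "device": "cpu"}
    if data is not None:
        kwargs["data"] = data  # dataset YAML used for INT8 calibration images
    return kwargs


def export_tflite(weights: str, **overrides):
    """Export the given weights to TFLite on CPU and return what export() reports."""
    from ultralytics import YOLO  # lazy import: the helper above works without it

    model = YOLO(weights)
    return model.export(**{**build_export_kwargs(), **overrides})


if __name__ == "__main__":
    weights = "/path/to/your/best.pt"  # placeholder path, replace with your own file
    if Path(weights).exists():
        print(export_tflite(weights, data="dataset.yaml"))
    else:
        print("weights not found; export would be called with:",
              build_export_kwargs(data="dataset.yaml"))
```

On the command line, the same idea would be adding `device=cpu` to the `yolo export` command.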

a-sajjad72 commented 1 month ago

> Correct, when exporting for TFLite, you should use device="cpu" to ensure the correct device is used.

@Burhan-Q may I ask why the export is done without the GPU, and whether it is possible to speed up the process? Exporting a yolov8m model at image size 320 takes more than 18 minutes.

Burhan-Q commented 1 month ago

@a-sajjad72 that's b/c there is no CUDA delegate (as far as I'm aware) for TFLite. They only build releases for mobile GPUs and not desktop GPUs, so export needs to be on CPU.
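
Since the export has to run on CPU, it may be worth measuring how long a given configuration takes before spending limited Colab runtime on it. The sketch below is a generic timing wrapper; `timed` is a hypothetical helper, not an Ultralytics function, and the workload in the demo is a stand-in.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in workload. With ultralytics installed, the export could be timed as:
    #   timed(YOLO("/path/to/your/best.pt").export,
    #         format="tflite", imgsz=320, int8=True, device="cpu")
    result, seconds = timed(sum, range(1_000_000))
    print(f"result={result}, took {seconds:.3f}s")
```

Timing one run with `int8=True` and one without would show how much of the total is spent in the quantized outputs listed in the log (dynamic range, INT8, full INT8, int16 activations).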

a-sajjad72 commented 1 month ago

> @a-sajjad72 that's b/c there is no CUDA delegate (as far as I'm aware) for TFLite. They only build releases for mobile GPUs and not desktop GPUs, so export needs to be on CPU.

OK, I got it. @glenn-jocher thanks for your assistance.

glenn-jocher commented 3 weeks ago

@a-sajjad72 you're welcome! If you have any more questions or need further assistance, feel free to ask.
