ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Question: Slow Inference on Google Coral Edge TPU #11422

Closed · NickLojewski closed this issue 1 year ago

NickLojewski commented 1 year ago

Hello, I have a question regarding inference on the Google Coral Edge TPU. I have exported the model in the correct format and am trying to run inference on the TPU, but I am experiencing slower-than-expected inference times.

yolov5 % python detect.py --weights yolov5s-int8_edgetpu.tflite --source test.mp4 --view-img
detect: weights=['yolov5s-int8_edgetpu.tflite'], source=test.mp4, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=True, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
requirements: /Users/nick/Desktop/requirements.txt not found, check failed.
YOLOv5 🚀 v7.0-153-gff6a9ac Python-3.10.3 torch-2.0.0 CPU

Loading yolov5s-int8_edgetpu.tflite for TensorFlow Lite Edge TPU inference...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
video 1/1 (1/359) /yolov5/test.mp4: 640x640 8 persons, 1 umbrella, 2 handbags, 2 suitcases, 800.0ms
...

Inference does appear to rely on the Google Coral TPU to some degree, because if I unplug the TPU during inference the code errors out; however, these speeds are nearly the same as if I were running inference on the CPU.

What could be going on here? Is the line "INFO: Created TensorFlow Lite XNNPACK delegate for CPU." an indication that the TPU isn't being utilized correctly? Thanks for any help!
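
For reference, one way to confirm the accelerator is at least visible to the host (a sketch; it assumes the pycoral package is installed, which is not part of the yolov5 requirements) is to list the attached Edge TPU devices:

python3 -c "from pycoral.utils.edgetpu import list_edge_tpus; print(list_edge_tpus())"  # prints one entry per attached Edge TPU

If this prints a device entry, the runtime can see the TPU; the XNNPACK message on its own is not conclusive either way, since TensorFlow Lite creates a CPU delegate for any ops that do not run on the accelerator.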

github-actions[bot] commented 1 year ago

👋 Hello @NickLojewski, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any up-to-date verified environment (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled).

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

glenn-jocher commented 1 year ago

@NickLojewski hello! It sounds like the TPU is being recognized by TensorFlow Lite and used, since unplugging it during inference gives an error. However, the issue might be related to the --half argument, which enables half-precision (float16) inference on the CPU. This can cause slower inference times on the TPU, since the data has to be converted back to float32 before being sent to the TPU. I recommend disabling half precision for TPU inference and seeing whether performance improves. You can do this by omitting --half from the command (it defaults to off) or by setting half=False in code. Let me know if this helps!
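
For reference, a minimal invocation with half precision disabled looks like this (a sketch reusing the weights and source from the question above; --half is simply omitted, since detect.py defaults it to off):

python detect.py --weights yolov5s-int8_edgetpu.tflite --source test.mp4 --view-img  # no --half, so half=False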

NickLojewski commented 1 year ago

Hi @glenn-jocher! Thanks so much for this guidance and for explaining why --half can cause slower inference on TPUs.

This sped things up quite a bit: I am now down from ~800ms per frame to ~360ms per frame on yolov5s-int8_edgetpu.tflite.

An open follow-up question for Glenn or other folks who have gotten around to playing with a Google Coral and yolov5 models... do these speeds seem reasonable/typical, or is there more room for me to tweak and get faster inference?

glenn-jocher commented 1 year ago

@NickLojewski, you're welcome! 360ms per frame on yolov5s-int8_edgetpu.tflite sounds reasonable and in line with the performance reported in the Coral Edge TPU benchmarks. Note that performance is affected by factors such as model size, input resolution, and the number of objects in the image. It's worth considering quantization-aware training for further optimization, which can significantly reduce model size and inference time on Edge TPUs. Finally, make sure that the Coral Edge TPU runtime (libedgetpu) is up to date to ensure maximum performance. For more details on Coral Edge TPU acceleration, visit the Coral examples repo.
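
To illustrate the tuning knobs above (a sketch, not a definitive recipe; it assumes the standard yolov5 export.py and the Coral edgetpu_compiler are on the PATH, and that the -int8 filename follows the default export naming):

python export.py --weights yolov5s.pt --include edgetpu --imgsz 320  # a smaller input size trades accuracy for latency
edgetpu_compiler -s yolov5s-int8.tflite  # -s shows which ops mapped to the Edge TPU vs. CPU fallback

Ops the compiler cannot map run on the CPU and add transfer overhead, so the more of the graph that maps to the TPU, the closer real-world latency gets to the benchmark numbers.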

NickLojewski commented 1 year ago

@glenn-jocher Thank you for the guidance! Closing this issue now as resolved.

glenn-jocher commented 1 year ago

@NickLojewski You're welcome! Don't hesitate to reach out if you have any more questions or issues in the future. Have a great day!