ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Question about running batch inference with PyTorch Tensor as input #6726

Closed · Renzzauw closed this issue 2 years ago

Renzzauw commented 2 years ago

Search before asking

Question

Hi!

Problem introduction

I am trying to run batch inference on batches of video frames obtained from NVIDIA DALI, but I am having some issues implementing this. Performance is very important for my use case, so I want to maximize it as much as possible.

What I tried so far

I simply convert and reshape the output of DALI to a PyTorch tensor of shape (batch_size, 3, height, width) and then feed this to my yolov5 model, which I load from PyTorch Hub as shown in the batch inference example code. This resulted in various errors about mismatching shapes, so I assume my input is not of the correct shape or not properly pre-processed.

I did some digging in the yolov5 code to see exactly how batch inference works and which input types are supported. I found that when you input a list of images (e.g. of type ndarray or PIL), these images are pre-processed so they are suitable for running inference. However, when I input a PyTorch Tensor, it is fed directly to the model, seemingly without any pre-processing (as far as I can tell from the code).
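For illustration, a minimal sketch of the two input paths I am describing, assuming a model loaded from PyTorch Hub; the random frames below are only placeholders for the real DALI output:

import numpy as np
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Placeholder batch of HWC uint8 frames (stand-ins for DALI output)
frames = [np.random.randint(0, 255, (720, 1280, 3), dtype=np.uint8) for _ in range(4)]

# 1) List of ndarrays: the model letterboxes, normalizes and batches them internally,
#    then runs NMS and returns a Detections object
results = model(frames, size=640)
results.print()

# 2) Pre-batched tensor: passed straight through to the underlying model,
#    so raw predictions come back without any pre- or post-processing
batch = torch.rand(4, 3, 640, 640)  # values already expected in [0, 1]
raw = model(batch)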

Question

My question is: what steps are required to make a PyTorch Tensor work as input for batch inference? The comments on the forward function arguments suggest that the width or height should equal the size parameter and that the values should be in the range [0, 1], but this does not seem to be everything that is required. Are there any functions available within this repo to pre-process the Tensor?

Alternative I tried

As an alternative, I tried passing a list of video frames (ndarrays of shape (height, width, 3)) to the model instead, but the performance gain seemed limited compared to simply running with batch size 1. Hence, I am trying to make batch inference with tensors work.
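A rough sketch of the comparison I ran (again with placeholder frames standing in for the real video):

import time
import numpy as np
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
frames = [np.random.randint(0, 255, (720, 1280, 3), dtype=np.uint8) for _ in range(16)]

t0 = time.time()
for f in frames:                 # batch size 1: one call per frame
    model([f], size=640)
t1 = time.time()
model(frames, size=640)          # one batched call over all frames
t2 = time.time()
print(f'per-frame: {t1 - t0:.3f} s, batched: {t2 - t1:.3f} s')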

I could not find any resources online or in existing issues that answer this very specific question. Thanks in advance for helping me out! If I need to provide any more details, feel free to let me know.

Additional

No response

github-actions[bot] commented 2 years ago

👋 Hello @Renzzauw, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 2 years ago

@Renzzauw 👋 Hello! Thanks for asking about inference speed issues. For our official YOLOv5 batch-size speed study see https://community.ultralytics.com/t/yolov5-study-batch-size-vs-speed/31

YOLOv5 🚀 can be run on CPU (i.e. --device cpu, slow) or GPU if available (i.e. --device 0, faster). You can determine your inference device by viewing the YOLOv5 console output:

detect.py inference

python detect.py --weights yolov5s.pt --img 640 --conf 0.25 --source data/images/

YOLOv5 PyTorch Hub inference

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Images
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()  # or .show(), .save()
# Speed: 631.5ms pre-process, 19.2ms inference, 1.6ms NMS per image at shape (2, 3, 640, 640)

Increase Speeds

If you would like to increase your inference speed, some options are:

Good luck 🍀 and let us know if you have any other questions!

Renzzauw commented 2 years ago

Hi @glenn-jocher and thank you for your reply.

Thank you for providing these resources. I have studied most of them already, and I'm afraid they do not really answer my question. I have already taken several of the steps you mention in the Increase Speeds section you linked, but I am specifically interested in more information about batched inference. The examples you linked, which I have studied closely, simply do not cover any details about inputting a PyTorch Tensor.

glenn-jocher commented 2 years ago

@Renzzauw YOLOv5 PyTorch Hub models support torch tensor inputs at any batch size; however, since we cannot determine the preprocessing steps used (i.e. letterboxing, padding, etc.), we cannot implement a postprocessing strategy. Therefore torch tensor inputs act as pass-throughs, enabling you to create your own pre- and post-processing workflows.
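A minimal sketch of such a custom workflow, assuming the repo is cloned locally so that its utils package (letterbox, non_max_suppression) is importable; exact signatures may differ between versions:

import numpy as np
import torch
from utils.augmentations import letterbox            # resize + pad to a stride multiple
from utils.general import non_max_suppression        # confidence/IoU filtering

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
device = next(model.parameters()).device

def preprocess(frame, img_size=640):
    # HWC uint8 BGR frame (e.g. OpenCV-style; drop the channel flip for RGB input)
    # -> CHW float32 tensor in [0, 1], padded to a fixed img_size x img_size
    img = letterbox(frame, img_size, stride=32, auto=False)[0]
    img = img[:, :, ::-1].transpose(2, 0, 1)          # BGR -> RGB, HWC -> CHW
    img = np.ascontiguousarray(img, dtype=np.float32) / 255.0
    return torch.from_numpy(img)

# Placeholder frames standing in for the DALI batch
frames = [np.random.randint(0, 255, (720, 1280, 3), dtype=np.uint8) for _ in range(4)]
batch = torch.stack([preprocess(f) for f in frames]).to(device)   # (N, 3, H, W)

with torch.no_grad():
    out = model(batch)                                # raw predictions (pass-through)
pred = out[0] if isinstance(out, (list, tuple)) else out          # handle tuple outputs
pred = non_max_suppression(pred, conf_thres=0.25, iou_thres=0.45)
for det in pred:                                      # one (n, 6) tensor per image: xyxy, conf, cls
    print(det)

Note that the boxes come back in the letterboxed 640x640 coordinate space; mapping them back to the original frame size (e.g. with scale_coords from utils.general) is part of the post-processing left to the caller.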

glenn-jocher commented 2 years ago

See AutoShape forward method for details:

https://github.com/ultralytics/yolov5/blob/c43f13557185403631d7eae804413ccf27ae9a2a/models/common.py#L507-L524

Renzzauw commented 2 years ago

@glenn-jocher Thank you for your reply; this makes it clear how to use PyTorch tensors as input. I will look into implementing my own pre- and post-processing steps.