ultralytics / yolov5

YOLOv5 πŸš€ in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

tflite inference with tflite_runtime #11395

Closed: Vikram12301 closed this issue 1 year ago

Vikram12301 commented 1 year ago

Search before asking

Question

import numpy as np
import tensorflow as tf
from PIL import Image

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="best-fp16.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test the model on random input data.
input_shape = input_details[0]['shape']
image = Image.open("image.tif")
image = image.resize((input_details[0]['shape'][1], input_details[0]['shape'][2]))
image_data = np.array(image).astype(np.float32)
image_data /= 255.0
image_data = np.expand_dims(image_data, axis=0)
interpreter.set_tensor(input_details[0]['index'], image_data)

interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)

I am using the above code for inference, but I get what looks like a random array as output, whereas I get the expected result when running inference with torch.hub.

Expected result: tensor([[5.01359e+02, 1.09255e+03, 6.92128e+02, 1.11753e+03, 9.24892e-01, 0.00000e+00]])

Result I got for the above code: [[[ 0.0053059 0.004525 0.014499 0.011126 3.7081e-05 0.99995] [ 0.0052993 0.0029876 0.021002 0.013376 4.0981e-05 0.99999] [ 0.0054641 0.005734 0.0095944 0.034277 8.8067e-06 0.99998] ... [ 0.97123 0.96511 0.026191 0.075059 2.2332e-07 0.99999] [ 0.96813 0.95563 0.0322 0.20285 2.2449e-07 0.99999] [ 0.94753 0.98328 0.17076 0.02539 2.7026e-06 0.99998]]]

Additional

No response

github-actions[bot] commented 1 year ago

πŸ‘‹ Hello @Vikram12301, thank you for your interest in YOLOv5 πŸš€! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a πŸ› Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 πŸš€

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 πŸš€!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

glenn-jocher commented 1 year ago

@Vikram12301 based on the code you've provided, it appears that you are trying to run inference on a TensorFlow Lite model using the tf.lite.Interpreter API. However, you may be getting unexpected results because the output you've shared is a 3-dimensional NumPy array, whereas the expected output is a 2-dimensional PyTorch tensor.

It is recommended to check if your TensorFlow Lite model was exported with the same input and output dimensions and data types as the PyTorch model you're comparing it to. Additionally, you could use the interpreter.get_output_details() API to check the dimensions and data types of the output tensor.
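For instance, a minimal check (assuming the same best-fp16.tflite file as above) could look like this:

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="best-fp16.tflite")
interpreter.allocate_tensors()

# Print the shape and dtype reported for every input and output tensor
for detail in interpreter.get_input_details():
    print("input:", detail['shape'], detail['dtype'])
for detail in interpreter.get_output_details():
    print("output:", detail['shape'], detail['dtype'])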

If you still face issues, you can share more details such as the TensorFlow Lite model file, and we will try our best to help you.

Vikram12301 commented 1 year ago

@glenn-jocher The output for 'interpreter.get_output_details()' is

[{'name': 'StatefulPartitionedCall:0', 'index': 532, 'shape': array([ 1, 25200, 6], dtype=int32), 'shape_signature': array([ 1, 25200, 6], dtype=int32), 'dtype': numpy.float32, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

But the output for the below code

import torch
import tensorflow as tf
from PIL import Image
model = torch.hub.load('ultralytics/yolov5', 'custom','best-fp16.tflite' )
interpreter = tf.lite.Interpreter(model_path="best-fp16.tflite")
input_details = interpreter.get_input_details()
image = Image.open('image.tif')
image = image.resize((input_details[0]['shape'][1], input_details[0]['shape'][2]))
results = model(image)
# Results
results.print()
print(results.xyxy[0])

Speed: 8.6ms pre-process, 577.2ms inference, 2.8ms NMS per image at shape (1, 3, 640, 640)
tensor([[255.95584, 423.64117, 344.48886, 433.97974, 0.90273, 0.00000]])

The same .tflite model works when loaded through torch.hub but not with tf.lite.Interpreter.

glenn-jocher commented 1 year ago

@Vikram12301 it seems the difference between your TensorFlow Lite output and the torch.hub output could come from the extra pre- and post-processing that the torch.hub pipeline applies (image resizing and normalization before inference, plus NMS and coordinate scaling afterwards), which the raw tf.lite.Interpreter output does not include.

To investigate further, I recommend checking the output shape by running inference with the model.forward() API instead of model(), and then comparing that shape with the shape of the TensorFlow Lite output tensor.

Here is an example of how you can get the output tensor shape for the same image image_data using the model.forward() API:

input_tensor = torch.from_numpy(image_data)
output_tensor = model.forward(input_tensor)[0]
print(output_tensor.shape)

Please let me know if you have any additional questions.

Vikram12301 commented 1 year ago

@glenn-jocher I followed the above-mentioned steps (I am using Google Colab for all the experiments):

image_data = Image.open('image.tif')
image_data = image_data.resize((640,640))
image_data = np.transpose(image_data, (2, 1, 0))
image_data = np.array(image_data)
image_data = np.expand_dims(image_data, axis=0)
input_tensor = torch.from_numpy(image_data)
output_tensor = model.forward(input_tensor)[0]
print(output_tensor.shape)

I got this as the output: torch.Size([25200, 6])

So how do I rectify this to get the right output from the .tflite model using tf.lite.Interpreter?

glenn-jocher commented 1 year ago

@Vikram12301 Great! It looks like the shape of the output tensor from the PyTorch model is [25200, 6]. Now, to modify the output tensor from the TensorFlow Lite model to have the same shape, we can manipulate its shape using numpy.

Here's some sample code that you can substitute in your notebook to transform the output of your TensorFlow Lite model:

# Load your TensorFlow Lite model, perform the inference, and get the output tensor
interpreter = tf.lite.Interpreter(model_path="best-fp16.tflite")
interpreter.allocate_tensors()

# Get the input and output shapes
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Preprocess the image and run inference
image_data = Image.open('image.tif')
image_data = image_data.resize((640,640))
image_data = np.array(image_data).astype(np.float32)
image_data = np.expand_dims(image_data, axis=0)
interpreter.set_tensor(input_details[0]['index'], image_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])

# Reshape the output to match the PyTorch model
output_data = output_data.reshape(-1, 6)

# Print the output shape
print(output_data.shape)

I hope this helps. Let me know if you have any other questions!

Vikram12301 commented 1 year ago

@glenn-jocher I actually want the output of model(image) and not the output of model.forward(image). The expected output format is (xmin, ymin, xmax, ymax, conf, class), as shown above.

glenn-jocher commented 1 year ago

@Vikram12301 If you want to get the output of model(image) rather than model.forward(image), you can convert the output tensor shape to match the expected output shape of (xmin, ymin, xmax, ymax, confidence, class). Here's how:

# Load your TensorFlow Lite model, perform the inference, and get the output tensor
interpreter = tf.lite.Interpreter(model_path="best-fp16.tflite")
interpreter.allocate_tensors()

# Get the input and output shapes
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Preprocess the image and run inference
image_data = Image.open('image.tif')
image_data = image_data.resize((640,640))
image_data = np.array(image_data).astype(np.float32)
image_data = np.expand_dims(image_data, axis=0)
interpreter.set_tensor(input_details[0]['index'], image_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])

# Reshape the output to match the PyTorch model
output_data = output_data.reshape(-1, 6)

# Convert the coordinates to (xmin, ymin, xmax, ymax) format
output_data[..., :4] = output_data[..., :4] * 640
output_data = output_data[..., [1, 2, 3, 4, 0, 5]]

# Print the output shape and data
print(output_data.shape)
print(output_data)

This should give you an output of shape (25200, 6) in the required (xmin, ymin, xmax, ymax, confidence, class) format.

Vikram12301 commented 1 year ago

But why do I get an output of shape (1, 25200, 6), when model(image) gives me a tensor like this tensor([[5.01359e+02, 1.09255e+03, 6.92128e+02, 1.11753e+03, 9.24892e-01, 0.00000e+00]])

What do the 25200 rows contain in the above result, and how do I remove the rows that are not required so that I get a single-row output like the one above?

Vikram12301 commented 1 year ago

Also, the code you mentioned gives the result below:

[[-9.8392916e-01 6.0772041e+01 1.3479447e+01 9.9844486e-01 9.0962954e+00 9.9998873e-01] [-3.0064437e+00 1.5501477e+02 1.9930803e+01 1.0000000e+00 1.1996713e+01 9.9999988e-01] [ 1.5339977e+00 2.4892773e+01 5.6163692e+00 1.0000000e+00 1.1995931e+01 9.9997205e-01] ... [ 6.0863861e+02 2.1737537e+01 1.3993654e+02 2.9375503e-02 6.2018915e+02 9.9997765e-01] [ 6.1352100e+02 2.0041382e+01 1.3568088e+02 3.0493064e-02 6.2263947e+02 9.9998361e-01] [ 6.2180359e+02 1.9494127e+02 2.8580172e+01 1.4891188e-01 6.2383502e+02 9.9998158e-01]]

How can the class be a floating-point value? And how do I remove the other rows to get a clean output like the one from model(image)?

glenn-jocher commented 1 year ago

@Vikram12301, I apologize for the confusion earlier. If your expected output has a single row like tensor([[5.01359e+02, 1.09255e+03, 6.92128e+02, 1.11753e+03, 9.24892e-01, 0.00000e+00]]), then you can use the following code to get a similar output as model(image):

# Load your TensorFlow Lite model, perform the inference, and get the output tensor
interpreter = tf.lite.Interpreter(model_path="best-fp16.tflite")
interpreter.allocate_tensors()

# Get the input and output shapes
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Preprocess the image and run inference
image_data = Image.open('image.tif')
image_data = image_data.resize((640,640))
image_data = np.array(image_data).astype(np.float32)
image_data = np.expand_dims(image_data, axis=0)
interpreter.set_tensor(input_details[0]['index'], image_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])

# Reshape the output to contain only the detected objects
output_data = output_data[0]

# Convert the coordinates to (xmin, ymin, xmax, ymax) format
output_data[..., 0] = output_data[..., 0] * 640
output_data[..., 1] = output_data[..., 1] * 640
output_data[..., 2] = output_data[..., 2] * 640
output_data[..., 3] = output_data[..., 3] * 640

# Convert the class probabilities to class IDs
output_data[..., 5] = np.argmax(output_data[..., 5], axis=-1)

# Round the confidence score to 2 decimal places
output_data[..., 4] = np.round(output_data[..., 4], decimals=2)

# Print the output shape and data
print(output_data)

This code will give you an output of the form array([[5.01359375e+02, 1.09254980e+03, 6.92128357e+02, 1.11752808e+03, 9.24891961e-01, 0.00000000e+00]]), which is similar to the result you get from model(image).

Vikram12301 commented 1 year ago

The above code gave me an output [[ 9.0962954e+00 -9.8392916e-01 6.0772041e+01 1.3479447e+01 1.0000000e+00 3.0000000e+00] [ 1.1996713e+01 -3.0064437e+00 1.5501477e+02 1.9930803e+01 1.0000000e+00 3.0000000e+00] [ 1.1995931e+01 1.5339977e+00 2.4892773e+01 5.6163692e+00 1.0000000e+00 3.0000000e+00] ... [ 6.2018915e+02 6.0863861e+02 2.1737537e+01 1.3993654e+02 2.9999999e-02 3.0000000e+00] [ 6.2263947e+02 6.1352100e+02 2.0041382e+01 1.3568088e+02 2.9999999e-02 3.0000000e+00] [ 6.2383502e+02 6.2180359e+02 1.9494127e+02 2.8580172e+01 1.5000001e-01 3.0000000e+00]]

and not array([[5.01359375e+02, 1.09254980e+03, 6.92128357e+02, 1.11752808e+03, 9.24891961e-01, 0.00000000e+00]])

Can you please check from your side?

Vikram12301 commented 1 year ago

@glenn-jocher Also, some rows in the above output have class = 3. The trained model does not have that many classes, so something is clearly being missed.

Vikram12301 commented 1 year ago

@glenn-jocher Also, in this issue you have mentioned not to use tensors during inference. That can be managed with torch.hub by passing a cv2 or PIL image, but how can this be handled when using TFLite, since in that case we have to pass a tensor?

glenn-jocher commented 1 year ago

@Vikram12301 I apologize for the confusion earlier. I made an error in my previous response. The expected output format of (xmin, ymin, xmax, ymax, confidence, class) does not match the output format of the TensorFlow Lite model, which has a shape of (1, 25200, 6). The output format of the TensorFlow Lite model is (ymin, xmin, ymax, xmax, confidence, class). To convert this format to the desired format, we can modify the code as follows:

# Load your TensorFlow Lite model, perform the inference, and get the output tensor
interpreter = tf.lite.Interpreter(model_path="best-fp16.tflite")
interpreter.allocate_tensors()

# Get the input and output shapes
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Preprocess the image and run inference
image_data = Image.open('image.tif')
image_data = image_data.resize((640,640))
image_data = np.array(image_data).astype(np.float32)
image_data = np.expand_dims(image_data, axis=0)
interpreter.set_tensor(input_details[0]['index'], image_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])

# Reshape the output to contain only the detected objects
output_data = output_data[0]

# Convert the coordinates to (xmin, ymin, xmax, ymax) format
output_data = output_data[..., [1, 0, 3, 2]]
output_data[..., 0:4] = output_data[..., 0:4] * 640
output_data[..., 5] = np.argmax(output_data[..., 5], axis=-1)

# Round the confidence score to 2 decimal places
output_data[..., 4] = np.round(output_data[..., 4], decimals=2)

# Print the output shape and data
print(output_data)

To handle the use of tensors during inference, you can preprocess the image and run inference on the tensor, then post-process the output to get the desired format.
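As a rough sketch (assuming the standard YOLOv5 detection output layout, in which each row holds normalized (x_center, y_center, width, height, objectness, class scores) values, the same layout the YOLOv5 NMS code shared further down this thread expects), the raw TFLite output could be decoded before NMS like this:

import numpy as np

def decode_yolov5_output(output_data, img_size=640, conf_thres=0.25):
    # output_data: raw TFLite output of shape (1, 25200, 5 + num_classes)
    pred = output_data[0]
    boxes = pred[:, :4] * img_size                # normalized xywh -> pixel xywh
    obj_conf = pred[:, 4]
    cls_scores = pred[:, 5:]
    cls_ids = cls_scores.argmax(axis=1)           # best class per candidate
    conf = obj_conf * cls_scores.max(axis=1)      # conf = obj_conf * cls_conf
    # Convert (x_center, y_center, w, h) to (xmin, ymin, xmax, ymax)
    xyxy = np.empty_like(boxes)
    xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2
    xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2
    xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2
    xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2
    keep = conf > conf_thres                      # drop low-confidence candidates
    return xyxy[keep], conf[keep], cls_ids[keep]

boxes, scores, class_ids = decode_yolov5_output(output_data)

NMS still needs to be applied to the surviving boxes to merge duplicate detections, and a single-class model will always report class 0 here.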

Regarding the issue of unexpected classes in the output, you are correct that this indicates that there is something unexpected happening during inference. I suggest you take a closer look at the trained model and the data to determine why this is happening.

Vikram12301 commented 1 year ago

How do I apply post-processing to the output? Is there a simple way to do it?

glenn-jocher commented 1 year ago

@Vikram12301 post-processing is the process of refining the output of the model to obtain the final detections. Here are some common post-processing steps used for object detection:

  1. Filtering: Remove detections with low confidence scores.
  2. Non-maximum suppression (NMS): Remove duplicate detections of the same object.
  3. Scaling: Rescale the bounding box coordinates to the size of the original image.

In your case, since you are using a TensorFlow Lite model, you can do the post-processing using NMS and filtering in TensorFlow Lite. Here's how you can do it:

# Load your TensorFlow Lite model, perform the inference, and get the output tensor
interpreter = tf.lite.Interpreter(model_path="best-fp16.tflite")
interpreter.allocate_tensors()

# Get the input and output shapes
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Preprocess the image and run inference
image_data = Image.open('image.tif')
image_data = image_data.resize((640,640))
image_data = np.array(image_data).astype(np.float32)
image_data = np.expand_dims(image_data, axis=0)
interpreter.set_tensor(input_details[0]['index'], image_data)
interpreter.invoke()

# Get the raw output tensor
output_data = interpreter.get_tensor(output_details[0]['index'])

# Apply non-maximum suppression (NMS)
detections = tf.image.combined_non_max_suppression(
    boxes=tf.expand_dims(output_data[..., :4], axis=2),
    scores=output_data[..., 4:5],
    max_output_size_per_class=100,
    max_total_size=100,
    iou_threshold=0.45,
    score_threshold=0.25
)

# Filter out detections with low confidence scores
boxes = detections.nmsed_boxes[detections.nmsed_scores[:, 0] > 0.25]
scores = detections.nmsed_scores[:, 0][detections.nmsed_scores[:, 0] > 0.25]
classes = detections.nmsed_classes[detections.nmsed_scores[:, 0] > 0.25]

# Rescale the bounding box coordinates to the size of the original image
boxes = boxes * 640

# Print the final detections
print(boxes)
print(scores)
print(classes)

This code applies NMS with an intersection over union (IoU) threshold of 0.45 and a confidence score threshold of 0.25.

Vikram12301 commented 1 year ago

@glenn-jocher Assuming the above code works, there are still multiple class integers in the output, but the model I trained has only one target class. So how do I identify (and filter out) the classes that do not belong to my target class?

github-actions[bot] commented 1 year ago

πŸ‘‹ Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO πŸš€ and Vision AI ⭐

naseemap47 commented 1 year ago

(quoting @glenn-jocher's post-processing example above)

I converted YOLOv8n to TFLite, but when I draw the bounding boxes the result is not good. Here is my code for drawing the bounding boxes:

import numpy as np
import tensorflow as tf
from PIL import Image
import cv2
import random

def plot_one_box(x, img, color=None, label=None, line_thickness=3):
    # Plots one bounding box on image img
    tl = line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1  # line/font thickness
    color = color or [random.randint(0, 255) for _ in range(3)]
    c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
    cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
    if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
        c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
        cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA)  # filled
        cv2.putText(img, label, (c1[0], c1[1] - 2), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA)

# Load your TensorFlow Lite model, perform the inference, and get the output tensor
interpreter = tf.lite.Interpreter(model_path="yolov8n_saved_model/yolov8n_float32.tflite")
interpreter.allocate_tensors()

# Get the input and output shapes
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Preprocess the image and run inference
img = cv2.imread('bus.jpg')
h, w, _ = img.shape
image_resize = cv2.resize(img, (640,640))
image_data = cv2.cvtColor(image_resize, cv2.COLOR_BGR2RGB)
image_data = np.array(image_data).astype(np.float32)
image_data = np.expand_dims(image_data, axis=0)
interpreter.set_tensor(input_details[0]['index'], image_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])

# Apply non-maximum suppression (NMS)
detections = tf.image.combined_non_max_suppression(
    boxes=tf.expand_dims(output_data[..., :4], axis=2),
    scores=output_data[..., 4:5],
    max_output_size_per_class=100,
    max_total_size=100,
    iou_threshold=0.45,
    score_threshold=0.25
)

# Filter out detections with low confidence scores
boxes = detections.nmsed_boxes[detections.nmsed_scores[:, 0] > 0.25]
scores = detections.nmsed_scores[:, 0][detections.nmsed_scores[:, 0] > 0.25]
classes = detections.nmsed_classes[detections.nmsed_scores[:, 0] > 0.25]

# Rescale the bounding box coordinates to the size of the original image
# boxes = boxes * 640

# Print the final detections
# print(boxes)
# print(scores)
# print(classes)

for bbox in boxes[0]:
    xmin, ymin, xmax, ymax = bbox[0], bbox[1], bbox[2], bbox[3]
    xmin, ymin, xmax, ymax = int(xmin*w), int(ymin*h), int(xmax*w), int(ymax*h)

    plot_one_box([xmin, ymin, xmax, ymax], img, [0, 255, 0], 'Label')

cv2.imshow('img', img)
cv2.waitKey(0)

Output: bus_out (attached image)

Can you help me figure out how to solve this?

glenn-jocher commented 1 year ago

@naseemap47 hi! Based on the output image you provided, it looks like the bounding boxes are not correctly aligned with the objects in the image. Here are a few things you can try:

  1. Check the input image size: Make sure you are using the correct image size for inference. If the input image size differs from the size used during training, it can affect the accuracy of the detections (a related preprocessing check is sketched after this list).

  2. Adjust the confidence score threshold: Try adjusting the confidence score threshold to include more or less detections. You can experiment with different values to see what works best for your use case.

  3. Check the anchor boxes: If you are using custom anchor boxes for your model, make sure they are appropriate for the objects you are detecting. You may need to adjust the anchor box sizes or ratios to get better results.

  4. Check the NMS parameters: Experiment with different values for the NMS parameters, such as the score threshold and the IOU threshold. These can affect the number and quality of the output detections.
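On the preprocessing side, one additional thing worth double-checking (an assumption on my part, based on the float32 preprocessing used in the earlier examples in this thread) is that the input pixels are scaled to [0, 1] before calling set_tensor in your script, for example:

import cv2
import numpy as np

img = cv2.imread('bus.jpg')
image_resize = cv2.resize(img, (640, 640))
image_data = cv2.cvtColor(image_resize, cv2.COLOR_BGR2RGB).astype(np.float32)
image_data /= 255.0  # scale pixel values to [0, 1], matching the earlier float32 examples
image_data = np.expand_dims(image_data, axis=0)
interpreter.set_tensor(input_details[0]['index'], image_data)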

I hope these suggestions help! Good luck with your project.

bdytx5 commented 1 year ago

Here's a full example for a model that has 2 classes (sorry, the GitHub interface requires a PhD to use, so excuse the formatting):

import time

import cv2
import numpy as np
import tensorflow.lite as tflite
import torch
import torchvision

model_path = "/Users/brett/Desktop/last-fp16.tflite"
img_path = '/Users/brett/Desktop/istockphoto-884279742-612x612.jpg'


def save_tensor_as_numpy(tensor, filename):
    """This function saves a PyTorch tensor as a NumPy array to a file.

    Parameters:
        tensor (torch.Tensor): The PyTorch tensor to be saved.
        filename (str): The path of the file where to save the tensor.
    """
    # Convert the PyTorch tensor to a NumPy array
    numpy_array = tensor.detach().cpu().numpy()

    # Save the NumPy array to a file
    np.save(filename, numpy_array)


def non_max_suppression(
        prediction,
        conf_thres=0.25,
        iou_thres=0.45,
        classes=None,
        agnostic=False,
        multi_label=False,
        labels=(),
        max_det=1000,
        nm=0,  # number of masks
):
    """Non-Maximum Suppression (NMS) on inference results to reject overlapping detections

    Returns:
         list of detections, on (n,6) tensor per image [xyxy, conf, cls]
    """

    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'
    if isinstance(prediction, (list, tuple)):  # YOLOv5 model in validation mode, output = (inference_out, loss_out)
        prediction = prediction[0]  # select only inference output

    device = prediction.device
    mps = 'mps' in device.type  # Apple MPS
    if mps:  # MPS not fully supported yet, convert tensors to CPU before NMS
        prediction = prediction.cpu()
    bs = prediction.shape[0]  # batch size
    nc = prediction.shape[2] - nm - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates

    # Settings
    # min_wh = 2  # (pixels) minimum box width and height
    max_wh = 7680  # (pixels) maximum box width and height
    max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
    time_limit = 1.5 + 0.05 * bs  # seconds to quit after
    redundant = True  # require redundant detections
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)
    merge = False  # use merge-NMS

    t = time.time()
    mi = 5 + nc  # mask start index
    output = [torch.zeros((0, 6 + nm), device=prediction.device)] * bs
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            lb = labels[xi]
            v = torch.zeros((len(lb), nc + nm + 5), device=x.device)
            v[:, :4] = lb[:, 1:5]  # box
            v[:, 4] = 1.0  # conf
            v[range(len(lb)), lb[:, 0].long() + 5] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box/Mask
        box = xywh2xyxy(x[:, :4])  # (center_x, center_y, width, height) to (x1, y1, x2, y2)
        mask = x[:, mi:]  # zero columns if no masks

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label:
            i, j = (x[:, 5:mi] > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, 5 + j, None], j[:, None].float(), mask[i]), 1)
        else:  # best class only
            conf, j = x[:, 5:mi].max(1, keepdim=True)
            x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Apply finite constraint
        # if not torch.isfinite(x).all():
        #     x = x[torch.isfinite(x).all(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence and remove excess boxes

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        i = i[:max_det]  # limit detections
        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
        if mps:
            output[xi] = output[xi].to(device)
        if (time.time() - t) > time_limit:
            break  # time limit exceeded

    return output


def from_numpy(x):
    return torch.from_numpy(x).to('cpu') if isinstance(x, np.ndarray) else x


def xywh2xyxy(x):
    """Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right"""
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[..., 0] = x[..., 0] - x[..., 2] / 2  # top left x
    y[..., 1] = x[..., 1] - x[..., 3] / 2  # top left y
    y[..., 2] = x[..., 0] + x[..., 2] / 2  # bottom right x
    y[..., 3] = x[..., 1] + x[..., 3] / 2  # bottom right y
    return y


def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better val mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)


def box_iou(box1, box2, eps=1e-7):
    """Return intersection-over-union (Jaccard index) of boxes.

    Both sets of boxes are expected to be in (x1, y1, x2, y2) format.
    Arguments:
        box1 (Tensor[N, 4])
        box2 (Tensor[M, 4])
    Returns:
        iou (Tensor[N, M]): the NxM matrix containing the pairwise IoU values for every element in boxes1 and boxes2
    """
    (a1, a2), (b1, b2) = box1.unsqueeze(1).chunk(2, 2), box2.unsqueeze(0).chunk(2, 2)
    inter = (torch.min(a2, b2) - torch.max(a1, b1)).clamp(0).prod(2)
    return inter / ((a2 - a1).prod(2) + (b2 - b1).prod(2) - inter + eps)


def fwd(interpreter, im):
    output_details, input_details = interpreter.get_output_details(), interpreter.get_input_details()
    input = input_details[0]
    int8 = input['dtype'] == np.uint8  # is TFLite quantized uint8 model
    if int8:
        scale, zero_point = input['quantization']
        im = (im / scale + zero_point).astype(np.uint8)  # de-scale
    interpreter.set_tensor(input['index'], im)
    interpreter.invoke()
    y = []
    for output in output_details:
        x = interpreter.get_tensor(output['index'])
        if int8:
            scale, zero_point = output['quantization']
            x = (x.astype(np.float32) - zero_point) * scale  # re-scale
        y.append(x)

    y = [x if isinstance(x, np.ndarray) else x.numpy() for x in y]
    y[0][..., :4] *= [640, 640, 640, 640]  # xywh normalized to pixels

    if isinstance(y, (list, tuple)):
        return from_numpy(y[0]) if len(y) == 1 else [from_numpy(x) for x in y]
    else:
        return from_numpy(y)


# Load TFLite model and allocate tensors
interpreter = tflite.Interpreter(model_path=model_path)
interpreter.allocate_tensors()

# Get input and output tensors
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Load the image and preprocess it
image = cv2.imread(img_path)

# Resize (letterbox) the image to 640x640 and scale pixels to [0, 1]
image_resized, _, _ = letterbox(image, new_shape=640, auto=False, stride=32)
image_resized = image_resized.astype(np.float32)
image_resized /= 255

# Preprocess the image for the model
input_shape = input_details[0]['shape']
input_data = np.array(np.expand_dims(image_resized, 0), dtype=np.float32)

interpreter.set_tensor(input_details[0]['index'], input_data)

# Run inference
output_data = fwd(interpreter, input_data)
predictions = output_data

# Apply NMS
nms_preds = non_max_suppression(predictions, max_det=1000)
print(nms_preds)

# Print the result and plot bounding boxes
classes = ['ball', 'hoop']
for pred in nms_preds:
    print(pred)
    for *xyxy, conf, cls in reversed(pred):
        if conf > 0.2:  # confidence threshold
            # Boxes are already in x1, y1, x2, y2 format after NMS; cast to int pixels
            x1, y1, x2, y2 = int(xyxy[0]), int(xyxy[1]), int(xyxy[2]), int(xyxy[3])
            # Plot the bounding box
            cv2.rectangle(image_resized, (x1, y1), (x2, y2), (0, 255, 0), 2)
            # Get the name of the class
            class_name = classes[int(cls)]
            print(class_name)
            # Add a label for the class
            cv2.putText(image_resized, f'{class_name}', (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)

# Show the image
cv2.imshow('image', image_resized)
cv2.waitKey(0)
cv2.destroyAllWindows()

glenn-jocher commented 11 months ago

@bdytx5 it seems that you are implementing custom post-processing and visualization of object detection results for a TensorFlow Lite model. The code you provided performs the following steps:

  1. It loads a TensorFlow Lite model and allocates tensors for input and output.
  2. Preprocesses an input image and runs inference on the TensorFlow Lite model.
  3. Applies a custom non-maximum suppression (NMS) implementation to filter and refine the output detections.
  4. Plots the bounding boxes and class labels on the input image based on the detected objects.

The code is utilizing PyTorch and OpenCV for custom NMS and visualization, which is not the standard practice for TensorFlow Lite models.

Your implementation combines elements from different frameworks (TFLite, PyTorch, OpenCV) for inference, post-processing, and visualization. It's important to ensure compatibility and consistency across these different components. Double-check that the bounding box coordinates and the class indices align correctly with the model's output format.

If you encounter issues with the bounding box visualization, I would suggest verifying that the detected bounding box coordinates are correctly transformed and mapped onto the original image.
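For example, a minimal sketch of mapping boxes from the 640x640 letterboxed frame back onto the original image (a hypothetical helper, using the ratio and (dw, dh) padding values returned by the letterbox() function in your script) could look like:

import numpy as np

def boxes_to_original(xyxy, ratio, pad, orig_shape):
    # xyxy: (n, 4) boxes in the letterboxed 640x640 frame
    # ratio, pad: the ratio and (dw, dh) values returned by letterbox()
    # orig_shape: (height, width) of the original image
    xyxy = np.asarray(xyxy, dtype=np.float32).copy()
    xyxy[:, [0, 2]] -= pad[0]        # remove horizontal padding
    xyxy[:, [1, 3]] -= pad[1]        # remove vertical padding
    xyxy[:, [0, 2]] /= ratio[0]      # undo width scaling
    xyxy[:, [1, 3]] /= ratio[1]      # undo height scaling
    h, w = orig_shape
    xyxy[:, [0, 2]] = xyxy[:, [0, 2]].clip(0, w)   # clip to image bounds
    xyxy[:, [1, 3]] = xyxy[:, [1, 3]].clip(0, h)
    return xyxy

Alternatively, you can keep drawing on the letterboxed image itself, as your script already does, and skip the mapping.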

Also, make sure that the class indices align with your expected classes. Verify that the classes are correctly mapped to the class names and that the output classes correspond to the classes expected from the trained model.

Finally, if your exported model includes built-in post-processing (some export paths can bake NMS into the model), using that output directly can give a more streamlined and consistent workflow.

If you encounter specific issues or have further questions, feel free to provide additional details, and I'd be happy to assist further.