ultralytics / ultralytics

NEW - YOLOv8 πŸš€ in PyTorch > ONNX > OpenVINO > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Interpreting YOLOv8 Pose outputs in `tflite` #4771

Closed ovshake closed 7 months ago

ovshake commented 8 months ago

Search before asking

Question

Hi folks, very happy to join this wonderful community. I have a query regarding the outputs of the tflite version of YOLOv8-pose. I am getting an output of (1, 56, 8400) from the model, of which I understand that the first 5 values are (x, y, w, h, conf) for the bboxes, and the remaining 51 are 17x3 keypoints in (x, y, visibility) format. The keypoints I am getting look something like this:

582.0 316.0 1.0
574.0 345.0 1.0
574.0 344.0 1.0
577.0 324.0 1.0
580.0 324.0 1.0
573.0 346.0 1.0
575.0 345.0 1.0
569.0 370.0 1.0
572.0 369.0 1.0

Now it seems like some post-processing is needed, since all the keypoint values are in the 5xx, 3xx range. Can you tell me what post-processing is needed to map these to image coordinates?

Additional

This is my code for reference

import cv2
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from ultralytics.utils.ops import scale_coords

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="/home/ubuntu/projects/ultralytics/yolov8n-pose_saved_model/yolov8n-pose_float32.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Read the image
image_path = "/home/ubuntu/projects/ultralytics/bus.jpg"
image = cv2.imread(image_path)

# Get the input size from the model's input details and resize the image accordingly
input_size = input_details[0]['shape'][1:3]
image = cv2.resize(image, tuple(input_size))
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Convert the image to a float32 numpy array and add an extra dimension
input_data = np.expand_dims(image.astype(np.float32), axis=0)

# Set the tensor to point to the input data to be used
interpreter.set_tensor(input_details[0]['index'], input_data)

# Run the model
interpreter.invoke()

# Get the output tensor
output_data = interpreter.get_tensor(output_details[0]['index'])

output_data_transposed = output_data[0].T

# Print the output shape
print("output_data_transposed", output_data_transposed.shape)

# Select the bbox with the highest confidence

print("Argmax:", np.argmax(output_data_transposed[:, -1]))
bbox = output_data_transposed[np.argmax(output_data_transposed[:, -1])]

print("Bbox shape:", bbox.shape)

# Take the 51 keypoint values (elements 5 onward) and reshape into 17x3
keypoints = bbox[5:].reshape((17, 3))
keypoints = scale_coords(input_size, keypoints, image.shape).round()
# print("Keypoints:\n", keypoints)

# Plot the keypoints on the image
plt.imshow(image)
for i in range(17):
    print(keypoints[i, 0], keypoints[i, 1], keypoints[i, 2])
    plt.plot(keypoints[i, 0], keypoints[i, 1], 'ro')
plt.savefig('test-tflite.png')
github-actions[bot] commented 8 months ago

πŸ‘‹ Hello @ovshake, thank you for your interest in YOLOv8 πŸš€! We recommend a visit to the YOLOv8 Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a πŸ› Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics

Environments

YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

Ultralytics CI

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 8 months ago

@ovshake hi, it's great to have you in the YOLOv8 community!

The output dimensions you're seeing are consistent with the YOLOv8-pose model. The model's output includes bounding box parameters, confidence scores, and keypoints for pose estimation.

Keep in mind that the output values are relative to the grid cells rather than absolute image coordinates.

To convert these into image coordinates, you will need to apply a transformation that involves scaling and translating these relative values into absolute coordinates.

The scale_coords function from Ultralytics is designed to perform this task. Looking at your code snippet, it seems like you are using it correctly.

Your keypoint values (5xx, 3xx) are indeed in the format you would expect after this transformation - they're interpreted as pixel coordinates relative to the size of your input image.

It's important to ensure that the input_size you're passing to scale_coords matches the actual size of the input image used in the network. If you're resizing your images before running them through the model, the original dimensions should be used for the transformation.
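For illustration, a minimal NumPy sketch (not the ultralytics scale_coords implementation) of mapping keypoints from the model-input space back to the original image, assuming a plain resize with no letterbox padding:

import numpy as np

def rescale_keypoints(keypoints, model_hw, original_hw):
    # keypoints: (17, 3) array of (x, y, visibility) in model-input pixels.
    # model_hw / original_hw: (height, width) of the model input and the original image.
    kpts = np.array(keypoints, dtype=np.float32)
    kpts[:, 0] *= original_hw[1] / model_hw[1]  # scale x by the width ratio
    kpts[:, 1] *= original_hw[0] / model_hw[0]  # scale y by the height ratio
    # column 2 (visibility) needs no scaling
    return kpts

With the bus.jpg example this would be something like rescale_keypoints(keypoints, (640, 640), original_image.shape[:2]), where original_image is the image before resizing.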

Finally, remember that the 'visibility' values associated with each keypoint do not need to be transformed; they are already in a usable format. In general, a '1' implies the keypoint is visible and a '0' implies it's not visible.

If you continue to have issues, it might be helpful to double-check your pre-processing and post-processing steps, the dimensions of your input image, and consider examining some intermediate values to pinpoint where the unexpected values begin to appear.

ovshake commented 8 months ago

Hi @glenn-jocher , so I printed the 56 elements of the row with the most confident bbox to understand the ordering.

[    0.98055     0.32275      0.0374    0.068774      0.9857      644.77      182.52    0.011257      641.34      179.25   0.0016473       643.7      179.86    0.013569      620.56       176.4    0.019903      640.23      178.38     0.34054      614.46      184.02     0.83512      635.39      187.46     0.94171
      616.46      218.51     0.59848      632.19      221.65     0.94191      623.52      240.13      0.3296      634.65      244.11     0.73696      609.64      228.42     0.96015      626.48      230.41     0.97794      629.12      224.86     0.76989      640.25      226.13     0.87892      628.97      235.56
     0.59242      640.37      237.37     0.71924]

Looks like the first 4 elements are the xywh of the bbox, then the confidence of the human inside the bbox, then the 17x3 keypoints in (x, y, visibility) format. But on selecting the bbox with the largest confidence, the bbox I get does not seem to be the best one at all. The top-10 bboxes look like this (see the attached test-tflite image).

I have uploaded my script as well. The input size of the image is 640x640, so the output pixel values are expected to be in that resolution. I am unable to understand which part I am missing. Do you have any ideas about this?

This is the script

import cv2
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from ultralytics.utils.ops import scale_coords

def draw_bbox_on_image(image, x, y, w, h):
    # Denormalize the coordinates
    x = int(x * image.shape[1])
    y = int(y * image.shape[0])
    w = int(w * image.shape[1])
    h = int(h * image.shape[0])

    # Draw the bounding box
    cv2.rectangle(image, (x, y), (x+h, y+w), (0, 255, 0), 2)

    return image

def plot_keypoints_on_image(image, keypoints, t):
    # Iterate over the keypoints
    for keypoint in keypoints:
        x, y, visibility = keypoint

        # Check if the visibility is greater than the threshold
        if visibility > t:
            # Denormalize the coordinates
            x = int(x)
            y = int(y)

            # Draw the keypoint
            cv2.circle(image, (x, y), 2, (0, 0, 255), -1)

    return image

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="/home/ubuntu/projects/ultralytics/yolov8n-pose_saved_model/yolov8n-pose_float32.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Read the image
image_path = "/home/ubuntu/projects/ultralytics/bus.jpg"
image = cv2.imread(image_path)

# Get the input size from the model's input details and resize the image accordingly
input_size = input_details[0]['shape'][1:3]
image = cv2.resize(image, tuple(input_size))
# image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Convert the image to a float32 numpy array and add an extra dimension
input_data = np.expand_dims(image.astype(np.float32), axis=0)

# Set the tensor to point to the input data to be used
interpreter.set_tensor(input_details[0]['index'], input_data)

# Run the model
interpreter.invoke()

# Get the output tensor
output_data = interpreter.get_tensor(output_details[0]['index'])

output_data_transposed = output_data[0].T

# Select the top K bboxes
K = 10  # Change this to your desired number of bboxes
BASE = 0
sorted_indices = np.argsort(output_data_transposed[:, 5])[::-1]
top_K_by_confidence = output_data_transposed[sorted_indices[BASE:BASE+K]]
print("top_K_by_confidence", top_K_by_confidence[0])

# Process each bbox
for bbox in top_K_by_confidence:
    # Take the 51 keypoint values (elements 5 onward) and reshape into 17x3
    keypoints = bbox[5:].reshape((17, 3))
    xywh = bbox[:4]
    image = draw_bbox_on_image(image, xywh[0], xywh[1], xywh[2], xywh[3])
    image = plot_keypoints_on_image(image, keypoints, 0.7)

# Save the image
# cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
cv2.imwrite('test-tflite.png', image)
glenn-jocher commented 8 months ago

Hello @ovshake,

That's a great question. Yes, your understanding of the outputted array is correct. For each detection, there are 56 values comprising the 4 values for the bounding box dimensions (x, y, width, height), 1 value for the confidence score, and then 17 sets of 3 values for the pose keypoints (x, y, visibility).

From what I can see in your script, you first get the output tensor from TensorFlow Lite using interpreter.get_tensor, then transpose it and get the top-K bounding boxes with highest confidence values. After that, you draw the bounding box and keypoints on your image.

Regarding the bounding box not appearing as expected, it could be attributed to a few reasons. One key point is how the bounding box values are being processed.

In your draw_bbox_on_image function, you're multiplying the xywh values by the image's width and height, which assumes these values are normalized to 0-1. Please verify if this assumption holds true for your data. Incorrect assumptions about how your values are normalized could result in the bounding boxes being drawn in incorrect locations.

Further, it's important to note that the x and y values are not the top-left corner of the box but the center of the bounding box. In your draw_bbox_on_image function you're using (x, y) as the top-left corner, which could be the reason why the bounding boxes are not accurate.

I suggest revisiting your bounding box drawing method to ensure that the coordinates align properly with how YOLOv8 defines them.
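For illustration, a minimal sketch of converting a center-format (x, y, w, h) box into the corner points that OpenCV's rectangle drawing expects:

def xywh_center_to_corners(x, y, w, h):
    # YOLO boxes are (center_x, center_y, width, height); cv2.rectangle wants
    # the top-left and bottom-right corners.
    x1, y1 = x - w / 2, y - h / 2
    x2, y2 = x + w / 2, y + h / 2
    return int(x1), int(y1), int(x2), int(y2)

# e.g. x1, y1, x2, y2 = xywh_center_to_corners(*bbox[:4])
#      cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)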

I hope this helps! If you have further questions feel free to ask.

github-actions[bot] commented 7 months ago

πŸ‘‹ Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO πŸš€ and Vision AI ⭐

dobrykod commented 7 months ago

It looks like you were sorting by the x value of the first keypoint, so you only got bboxes on the right-hand side:

sorted_indices = np.argsort(output_data_transposed[:, 5])[::-1]

But you wanted to sort by the bbox confidence, which is one index earlier:

sorted_indices = np.argsort(output_data_transposed[:, 4])[::-1]
dobrykod commented 7 months ago

And it seems the color values of the input image need to be normalized by dividing them by 255 (see the attached image).

glenn-jocher commented 7 months ago

@dobrykod hello,

Great observation! Yes, you're absolutely right. Generally, the YOLOv8 model expects the input image data to be normalized. This specifically means that pixel values, which are typically in the range of 0-255 (for 8-bit color depth images), should be scaled to a range of 0-1.

This is a common preprocessing step when working with image data for deep learning models, and it helps improve the performance and stability during training by ensuring that all features (pixel values in this case) are in a similar range.

To implement this, you would indeed divide all pixel values by 255 before passing the image to the model.
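For reference, a minimal preprocessing sketch for the float32 TFLite export, assuming the model expects RGB input scaled to 0-1 as described above:

import cv2
import numpy as np

def preprocess_float32(image_bgr, input_hw=(640, 640)):
    # Resize to the model input (note cv2.resize takes (width, height)),
    # convert BGR -> RGB, scale pixels to 0-1 and add a batch dimension.
    resized = cv2.resize(image_bgr, (input_hw[1], input_hw[0]))
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
    return np.expand_dims(rgb.astype(np.float32) / 255.0, axis=0)  # (1, H, W, 3)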

Thank you for pointing out this necessary preprocessing step to ensure optimal model performance. Please don't hesitate to share any further insights or questions!

dev-techshlok commented 7 months ago

Hello @glenn-jocher Can you post the final code for drawing the images with normalized data? I tried and my output is like this.


import cv2
import matplotlib.pyplot as plt
import numpy as np
# import tensorflow as tf
# from ultralytics.utils.ops import scale_coords
import tflite_runtime.interpreter as tflite

def draw_bbox_on_image(image, x, y, w, h):
    # Denormalize the coordinates
    x = int(x * image.shape[1])
    y = int(y * image.shape[0])
    w = int(w * image.shape[1])
    h = int(h * image.shape[0])

    # Draw the bounding box
    # cv2.rectangle(image, (x, y), (x+h, y+w), (0, 255, 0), 2)
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

    return image

def plot_keypoints_on_image(image, keypoints, t):
    # Iterate over the keypoints
    for keypoint in keypoints:
        x, y, visibility = keypoint

        # Check if the visibility is greater than the threshold
        if visibility > t:
            # Denormalize the coordinates
            x = int(x)
            y = int(y)

            # Draw the keypoint
            cv2.circle(image, (x, y), 2, (0, 0, 255), -1)

    return image

# Load TFLite model and allocate tensors.
interpreter = tflite.Interpreter(model_path="yolov8n-pose_float32.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Read the image
image_path = "bus.jpg"
image = cv2.imread(image_path)

# image = cv2.convertScaleAbs(image_)
image = image.astype('float32')
image = image/255.0
# Get the input size from the model's input details and resize the image accordingly
input_size = input_details[0]['shape'][1:3]
image = cv2.resize(image, tuple(input_size))

image_ = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Convert the image to a float32 numpy array and add an extra dimension
input_data = np.expand_dims(image.astype(np.float32), axis=0)

# Set the tensor to point to the input data to be used
interpreter.set_tensor(input_details[0]['index'], input_data)

# Run the model
interpreter.invoke()

# Get the output tensor
output_data = interpreter.get_tensor(output_details[0]['index'])

output_data_transposed = output_data[0].T

# Select the top K bboxes
K = 10  # Change this to your desired number of bboxes
BASE = 0
sorted_indices = np.argsort(output_data_transposed[:, 4])[::-1]
top_K_by_confidence = output_data_transposed[sorted_indices[BASE:BASE+K]]
print("top_K_by_confidence", top_K_by_confidence[0])

# Process each bbox
for bbox in top_K_by_confidence:
    # Take the 51 keypoint values (elements 5 onward) and reshape into 17x3
    keypoints = bbox[5:].reshape((17, 3))
    xywh = bbox[:4]
    image_1 = draw_bbox_on_image(image_, xywh[0], xywh[1], xywh[2], xywh[3])
    image_1 = plot_keypoints_on_image(image_1, keypoints, 0.7)

# Save the image
# cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
# cv2.imshow("windows",image_)
# cv2.waitKey(1) & 0xFF == 27 # Press 'Esc' to exit

cv2.imwrite('test-tflite.png', image_1)

The output image comes out like this (see the attached test-tflite image).

But when I run this code it gives good output

model_path = "yolov8n-pose_float32.tflite"
in_image_size = 640
from ultralytics import YOLO

model = YOLO(model_path)
model.predict(source="bus.jpg", imgsz = in_image_size, save=True,  show_conf = True,show=True)

(see the attached output image for bus.jpg)

Please suggest!

Thank you.

dev-techshlok commented 7 months ago

Fixed the issue by changing this function; I did not notice that I needed to compute x1, y1 and x2, y2.

def draw_bbox_on_image(image, x, y, w, h):
    # Denormalize the coordinates
    x = int(x * image.shape[1])
    y = int(y * image.shape[0])
    w = int(w * image.shape[1])
    h = int(h * image.shape[0])

    # Calculate the (x1, y1) and (x2, y2) points for the rectangle
    x1, y1 = x - w // 2, y - h // 2
    x2, y2 = x + w // 2, y + h // 2

    # Draw the bounding box
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

But I still need help with drawing the lines between the circles, with a color-coded sequence. Also, how does it write the person label and confidence score on the image, and where does it get them from? Also, in my image the dots are a bit scattered.

Thank you

glenn-jocher commented 7 months ago

Hello @dev-techshlok,

I'm glad to hear that the change in your draw_bbox_on_image function helped in correctly plotting the bounding boxes. The change you made correctly reflects that the x and y values are the center of the bounding box, and the width and height need to be considered accordingly to get the correct bounding box coordinates.

Regarding your query about drawing lines between the detected keypoints (circles): to plot lines between detected keypoints, one possible way is to have an ordered list specifying the connections between keypoints, and then use the cv2.line function to draw lines between these keypoints. However, the connections between keypoints would typically be specific to the pose estimation model being used.

About color coding the sequence: you might use an array of different colors, and then use these colors sequentially when plotting your keypoints and lines.

For 'person' label and confidence score: the 'person' label is the class label and is specific to the type of objects your model has been trained to detect. The confidence score associated with each detection is typically part of the output of object detection models. In your case, it seems like the confidence score might be the 5th value in your output array.

About the scattered dots, it might be due to how the pose estimation is being performed by the model. Accuracy can vary depending on different factors such as the quality and pose of the person in the image, lighting conditions, etc.

Hope this helps! Let us know if you have any more questions!

dev-techshlok commented 7 months ago

Hello @glenn-jocher

Thank you for the response. I made a connecting list for each point and color in front of that.

KEYPOINT_EDGE_INDS_TO_COLOR = {
    (0, 1): (147, 20, 255),
    (0, 2): (255, 255, 0),
    (1, 3): (147, 20, 255),
    (2, 4): (255, 255, 0),
    (0, 5): (147, 20, 255),
    (0, 6): (255, 255, 0),
    (5, 7): (147, 20, 255),
    (7, 9): (147, 20, 255),
    (6, 8): (255, 255, 0),
    (8, 10): (255, 255, 0),
    (5, 6): (0, 255, 255),
    (5, 11): (147, 20, 255),
    (6, 12): (255, 255, 0),
    (11, 12): (0, 255, 255),
    (11, 13): (147, 20, 255),
    (13, 15): (147, 20, 255),
    (12, 14): (255, 255, 0),
    (14, 16): (255, 255, 0)
}

I am just unable to figure out the starting and ending point coordinates of each line between keypoints. I will try to figure that out.

The major thing is that most of the bounding box confidence scores are above 0.7, which results in a lot of bounding boxes on a single object. If the threshold is set above 0.8 or 0.9, then the model does not produce any bounding boxes. Do you have any suggestions?

glenn-jocher commented 7 months ago

Hi @dev-techshlok,

Creating the list for connections between keypoints and assigning colors to them is a good starting point for joining the keypoints using lines.

To define the starting and ending points of each line, you'll want to look at each tuple in your KEYPOINT_EDGE_INDS_TO_COLOR dictionary. Each of these tuples represents a pair of keypoints that should be connected by a line. You can use these indices to look up the coordinates of the corresponding keypoints. Once you have the coordinates for both keypoints, you have your starting and ending points for the line.

Regarding the bounding box issue, the threshold of 0.7 that you mentioned is probably the confidence score threshold. The model will only detect bounding boxes which have a confidence score greater than this threshold. If you set the threshold too high (e.g., 0.9 or more), the model might fail to detect any bounding boxes if it's not certain enough about any of the detections.

The key here is to find the optimal confidence score threshold that works for your specific use case. You may need to experiment with different threshold values. A lower threshold will allow more detections but may result in more false positives (detecting something when there is nothing there). A higher threshold will result in fewer detections but might miss some objects (false negatives).

Keep in mind that various factors can affect confidence scores, including image quality and complexity, diversity of training data, and many others. Seeing a lot of bounding boxes on a single object might also indicate that there may be room to further optimize the model or tune the non-maximum suppression (NMS) parameters, which is a technique used to keep the best bounding box when several boxes overlap for the same object.
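To make the NMS idea above concrete, here is a minimal NumPy sketch (not the ultralytics implementation, with illustrative thresholds) that keeps only confident, non-overlapping rows of the transposed (8400, 56) output:

import numpy as np

def filter_and_nms(rows, conf_thres=0.5, iou_thres=0.45):
    # rows: (N, 56) array of (x, y, w, h, conf, 17 * (x, y, visibility)) detections,
    # with boxes in center format in model-input pixels.
    rows = rows[rows[:, 4] > conf_thres]
    if rows.size == 0:
        return rows
    x1 = rows[:, 0] - rows[:, 2] / 2
    y1 = rows[:, 1] - rows[:, 3] / 2
    x2 = rows[:, 0] + rows[:, 2] / 2
    y2 = rows[:, 1] + rows[:, 3] / 2
    areas = (x2 - x1) * (y2 - y1)
    order = rows[:, 4].argsort()[::-1]          # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
        order = order[1:][iou < iou_thres]      # drop boxes overlapping the kept one
    return rows[keep]

# e.g. detections = filter_and_nms(output_data[0].T)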

I hope this helps, and I encourage you to continue working with the model and adjusting parameters to get the best results.

dev-techshlok commented 7 months ago

So I used this code to draw the lines with colors,

def plot_keypoints_on_image(image, keypoints, t):
    for edge, color in KEYPOINT_EDGE_INDS_TO_COLOR.items():
        point1_index, point2_index = edge
        x1, y1, _ = keypoints[point1_index]
        x2, y2, _ = keypoints[point2_index]
        x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)

        # Draw the line on the image
        cv2.line(image, (x1, y1), (x2, y2), color, 2)
        cv2.circle(image, (x1, y1), 4, color, -1) 

    return image

Just one thought: without using a GPU, running on Linux on a video shows around 4 FPS. I think it's because of the image size; can we reduce the image size to 270x270 instead of 640?

Is there a method you have used before?

Thank you

glenn-jocher commented 7 months ago

@dev-techshlok hello,

Your implementation to connect the keypoints and drawing them on the image looks correct. You're using each edge from your KEYPOINT_EDGE_INDS_TO_COLOR dictionary to get the x and y coordinates of both the start and end of each line, which should properly connect your keypoints.

Regarding your FPS concern, yes, the image size can indeed impact the speed of object detection. The larger the image, the more processing that is required, which can slow down the FPS. Reducing the size of the image can help to speed up object detection, but be mindful that it could potentially lower the model's accuracy because smaller images have less detail.

As for resizing the images: yes, you can technically resize the images to any size you want before feeding them into the model. However, keep in mind that the used model, YOLOv8, has been trained on certain image sizes, and deviating too much from those sizes might hinder the performance of the model.
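If CPU FPS is the main concern, one option (a sketch assuming the standard ultralytics export API) is to export the TFLite model at a smaller input size rather than resizing frames at inference time:

from ultralytics import YOLO

model = YOLO("yolov8n-pose.pt")
# Export at a smaller square input size; imgsz is normally kept a multiple of the
# model stride (32), so 320 is a more natural choice than 270.
model.export(format="tflite", imgsz=320)

The runtime preprocessing then has to resize frames to the same size the model was exported with.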

Please experiment with different sizes that suit your application and provide a balance between speed and accuracy. Make sure you're resizing both the training images and the images you're using for detection to maintain consistency.

dev-techshlok commented 7 months ago

Hello,

I am trying to convert the model with int8=True. I was able to convert the model to full quantization using this method; is this correct?

from ultralytics import YOLO

# Load a model
model = YOLO('yolov8n-pose.yaml')  # build a new model from YAML
model = YOLO('yolov8n-pose.pt')  # load a pretrained model (recommended for training)
model = YOLO('yolov8n-pose.yaml').load('yolov8n-pose.pt')  # build from YAML and transfer weights

# Train the model
results = model.train(data='coco8-pose.yaml', epochs=100, imgsz=640)
model.export(format='tflite',int8=True)

Now I am getting negative values, which makes sense because int8 ranges from -128 to +127.

This is my pre-processing to convert the image to int8 before passing it in:

input_size = input_details[0]['shape'][1:3]
print(input_size)
image_re = cv2.resize(image, tuple(input_size))
image_ = cv2.cvtColor(image_re, cv2.COLOR_BGR2RGB)
# Convert the image to int8 data type.
img_int8 = image_.astype(np.int8)
# If the image has values outside the int8 range (-128 to 127), you may want to
img_int8 = np.clip(img_int8, -128, 127)
input_data = np.expand_dims(img_int8, axis=0) 

Now the problem is that I am getting fixed negative values. Please suggest if I am doing something wrong.

output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)

The result of the above line is this:

[[[-103 -103 -103 ... -103 -103 -103]
  [-103 -103 -103 ... -103 -103 -103]
  [-103 -103 -103 ... -103 -103 -103]
  ...
  [-103 -103 -103 ...   89   89   89]
  [ -95  -95  -95 ...   89   89   89]
  [-103 -103 -103 ... -103 -103 -103]]]
dobrykod commented 7 months ago

I've tried it as well but this is not going to work for the pose model. In the end, I've used this fork https://github.com/DeGirum/ultralytics_yolov8 to get quantized version. DeGirum did amazing work to fix the conversion and they published an explanation about the problem and how they did it.

glenn-jocher commented 7 months ago

@dev-techshlok hello,

Thanks for sharing your experience and solution! It's always great to see members of the community helping each other out.

Quantizing deep learning models can indeed be a tricky process. The main challenge arises from the fact that during quantization, the weights and activations of the network are converted from floating-point representation to integer representation. This can sometimes lead to precision losses, causing the network output to be negatively affected.

It's great to hear that you found a fork that has addressed these issues for the YOLOv8 pose model. It's always encouraging when members of the community can improve the framework in such a manner.

Thanks for sharing this; it will no doubt be helpful to others who are looking to quantize their YOLOv8 pose models.

Happy Coding!

dev-techshlok commented 6 months ago

Hello @dobrykod and @glenn-jocher

I tried the fork version you suggested; the int8 pose model is generated, but it still does not work.

I am using this Colab notebook from issue #1695.

https://colab.research.google.com/drive/1yjCEwwFuMKvFJceSDfyWrUWOSfvLlPjl?usp=sharing#scrollTo=pQ8YaJh_ejZh

Please help me get pose estimation keypoints from the int8 quantized model. @dobrykod, if you can share your code for reference it will be helpful.

I am using this to convert the model:

from ultralytics import YOLO
model = YOLO("yolov8n-pose.pt")
model.export(format='tflite',
             imgsz=640,
             data="coco128.yaml",
             int8=True,
             separate_outputs=True,
             export_hw_optimized=True,
             uint8_io_dtype=True,
             max_ncalib_imgs=100
             )

Which Python code can show the keypoints? I am unable to get the keypoints from the TensorFlow Lite interpreter. The separate_outputs option does something that changes the output. Thank you

dobrykod commented 6 months ago

I suspect you tried to call interpreter.invoke and interpreter.get_tensor in your code instead of using the fork.

I was lazy and let the function model.predict gather my outputs exactly like in the colab:

predictions = model.predict(...)
predictions[0].keypoints.cpu().numpy()

The function interpreting the raw output is postprocess in model/pose/predict.py

dev-techshlok commented 6 months ago

Hello @dobrykod

Thank you for the suggestion. Can you also help, if you know, where the prediction scores are that we should plot based on? Any code snippet would be helpful.

Thank you

dobrykod commented 6 months ago

(image attached)

dev-techshlok commented 6 months ago

Thanks,

I cannot use the YOLO library, so any TensorFlow Lite example would be great. I am passing some delegates to the interpreter, and YOLO seems to use torch in its modules.

Thanks

dobrykod commented 6 months ago

Then have a look at how they did it in the postprocess function in model/pose/predict.py. I'm using the yolo lib, so I don't have any code snippet useful for you.

glenn-jocher commented 6 months ago

@dev-techshlok hello,

Your approach to running TensorFlow Lite with a quantized model seems correct. However, deciphering the results from the TensorFlow Lite interpreter is not straightforward, as YOLOv8 uses a custom output format that requires post-processing.

In YOLOv8, the post-processing of the model's raw output is handled in the postprocess function located in model/pose/predict.py. This function is responsible for interpreting the raw output, performing non-maxima suppression, and extracting keypoints.

Unfortunately, due to the custom nature of this output, it isn't something that can be easily read from the raw output of the TFLite Interpreter without replicating some of the functionality of the postprocess function.

I recommend going through the code in the postprocess function to understand how it interprets the outputs and constructs the keypoints. That will provide you some reference for what you need to do after getting the raw output from the TFLite Interpreter. It's likely that you will need to implement similar functionality in your own code to properly interpret the results.

In terms of getting the scores, they are part of the TFLite Interpreter's raw output; they are usually located at a specific index depending on how the model was constructed.

Unfortunately, I cannot point you to specific code examples as instructed. Apologies for the inconvenience and I hope this was informative.

dev-techshlok commented 6 months ago

Thank you @dobrykod and @glenn-jocher

For the explanation, I will try to get the output done and will post if something gets solved.

Thank you

dev-techshlok commented 6 months ago

Hello @glenn-jocher

Is there a workaround for the original YOLOv8 repository so that the tflite int8 output is handled correctly and does not give raw negative values?

(The rest of this comment quoted the earlier comment above verbatim: the int8=True export snippet, the int8 pre-processing, and the fixed negative output values.)
glenn-jocher commented 6 months ago

@dev-techshlok hello,

Thank you for reaching out with your inquiry.

When working with TensorFlow Lite and converting models to the int8 format, it's quite important to ensure that your model is correctly quantized. Negative values in the output tensor could be a sign that there might be an issue with how the model was quantized or with the pre-processing steps you are applying to your input data.

For the TensorFlow Lite conversion with int8 quantization, proper calibration with representative dataset samples needs to be performed during the quantization process to ensure that the model understands the correct range and distribution of input data. This step is crucial as it defines the scaling factors used by the quantized model during inference.

Additionally, ensure that your input preprocessing in the TensorFlow Lite inference code matches the preprocessing expected by the model. This includes not only converting the image data to int8 but also scaling the input in a manner consistent with how the model was trained. Sometimes inputs are expected to be in a certain range or format, and mismatch in these can lead to unexpected output values.
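As a hedged sketch of the input side (assuming the underlying model was calibrated on 0-1 normalized RGB images, as the float export is), the usual TFLite convention is to quantize using the scale and zero point reported by the interpreter rather than casting raw pixel values to int8:

import cv2
import numpy as np

def quantize_input(image_bgr, input_details):
    # Build an int8 input tensor using the quantization params the interpreter
    # reports, instead of a raw astype(np.int8) cast of 0-255 pixel values.
    height, width = input_details[0]['shape'][1:3]
    scale, zero_point = input_details[0]['quantization']
    rgb = cv2.cvtColor(cv2.resize(image_bgr, (int(width), int(height))), cv2.COLOR_BGR2RGB)
    real = rgb.astype(np.float32) / 255.0          # same 0-1 range as the float model
    q = np.round(real / scale + zero_point)        # quantize: q = real / scale + zero_point
    return np.expand_dims(np.clip(q, -128, 127).astype(np.int8), axis=0)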

Make sure to follow the specific steps for model conversion and quantization provided in the YOLOv8 documentation. Double-check that all the necessary calibration steps are taken, and that your preprocessing pipeline aligns with the model's requirements.

I hope this helps guide you in the right direction. Troubleshooting quantization issues can be complex, but ensuring proper calibration and preprocessing alignment is a good place to start. Good luck!

dev-techshlok commented 6 months ago

Hello @glenn-jocher

Thank you for replying so fast.

I am using this method to train and export the model.

from ultralytics import YOLO
model = YOLO("yolov8x-pose.pt")
model.export(format='tflite',data="coco8.yaml",int8=True,epochs=3)

I also had to switch to TensorFlow version tensorflow==2.13.1.

The pre-processing I am using on the image is this:

input_size = input_details[0]['shape'][1:3]
image_re = cv2.resize(image, tuple(input_size))

input_data = cv2.cvtColor(image_re, cv2.COLOR_BGR2RGB)
input_data = (input_data - 128).astype(np.int8)  # shift 0-255 into the int8 range
input_data = np.expand_dims(input_data.astype(np.uint8), axis=0)

interpreter.set_tensor(input_details[0]['index'], input_data)

Can you verify if the post-processing is correct?

Thank you

glenn-jocher commented 6 months ago

@dev-techshlok hello,

Thank you for reaching out with your query on training and exporting your YOLOv8 model.

Your approach to training and exporting the model using the export method once you have the model instance appears to be correct. The use of int8=True implies that you are trying to export a quantized version of the model, which should be compatible with devices that support int8 computations.

As for the preprocessing steps you mentioned for your images, they seem generally in line with the process required for models expecting normalized int8 input. This normalization usually includes converting the image to the RGB color space, resizing the image to the input dimensions expected by the model, and then scaling the pixel values appropriately. It's important to ensure that these steps match exactly with the preprocessing that was done during model training for the best performance.

Regarding your post-processing, without seeing the specific details of your method, it's hard to fully verify it. Post-processing typically involves interpreting the raw output of the model, applying any necessary transformations to translate these raw outputs into understandable results (e.g., bounding box coordinates, class scores), and then filtering the results as needed (e.g., through non-maximum suppression).

Ensuring that the post-processing is correctly matched to the way that your model was trained and quantized is key. It is often helpful to look at how the original model was post-processed within the framework before exporting, and ensure that you are replicating those steps in your post-processing after inference with the TFLite model.

If you are observing discrepancies or unexpected results, it may be worth revisiting the quantization and calibration steps, or the post-training quantization configurations, to ensure that the model is correctly interpreting the int8 input data.

Wishing you success with your YOLOv8 model deployment. If you have more specific concerns or issues, please feel free to provide additional details, and we'll be happy to assist further.

Best regards.

dev-techshlok commented 6 months ago

Hello @glenn-jocher Thank you for the explanation.

This is the code I am using to test the quantized model.


import cv2
import matplotlib.pyplot as plt
import numpy as np
# import tensorflow as tf
# from ultralytics.utils.ops import scale_coords
import tflite_runtime.interpreter as tflite

KEYPOINT_EDGE_INDS_TO_COLOR = {
    (0, 1): (147, 20, 255),
    (0, 2): (255, 255, 0),
    (1, 3): (147, 20, 255),
    (2, 4): (255, 255, 0),
    (0, 5): (147, 20, 255),
    (0, 6): (255, 255, 0),
    (5, 7): (147, 20, 255),
    (7, 9): (147, 20, 255),
    (6, 8): (255, 255, 0),
    (8, 10): (255, 255, 0),
    (5, 6): (0, 255, 255),
    (5, 11): (147, 20, 255),
    (6, 12): (255, 255, 0),
    (11, 12): (0, 255, 255),
    (11, 13): (147, 20, 255),
    (13, 15): (147, 20, 255),
    (12, 14): (255, 255, 0),
    (14, 16): (255, 255, 0)
}

def draw_bbox_on_image(image, x, y, w, h):
    # Denormalize the coordinates
    x = int(x * image.shape[1])
    y = int(y * image.shape[0])
    w = int(w * image.shape[1])
    h = int(h * image.shape[0])

    # Calculate the (x1, y1) and (x2, y2) points for the rectangle
    x1, y1 = x - w // 2, y - h // 2
    x2, y2 = x + w // 2, y + h // 2

    # Draw the bounding box
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

    return image

def map_values_to_range(value, from_range=(-127, 128), to_range=(0, 640)):
    # Check if the input value is within the source range
    if value < from_range[0] or value > from_range[1]:
        raise ValueError("Input value is outside the source range.")

    # Calculate the scaling factor
    from_min, from_max = from_range
    to_min, to_max = to_range
    scale_factor = (to_max - to_min) / (from_max - from_min)

    # Perform the mapping
    mapped_value = (value - from_min) * scale_factor + to_min

    return mapped_value

def plot_keypoints_on_image(image, keypoints):

    for edge, color in KEYPOINT_EDGE_INDS_TO_COLOR.items():
        point1_index, point2_index = edge
        x1, y1, _ = keypoints[point1_index]
        x2, y2, _ = keypoints[point2_index]
        x1, y1, x2, y2 = int(map_values_to_range(x1)), int(map_values_to_range(y1)), int(map_values_to_range(x2)), int(map_values_to_range(y2))

        # Draw the line on the image
        cv2.line(image, (x1, y1), (x2, y2), color, 2)
        cv2.circle(image, (x1, y1), 4, color, -1) 

    return image

# Load TFLite model and allocate tensors.
interpreter = tflite.Interpreter(model_path="yolov8n-pose_full_integer_quant.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
def return_int8_image(image):
    input_data = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    input_data = (input_data - 128).astype(np.int8)  # de-scale

    # cv2.imshow("aaa",image_with_rectangle)
    # cv2.waitKey(0)
    input_data = np.expand_dims(input_data,axis=0)
    print(input_data)
    return input_data
    # input_data = input_data.transpose(2, 0, 1).reshape(1, 3, in_image_size, in_image_size)
    # input_data = np.transpose(input_data,axes=[0, 2,3,1])

# Read the image
image_path = "bus.jpg"
image = cv2.imread(image_path)

# image = cv2.convertScaleAbs(image_)

# Get the input size from the model's input details and resize the image accordingly
input_size = input_details[0]['shape'][1:3]
image_re = cv2.resize(image, tuple(input_size))

input_data2 = return_int8_image(image_re)

# Set the tensor to point to the input data to be used
interpreter.set_tensor(input_details[0]['index'], input_data2)

# Run the model
interpreter.invoke()

# Get the output tensor
output_data = interpreter.get_tensor(output_details[0]['index'])

output_data_transposed = output_data[0].T

# Select the top K bboxes
K = 50  # Change this to your desired number of bboxes
BASE = 0
sorted_indices = np.argsort(output_data_transposed[:, 4])[::-1]
top_K_by_confidence = output_data_transposed[sorted_indices[BASE:BASE+K]]

# Process each bbox
for bbox in top_K_by_confidence:
    # Take the 51 keypoint values (elements 5 onward) and reshape into 17x3
    keypoints = bbox[5:].reshape((17, 3))
    print(keypoints )
    xywh = bbox[:4]
    print(bbox[4])
    image_1 = plot_keypoints_on_image(image_re, keypoints)

# Save the image
# cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
cv2.imshow("windows",image_1)
cv2.waitKey(0)
# cv2.waitKey(1) & 0xFF == 27 # Press 'Esc' to exit

# cv2.imwrite('test-tflite.png', image_1)

yolov8n-pose_full_integer_quant.zip

This is the converted model.

Everything was done using Google Colab.

This is the output for x, y, _ = keypoints[i]:


[[  28  -24 -107]
 [  28  -25 -107]
 [  26  -25 -107]
 [  29  -25 -107]
 [  25  -25 -107]
 [  31  -20 -107]
 [  23  -19 -107]
 [  33  -13 -107]
 [  23  -11 -107]
 [  32  -11 -107]
 [  26  -11 -107]
 [  31   -7 -107]
 [  25   -7 -107]
 [  32   -7 -107]
 [  27   -7 -107]
 [  32   -5 -107]
 [  28   -4 -107]]

Please let me know your thoughts on this.

Thank you.

glenn-jocher commented 6 months ago

@dev-techshlok hello,

Thank you for sharing the details of your approach to testing your quantized YOLOv8 model.

It seems that you have a structured process for visualizing the outputs of your pose estimation task. You've handled scaling and mapping the keypoint coordinates appropriately and also included bounding box interpretation and visualization.

Regarding your observed outputs, if the keypoints (signified by 'x' and 'y' variables in your snippet) seem to not correctly represent human pose structure or confidence scores ('_' variable) appear unexpectedly low or uniform, there might be several factors to consider:

  1. Quantization: Ensure the quantization process during model conversion has not introduced significant errors or inconsistencies. Quantization can sometimes cause the model's sensitivity to small changes in numerical values to vary, affecting performance.

  2. Model Calibration: When exporting with quantization, it’s essential to use a representative dataset for calibration. If the calibration set is not representative of the eventual use cases or data distribution, the performance might be suboptimal.

  3. Input Normalization: Double-check the preprocessing steps are in line with what the model was trained on. Any variation in the normalization can lead to significant accuracy loss, especially in quantized models.

  4. Post-Processing: Ensure that all transformations and scaling you apply in post-processing mirror the intended output format of the model. This includes correct interpretation of the output scale and zero points for int8 tensors.

  5. Model Size and Complexity: As you're working with what appears to be a smaller version of YOLOv8 ('yolov8n'), consider the trade-off between model size and accuracy. Smaller models, while faster and more portable, can perform less accurately than their larger counterparts, especially on fine-grained tasks such as keypoint detection.

  6. Debugging Output: Investigate the intermediate outputs to better understand how each stage of the post-processing affects the end result.

It might be useful to start by examining sample images where you know the expected location of keypoints and compare these with your model’s outputs. If the quantized model differs significantly from a non-quantized baseline in these controlled tests, consider revisiting the quantization and calibration process.

Lastly, as you’re working with the TFLite model in a Colab environment, ensure that all environment variables are consistent and that there were no interruptions or variations during model conversion and export.

By leveraging visualizations and a systematic approach to debugging, you should be able to pinpoint where the discrepancy is being introduced.

dev-techshlok commented 6 months ago

Hello @glenn-jocher

Thank you for the detailed response.

Can you let me know what this means? I am printing the input details and output details of the model.

Input details [{'name': 'serving_default_images:0', 'index': 0, 'shape': array([ 1, 640, 640, 3], dtype=int32), 'shape_signature': array([ 1, 640, 640, 3], dtype=int32), 'dtype': <class 'numpy.int8'>, 'quantization': (0.003921568859368563, -128), 'quantization_parameters': {'scales': array([0.00392157], dtype=float32), 'zero_points': array([-128], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

Output details

[{'name': 'PartitionedCall:0', 'index': 478, 'shape': array([ 1, 56, 8400], dtype=int32), 'shape_signature': array([ 1, 56, 8400], dtype=int32), 'dtype': <class 'numpy.int8'>, 'quantization': (3.0860719680786133, -107), 'quantization_parameters': {'scales': array([3.086072], dtype=float32), 'zero_points': array([-107], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

Can you also tell me what the [1, 56, 8400] output shape means for YOLOv8.

Thank you

dobrykod commented 6 months ago

Note that if you apply your input quantization params on the image and cast to int8 you will lose all information which was in the image.

dev-techshlok commented 6 months ago

Hello @dobrykod, is there any solution for int8-type quantized models that you are aware of? I'll try any suggested methods for pose estimation with a quantized model; you can suggest any converted model.

Thank you

dobrykod commented 6 months ago

Maybe try one of these https://coral.ai/models/pose-estimation

dobrykod commented 6 months ago

I was using MoveNet.SinglePose.Thunder

dev-techshlok commented 6 months ago

Hello @dobrykod @glenn-jocher I used this to quantize the image


image_path = "bus.jpg"
image = cv2.imread(image_path)

# Get the input size from the model's input details and resize the image accordingly
input_size = input_details[0]['shape'][1:3]
image_re = cv2.resize(image, tuple(input_size))
# Convert the image to a NumPy array
image_array = np.array(image_re)

# Use a wider integer type for the quantization arithmetic below
image_array = image_array.astype(np.int32)

# Apply quantization
scale = 0.003921568859368563  # Scale factor
zero_point = -128  # Zero point
quantized_image = (image_array / scale + zero_point).astype(np.int8)
cv2.imwrite("filename_int8.jpg", quantized_image) 

This is the resulting image (filename_int8, attached).

I tried this image in both MoveNet and YOLO; here are the outputs:

1. MoveNet multipose (see the attached movenet_multi image)

2. YOLOv8 (see the attached yolo_multi image)

My script attached above gives this output (see the attached output image). It seems close but not accurate; can you look into this? The bottom right looks like an estimation.

Also, what does the index mean? Will the output be at index 478?

[{'name': 'PartitionedCall:0', 'index': 478, 'shape': array([ 1, 56, 8400], dtype=int32), 'shape_signature': array([ 1, 56, 8400], dtype=int32), 'dtype': <class 'numpy.int8'>, 'quantization': (3.0860719680786133, -107), 'quantization_parameters': {'scales': array([3.086072], dtype=float32), 'zero_points': array([-107], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

dobrykod commented 6 months ago

I am surprised to see so much after quantization. I thought np.astype clips to -128:127, but it seems it does modulo instead.

Anyway, for MoveNet the quantization params are different from [0.0039, -128]. You always take them from the interpreter:

input_details = interpreter.get_input_details()[0]
scale, zero_point = input_details["quantization"]

And in your YOLO model the output is at output_details['index'] = 478:

output_data = interpreter.get_tensor(output_details['index'])

glenn-jocher commented 6 months ago

@dev-techshlok hello,

Thank you for reaching out with your concerns regarding the quantization process for the YOLOv8 model.

It's important to remember that quantization can indeed affect the range and representation of your data. When quantizing images, make sure you're using the correct scale and zero-point parameters directly obtained from the model's input details. This ensures that the scaling and offset are correctly applied, preserving the distribution of your initial data as much as possible within the constraints of int8 representation.

Regarding your YOLOv8 model's output, the 'index' field specifies where in the TensorFlow Lite interpreter's memory the output tensor is located. To read the model's inference results, you will need to access the tensor at the given index, in your case index 478. By using the get_tensor method with this index, you should be able to retrieve the output data and proceed with any post-processing necessary for interpreting the model's predictions.
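For illustration, a minimal sketch of the standard TFLite dequantization (real_value = scale * (quantized_value - zero_point)) applied to that output tensor; this is generic TFLite handling, not ultralytics-specific code:

import numpy as np

def dequantize_output(interpreter):
    # Map the raw int8 output (shape (1, 56, 8400)) back to float using the
    # output tensor's own scale and zero point.
    out = interpreter.get_output_details()[0]
    raw = interpreter.get_tensor(out['index'])
    scale, zero_point = out['quantization']
    return scale * (raw.astype(np.float32) - zero_point)

# After interpreter.invoke():
#   rows = dequantize_output(interpreter)[0].T   # (8400, 56), same layout as the float model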

If you have further questions or need additional support, we encourage you to continue exploring the documentation or reach out again.

Best regards.

dev-techshlok commented 6 months ago

@glenn-jocher It may sound a bit harsh, but in the open-source community, we are trying to solve something here. If you could kindly refrain from posting responses generated by GPT, we'd greatly appreciate it.

Thank you for understanding.

glenn-jocher commented 6 months ago

@dobrykod hello there,

Thank you for reaching out with your concerns. I absolutely appreciate your dedication to the open-source community and the efforts in improving the YOLOv8 project.

In response to your GitHub issue, I want to ensure that any discussion remains constructive and focused on the specifics of YOLOv8. If there are particular aspects of the model or its implementation that you need clarity on or assistance with, I'm here to provide you with guidance and support based on my knowledge and the resources available in the repository.

The collaborative nature of open-source development is indeed vital, and I'm committed to upholding that spirit by offering support that aligns closely with the community's goals and the technical intricacies of YOLOv8.

If you have questions about the model's architecture, training process, inference speed, or best practices for deployment, feel free to share those, and I will do my best to assist you from there.

Looking forward to your specific questions and working together towards advancing the capabilities of YOLOv8.

Best regards.

Evansika959 commented 4 months ago

@dev-techshlok Hello!

Lately, I've been trying to use the quantized model on int8 machines, but I am stuck at the same problem as yours. Have you succeeded in doing this? I'd really appreciate it if you could kindly share your recent work, or perhaps we could figure something out together?

Best regards

glenn-jocher commented 4 months ago

@Evansika959 hello,

I'm glad to hear you're exploring the quantized models. For int8 quantization, it's crucial to use the correct scale and zero-point from the model's input details for accurate preprocessing. For post-processing, ensure you're applying the inverse transformation to interpret the outputs correctly.

If you're still facing issues, I recommend reviewing the documentation on model quantization and inference. Collaborating on this could be beneficial, and I encourage you to share your findings or specific challenges in the discussions section for community input.

Best regards.