Closed Vikram12301 closed 1 year ago
Hello @Vikram12301, thank you for your interest in YOLOv5! Please visit our Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a Bug Report, please provide a minimum reproducible example to help us debug it.
If this is a custom training Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.
Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8!
Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.
Check out our YOLOv8 Docs for details and get started with:
pip install ultralytics
@Vikram12301 based on the code you've provided, it appears that you are trying to run inference on a TensorFlow Lite model using the tf.lite.Interpreter API. However, you may be getting unexpected results because the output you've shared is a 3-dimensional NumPy array, whereas the expected output is a 2-dimensional PyTorch tensor.
It is recommended to check whether your TensorFlow Lite model was exported with the same input and output dimensions and data types as the PyTorch model you're comparing it to. Additionally, you can use the interpreter.get_output_details() API to check the dimensions and data types of the output tensor.
If you still face issues, you can share more details such as the TensorFlow Lite model file, and we will try our best to help you.
@glenn-jocher The output for 'interpreter.get_output_details()' is
[{'name': 'StatefulPartitionedCall:0', 'index': 532, 'shape': array([ 1, 25200, 6], dtype=int32), 'shape_signature': array([ 1, 25200, 6], dtype=int32), 'dtype': numpy.float32, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]
But here is the output for the code below:
import torch
import tensorflow as tf
from PIL import Image
model = torch.hub.load('ultralytics/yolov5', 'custom', 'best-fp16.tflite')
interpreter = tf.lite.Interpreter(model_path="best-fp16.tflite")
input_details = interpreter.get_input_details()
image = Image.open('image.tif')
image = image.resize((input_details[0]['shape'][1], input_details[0]['shape'][2]))
results = model(image)
# Results
results.print()
print(results.xyxy[0])
Speed: 8.6ms pre-process, 577.2ms inference, 2.8ms NMS per image at shape (1, 3, 640, 640)
tensor([[255.95584, 423.64117, 344.48886, 433.97974, 0.90273, 0.00000]])
The same .tflite file works when loaded through torch.hub but not with tf.lite.Interpreter.
@Vikram12301 it seems like the difference in output between your TensorFlow Lite model and the PyTorch model could be caused by differences in the pre- and post-processing applied by the two inference paths (the torch.hub wrapper handles image scaling and NMS for you, while the raw tf.lite.Interpreter does not).
To further investigate this, I recommend checking the output shape by executing the inference with the model.forward() API instead of model(), and then modifying the dimensions of the TensorFlow Lite output tensor to match those of the PyTorch model's output tensor.
Here is an example of how you can get the output tensor shape for the same image image_data using the model.forward() API:
input_tensor = torch.from_numpy(image_data)
output_tensor = model.forward(input_tensor)[0]
print(output_tensor.shape)
Please let me know if you have any additional questions.
@glenn-jocher I followed the above-mentioned steps (I am using Google Colab for all the experiments):
image_data = Image.open('image.tif')
image_data = image_data.resize((640,640))
image_data = np.transpose(image_data, (2, 1, 0))
image_data = np.array(image_data)
image_data = np.expand_dims(image_data, axis=0)
input_tensor = torch.from_numpy(image_data)
output_tensor = model.forward(input_tensor)[0]
print(output_tensor.shape)
I got this as the output: torch.Size([25200, 6])
So how do I rectify this to get the right output from TFLite when using tf.lite.Interpreter?
@Vikram12301 Great! It looks like the shape of the output tensor from the PyTorch model is [25200, 6]. Now, to give the output tensor from the TensorFlow Lite model the same shape, we can manipulate it using NumPy.
Here's some sample code that you can drop into your notebook to transform the output of your TensorFlow Lite model:
# Load your TensorFlow Lite model, perform the inference, and get the output tensor
interpreter = tf.lite.Interpreter(model_path="best-fp16.tflite")
interpreter.allocate_tensors()
# Get the input and output shapes
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Preprocess the image and run inference
image_data = Image.open('image.tif')
image_data = image_data.resize((640,640))
image_data = np.array(image_data).astype(np.float32)
image_data = np.expand_dims(image_data, axis=0)
interpreter.set_tensor(input_details[0]['index'], image_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
# Reshape the output to match the PyTorch model
output_data = output_data.reshape(-1, 6)
# Print the output shape
print(output_data.shape)
I hope this helps. Let me know if you have any other questions!
@glenn-jocher I actually want the output of model(image) and not the output of model.forward(image). The expected output is (xmin, ymin, xmax, ymax, conf, class), as in the tensor I showed above.
@Vikram12301 If you want to get the output of model(image) rather than model.forward(image), you can convert the output tensor shape to match the expected output shape of (xmin, ymin, xmax, ymax, confidence, class). Here's how:
# Load your TensorFlow Lite model, perform the inference, and get the output tensor
interpreter = tf.lite.Interpreter(model_path="best-fp16.tflite")
interpreter.allocate_tensors()
# Get the input and output shapes
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Preprocess the image and run inference
image_data = Image.open('image.tif')
image_data = image_data.resize((640,640))
image_data = np.array(image_data).astype(np.float32)
image_data = np.expand_dims(image_data, axis=0)
interpreter.set_tensor(input_details[0]['index'], image_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
# Reshape the output to match the PyTorch model
output_data = output_data.reshape(-1, 6)
# Convert the coordinates to (xmin, ymin, xmax, ymax) format
output_data[..., :4] = output_data[..., :4] * 640
output_data = output_data[..., [1, 2, 3, 4, 0, 5]]
# Print the output shape and data
print(output_data.shape)
print(output_data)
This should give you an output shape of (25200, 6) and the required format of (xmin, ymin, xmax, ymax, confidence, class).
But why do I get an output of shape (1, 25200, 6), when model(image) gives me a tensor like this: tensor([[5.01359e+02, 1.09255e+03, 6.92128e+02, 1.11753e+03, 9.24892e-01, 0.00000e+00]])
What do the 25200 rows in the above result contain, and how do I remove the rows that are not required, to get a single-row output like the one above?
Also, the code you mentioned gives the result below:
[[-9.8392916e-01 6.0772041e+01 1.3479447e+01 9.9844486e-01 9.0962954e+00 9.9998873e-01] [-3.0064437e+00 1.5501477e+02 1.9930803e+01 1.0000000e+00 1.1996713e+01 9.9999988e-01] [ 1.5339977e+00 2.4892773e+01 5.6163692e+00 1.0000000e+00 1.1995931e+01 9.9997205e-01] ... [ 6.0863861e+02 2.1737537e+01 1.3993654e+02 2.9375503e-02 6.2018915e+02 9.9997765e-01] [ 6.1352100e+02 2.0041382e+01 1.3568088e+02 3.0493064e-02 6.2263947e+02 9.9998361e-01] [ 6.2180359e+02 1.9494127e+02 2.8580172e+01 1.4891188e-01 6.2383502e+02 9.9998158e-01]]
How can the class be a floating-point value? And how do I remove the other rows to get a clean output like the one I get from model(image)?
@Vikram12301, I apologize for the confusion earlier. If your expected output has a single row like tensor([[5.01359e+02, 1.09255e+03, 6.92128e+02, 1.11753e+03, 9.24892e-01, 0.00000e+00]]), then you can use the following code to get a similar output to model(image):
# Load your TensorFlow Lite model, perform the inference, and get the output tensor
interpreter = tf.lite.Interpreter(model_path="best-fp16.tflite")
interpreter.allocate_tensors()
# Get the input and output shapes
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Preprocess the image and run inference
image_data = Image.open('image.tif')
image_data = image_data.resize((640,640))
image_data = np.array(image_data).astype(np.float32)
image_data = np.expand_dims(image_data, axis=0)
interpreter.set_tensor(input_details[0]['index'], image_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
# Remove the batch dimension
output_data = output_data[0]
# Convert the coordinates to (xmin, ymin, xmax, ymax) format
output_data[..., 0] = output_data[..., 0] * 640
output_data[..., 1] = output_data[..., 1] * 640
output_data[..., 2] = output_data[..., 2] * 640
output_data[..., 3] = output_data[..., 3] * 640
# Convert the class probabilities to class IDs (argmax over the class-probability columns)
output_data[..., 5] = np.argmax(output_data[..., 5:], axis=-1)
# Round the confidence score to 2 decimal places
output_data[..., 4] = np.round(output_data[..., 4], decimals=2)
# Print the output shape and data
print(output_data)
This code will give you an output of the form array([[5.01359375e+02, 1.09254980e+03, 6.92128357e+02, 1.11752808e+03, 9.24891961e-01, 0.00000000e+00]]), which is similar to the output of model(image).
The above code gave me an output [[ 9.0962954e+00 -9.8392916e-01 6.0772041e+01 1.3479447e+01 1.0000000e+00 3.0000000e+00] [ 1.1996713e+01 -3.0064437e+00 1.5501477e+02 1.9930803e+01 1.0000000e+00 3.0000000e+00] [ 1.1995931e+01 1.5339977e+00 2.4892773e+01 5.6163692e+00 1.0000000e+00 3.0000000e+00] ... [ 6.2018915e+02 6.0863861e+02 2.1737537e+01 1.3993654e+02 2.9999999e-02 3.0000000e+00] [ 6.2263947e+02 6.1352100e+02 2.0041382e+01 1.3568088e+02 2.9999999e-02 3.0000000e+00] [ 6.2383502e+02 6.2180359e+02 1.9494127e+02 2.8580172e+01 1.5000001e-01 3.0000000e+00]]
and not array([[5.01359375e+02, 1.09254980e+03, 6.92128357e+02, 1.11752808e+03, 9.24891961e-01, 0.00000000e+00]]). Can you please check from your side?
@glenn-jocher Also, some rows in the above output have class = 3. The trained model does not have that many classes, so something is being missed completely.
@glenn-jocher Also, in this issue you have mentioned not to use tensors during inference. That can be managed in torch.hub by passing a cv2 or PIL image, but how can this be handled while using TFLite? In that case we have to use a tensor.
@Vikram12301 I apologize for the confusion earlier. I made an error in my previous response. The expected output format of (xmin, ymin, xmax, ymax, confidence, class) does not match the output format of the TensorFlow Lite model, which has a shape of (1, 25200, 6). The output format of the TensorFlow Lite model is (ymin, xmin, ymax, xmax, confidence, class). To convert this format to the desired format, we can modify the code as follows:
# Load your TensorFlow Lite model, perform the inference, and get the output tensor
interpreter = tf.lite.Interpreter(model_path="best-fp16.tflite")
interpreter.allocate_tensors()
# Get the input and output shapes
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Preprocess the image and run inference
image_data = Image.open('image.tif')
image_data = image_data.resize((640,640))
image_data = np.array(image_data).astype(np.float32)
image_data = np.expand_dims(image_data, axis=0)
interpreter.set_tensor(input_details[0]['index'], image_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
# Remove the batch dimension
output_data = output_data[0]
# Reorder the coordinates to (xmin, ymin, xmax, ymax) format, keeping confidence and class
output_data = output_data[..., [1, 0, 3, 2, 4, 5]]
output_data[..., 0:4] = output_data[..., 0:4] * 640
# Convert the class probabilities to class IDs (argmax over the class-probability columns)
output_data[..., 5] = np.argmax(output_data[..., 5:], axis=-1)
# Round the confidence score to 2 decimal places
output_data[..., 4] = np.round(output_data[..., 4], decimals=2)
# Print the output shape and data
print(output_data)
To handle the use of tensors during inference, you can preprocess the image and run inference on the tensor, then post-process the output to get the desired format.
Regarding the issue of unexpected classes in the output, you are correct that this indicates that there is something unexpected happening during inference. I suggest you take a closer look at the trained model and the data to determine why this is happening.
How to apply post-processing to the output? Is there some simple way to do it?
@Vikram12301 post-processing is the process of refining the raw output of the model to obtain the final detections. Here are some common post-processing steps used for object detection:
- Filtering: Remove detections with low confidence scores.
- Non-maximum suppression (NMS): Remove duplicate detections of the same object.
- Scaling: Rescale the bounding box coordinates to the size of the original image.
In your case, since you are using a TensorFlow Lite model, you can do the post-processing with NMS and filtering in TensorFlow. Here's how you can do it:
# Load your TensorFlow Lite model, perform the inference, and get the output tensor
interpreter = tf.lite.Interpreter(model_path="best-fp16.tflite")
interpreter.allocate_tensors()
# Get the input and output shapes
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Preprocess the image and run inference
image_data = Image.open('image.tif')
image_data = image_data.resize((640,640))
image_data = np.array(image_data).astype(np.float32)
image_data = np.expand_dims(image_data, axis=0)
interpreter.set_tensor(input_details[0]['index'], image_data)
interpreter.invoke()
# Get the raw output tensor
output_data = interpreter.get_tensor(output_details[0]['index'])
# Apply non-maximum suppression (NMS)
detections = tf.image.combined_non_max_suppression(
boxes=tf.expand_dims(output_data[..., :4], axis=2),
scores=output_data[..., 4:5],
max_output_size_per_class=100,
max_total_size=100,
iou_threshold=0.45,
score_threshold=0.25
)
# Filter out detections with low confidence scores
boxes = detections.nmsed_boxes[detections.nmsed_scores[:, 0] > 0.25]
scores = detections.nmsed_scores[:, 0][detections.nmsed_scores[:, 0] > 0.25]
classes = detections.nmsed_classes[detections.nmsed_scores[:, 0] > 0.25]
# Rescale the bounding box coordinates to the size of the original image
boxes = boxes * 640
# Print the final detections
print(boxes)
print(scores)
print(classes)
This code applies NMS with an intersection over union (IoU) threshold of 0.45 and a confidence score threshold of 0.25.
@glenn-jocher Assuming that the above code works, there are still multiple integer class IDs in the output, but the model I trained has only one target class. So how do I identify the classes that do not belong to my target class?
Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO and Vision AI!
I converted YOLOv8n to TFLite. When I draw the bounding boxes, the result is not good. My code to draw the bounding boxes:
import numpy as np
import tensorflow as tf
from PIL import Image
import cv2
import random
def plot_one_box(x, img, color=None, label=None, line_thickness=3):
# Plots one bounding box on image img
tl = line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1 # line/font thickness
color = color or [random.randint(0, 255) for _ in range(3)]
c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
if label:
tf = max(tl - 1, 1) # font thickness
t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA) # filled
cv2.putText(img, label, (c1[0], c1[1] - 2), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA)
# Load your TensorFlow Lite model, perform the inference, and get the output tensor
interpreter = tf.lite.Interpreter(model_path="yolov8n_saved_model/yolov8n_float32.tflite")
interpreter.allocate_tensors()
# Get the input and output shapes
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Preprocess the image and run inference
img = cv2.imread('bus.jpg')
h, w, _ = img.shape
image_resize = cv2.resize(img, (640,640))
image_data = cv2.cvtColor(image_resize, cv2.COLOR_BGR2RGB)
image_data = np.array(image_data).astype(np.float32)
image_data = np.expand_dims(image_data, axis=0)
interpreter.set_tensor(input_details[0]['index'], image_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
# Apply non-maximum suppression (NMS)
detections = tf.image.combined_non_max_suppression(
boxes=tf.expand_dims(output_data[..., :4], axis=2),
scores=output_data[..., 4:5],
max_output_size_per_class=100,
max_total_size=100,
iou_threshold=0.45,
score_threshold=0.25
)
# Filter out detections with low confidence scores
boxes = detections.nmsed_boxes[detections.nmsed_scores[:, 0] > 0.25]
scores = detections.nmsed_scores[:, 0][detections.nmsed_scores[:, 0] > 0.25]
classes = detections.nmsed_classes[detections.nmsed_scores[:, 0] > 0.25]
# Rescale the bounding box coordinates to the size of the original image
# boxes = boxes * 640
# Print the final detections
# print(boxes)
# print(scores)
# print(classes)
for bbox in boxes[0]:
xmin, ymin, xmax, ymax = bbox[0], bbox[1], bbox[2], bbox[3]
xmin, ymin, xmax, ymax = int(xmin*w), int(ymin*h), int(xmax*w), int(ymax*h)
plot_one_box([xmin, ymin, xmax, ymax], img, [0, 255, 0], 'Label')
cv2.imshow('img', img)
cv2.waitKey(0)
Output: (attached image shows the drawn bounding boxes misaligned with the objects)
Can you help, how to solve this ?
@naseemap47 hi! Based on the output image you provided, it looks like the bounding boxes are not correctly aligned with the objects in the image. Here are a few things you can try:
1. Check the input image size: Make sure you are using the correct image size for inference (see the sketch after this list). If the input image size is different from the image size used during training, it can affect the accuracy of the detections.
2. Adjust the confidence score threshold: Try adjusting the confidence score threshold to include more or fewer detections. You can experiment with different values to see what works best for your use case.
3. Check the anchor boxes: If you are using custom anchor boxes for your model, make sure they are appropriate for the objects you are detecting. You may need to adjust the anchor box sizes or ratios to get better results.
4. Check the NMS parameters: Experiment with different values for the NMS parameters, such as the score threshold and the IoU threshold. These can affect the number and quality of the output detections.
I hope these suggestions help! Good luck with your project.
Here's a full example for a model that has 2 classes. Sorry, the GitHub interface requires a PhD to use...
import cv2
import numpy as np
import tensorflow.lite as tflite
import torch
import torchvision
import time

model_path = "/Users/brett/Desktop/last-fp16.tflite"
img_path = '/Users/brett/Desktop/istockphoto-884279742-612x612.jpg'

def save_tensor_as_numpy(tensor, filename):
    """ This function saves a PyTorch tensor as a NumPy array to a file.
Parameters:
tensor (torch.Tensor): The PyTorch tensor to be saved.
filename (str): The path of the file where to save the tensor.
"""
# Convert the PyTorch tensor to a NumPy array
numpy_array = tensor.detach().cpu().numpy()
# Save the NumPy array to a file
np.save(filename, numpy_array)
def non_max_suppression(
        prediction,
        conf_thres=0.25,
        iou_thres=0.45,
        classes=None,
        agnostic=False,
        multi_label=False,
        labels=(),
        max_det=1000,
        nm=0,  # number of masks
):
    """Non-Maximum Suppression (NMS) on inference results to reject overlapping detections
Returns:
list of detections, on (n,6) tensor per image [xyxy, conf, cls]
"""
# Checks
assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'
if isinstance(prediction, (list, tuple)): # YOLOv5 model in validation model, output = (inference_out, loss_out)
prediction = prediction[0] # select only inference output
device = prediction.device
mps = 'mps' in device.type # Apple MPS
if mps: # MPS not fully supported yet, convert tensors to CPU before NMS
prediction = prediction.cpu()
bs = prediction.shape[0] # batch size
# nc = prediction.shape[2] - nm - 5 # number of classes
nc = prediction.shape[2] - nm - 5 # number of classes
xc = prediction[..., 4] > conf_thres # candidates
# Settings
# min_wh = 2 # (pixels) minimum box width and height
max_wh = 7680 # (pixels) maximum box width and height
max_nms = 30000 # maximum number of boxes into torchvision.ops.nms()
time_limit = 1.5 + 0.05 * bs # seconds to quit after
redundant = True # require redundant detections
multi_label &= nc > 1 # multiple labels per box (adds 0.5ms/img)
merge = False # use merge-NMS
t = time.time()
mi = 5 + nc # mask start index
output = [torch.zeros((0, 6 + nm), device=prediction.device)] * bs
for xi, x in enumerate(prediction): # image index, image inference
# Apply constraints
# x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0 # width-height
x = x[xc[xi]] # confidence
# Cat apriori labels if autolabelling
if labels and len(labels[xi]):
lb = labels[xi]
v = torch.zeros((len(lb), nc + nm + 5), device=x.device)
v[:, :4] = lb[:, 1:5] # box
v[:, 4] = 1.0 # conf
v[range(len(lb)), lb[:, 0].long() + 5] = 1.0 # cls
x = torch.cat((x, v), 0)
# If none remain process next image
if not x.shape[0]:
continue
# Compute conf
x[:, 5:] *= x[:, 4:5] # conf = obj_conf * cls_conf
# Box/Mask
box = xywh2xyxy(x[:, :4]) # center_x, center_y, width, height) to (x1, y1, x2, y2)
mask = x[:, mi:] # zero columns if no masks
# Detections matrix nx6 (xyxy, conf, cls)
if multi_label:
i, j = (x[:, 5:mi] > conf_thres).nonzero(as_tuple=False).T
x = torch.cat((box[i], x[i, 5 + j, None], j[:, None].float(), mask[i]), 1)
else: # best class only
conf, j = x[:, 5:mi].max(1, keepdim=True)
x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]
# Filter by class
if classes is not None:
x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]
# Apply finite constraint
# if not torch.isfinite(x).all():
# x = x[torch.isfinite(x).all(1)]
# Check shape
n = x.shape[0] # number of boxes
if not n: # no boxes
continue
x = x[x[:, 4].argsort(descending=True)[:max_nms]] # sort by confidence and remove excess boxes
# Batched NMS
c = x[:, 5:6] * (0 if agnostic else max_wh) # classes
boxes, scores = x[:, :4] + c, x[:, 4] # boxes (offset by class), scores
i = torchvision.ops.nms(boxes, scores, iou_thres) # NMS
i = i[:max_det] # limit detections
if merge and (1 < n < 3E3): # Merge NMS (boxes merged using weighted mean)
# update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
iou = box_iou(boxes[i], boxes) > iou_thres # iou matrix
weights = iou * scores[None] # box weights
x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True) # merged boxes
if redundant:
i = i[iou.sum(1) > 1] # require redundancy
output[xi] = x[i]
if mps:
output[xi] = output[xi].to(device)
if (time.time() - t) > time_limit:
break # time limit exceeded
return output
def from_numpy(x): return torch.from_numpy(x).to('cpu') if isinstance(x, np.ndarray) else x
def xywh2xyxy(x):
    """Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right"""
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[..., 0] = x[..., 0] - x[..., 2] / 2  # top left x
    y[..., 1] = x[..., 1] - x[..., 3] / 2  # top left y
    y[..., 2] = x[..., 0] + x[..., 2] / 2  # bottom right x
    y[..., 3] = x[..., 1] + x[..., 3] / 2  # bottom right y
    return y
def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
shape = im.shape[:2] # current shape [height, width]
if isinstance(new_shape, int):
new_shape = (new_shape, new_shape)
# Scale ratio (new / old)
r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
if not scaleup: # only scale down, do not scale up (for better val mAP)
r = min(r, 1.0)
# Compute padding
ratio = r, r # width, height ratios
new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1] # wh padding
if auto: # minimum rectangle
dw, dh = np.mod(dw, stride), np.mod(dh, stride) # wh padding
elif scaleFill: # stretch
dw, dh = 0.0, 0.0
new_unpad = (new_shape[1], new_shape[0])
ratio = new_shape[1] / shape[1], new_shape[0] / shape[0] # width, height ratios
dw /= 2 # divide padding into 2 sides
dh /= 2
if shape[::-1] != new_unpad: # resize
im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # add border
return im, ratio, (dw, dh)
def box_iou(box1, box2, eps=1e-7):
    """
    Return intersection-over-union (Jaccard index) of boxes.
    Both sets of boxes are expected to be in (x1, y1, x2, y2) format.
    Arguments:
        box1 (Tensor[N, 4])
        box2 (Tensor[M, 4])
    Returns:
        iou (Tensor[N, M]): the NxM matrix containing the pairwise IoU values for every element in boxes1 and boxes2
    """
    (a1, a2), (b1, b2) = box1.unsqueeze(1).chunk(2, 2), box2.unsqueeze(0).chunk(2, 2)
    inter = (torch.min(a2, b2) - torch.max(a1, b1)).clamp(0).prod(2)
    return inter / ((a2 - a1).prod(2) + (b2 - b1).prod(2) - inter + eps)
def fwd(interpreter, im):
output_details, input_details = interpreter.get_output_details(), interpreter.get_input_details()
input = input_details[0]
int8 = input['dtype'] == np.uint8 # is TFLite quantized uint8 model
if int8:
scale, zero_point = input['quantization']
im = (im / scale + zero_point).astype(np.uint8) # de-scale
interpreter.set_tensor(input['index'], im)
interpreter.invoke()
y = []
for output in output_details:
x = interpreter.get_tensor(output['index'])
if int8:
scale, zero_point = output['quantization']
x = (x.astype(np.float32) - zero_point) * scale # re-scale
y.append(x)
y = [x if isinstance(x, np.ndarray) else x.numpy() for x in y]
y[0][..., :4] *= [640,640,640,640] # xywh normalized to pixels
if isinstance(y, (list, tuple)):
return from_numpy(y[0]) if len(y) == 1 else [from_numpy(x) for x in y]
else:
return from_numpy(y)
return y
interpreter = tflite.Interpreter(model_path=model_path)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
image = cv2.imread(img_path)
image_resized, _, _ = letterbox(image, new_shape=640, auto=False, stride=32)
image_resized = image_resized.astype(np.float32)
image_resized /= 255

input_shape = input_details[0]['shape']
input_data = np.array(np.expand_dims(image_resized, 0), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
output_data = fwd(interpreter, input_data)
predictions = output_data
nms_preds = non_max_suppression(predictions, max_det=1000)
print(nms_preds)
classes = ['ball', 'hoop']
for pred in nms_preds:
    print(pred)
    for *xyxy, conf, cls in reversed(pred):
        if conf > 0.2:  # confidence threshold
x1, y1, x2, y2 = int(xyxy[0]), int(xyxy[1]), int(xyxy[2]), int(xyxy[3])
# Plot the bounding box
cv2.rectangle(image_resized, (x1, y1), (x2, y2), (0, 255, 0), 2)
# Get the name of the class
class_name = classes[int(cls)]
print(class_name)
# Add a label for the class
cv2.putText(image_resized, f'{class_name}', (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)
cv2.imshow('image', image_resized)
cv2.waitKey(0)
cv2.destroyAllWindows()
@bdytx5 it seems that you are implementing custom post-processing and visualization of object detection results for a TensorFlow Lite model. Your code uses PyTorch and OpenCV for custom NMS and visualization, which is not the standard practice for TensorFlow Lite models.
Your implementation combines elements from different frameworks (TFLite, PyTorch, OpenCV) for inference, post-processing, and visualization. It's important to ensure compatibility and consistency across these different components. Double-check that the bounding box coordinates and the class indices align correctly with the model's output format.
If you encounter issues with the bounding box visualization, I would suggest verifying that the detected bounding box coordinates are correctly transformed and mapped onto the original image.
Also, make sure that the class indices align with your expected classes. Verify that the classes are correctly mapped to the class names and that the output classes correspond to the classes expected from the trained model.
Finally, consider using the TFLite Interpreter's built-in post-processing and visualization methods, if available, for a more streamlined and consistent workflow.
If you encounter specific issues or have further questions, feel free to provide additional details, and I'd be happy to assist further.
Search before asking
Question
Using the above code for inference, I am getting an apparently random array as output, but I get the expected answer when running inference with torch.hub.
Expected result: tensor([[5.01359e+02, 1.09255e+03, 6.92128e+02, 1.11753e+03, 9.24892e-01, 0.00000e+00]])
Result I got for the above code: [[[ 0.0053059 0.004525 0.014499 0.011126 3.7081e-05 0.99995] [ 0.0052993 0.0029876 0.021002 0.013376 4.0981e-05 0.99999] [ 0.0054641 0.005734 0.0095944 0.034277 8.8067e-06 0.99998] ... [ 0.97123 0.96511 0.026191 0.075059 2.2332e-07 0.99999] [ 0.96813 0.95563 0.0322 0.20285 2.2449e-07 0.99999] [ 0.94753 0.98328 0.17076 0.02539 2.7026e-06 0.99998]]]
Additional
No response