ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Syntax and understanding questions about reading tensorflow lite results #13264

Closed mwickersheim closed 1 month ago

mwickersheim commented 1 month ago

Question

While researching how to read the results of a yolov5s model converted to TensorFlow Lite, I found the following statements and ran across a syntax that I don't understand.

    # output only first tensor [1,6300,85] = [xywh, conf, class0, class1, ...]
    # x = x[0][0]  # [x(1,6300,85), ...] to x(6300,85)
    # xywh = x[..., :4]  # x(6300,4) boxes
    # conf = x[..., 4:5]  # x(6300,1) confidences
    # cls = tf.reshape(tf.cast(tf.argmax(x[..., 5:], axis=1), tf.float32), (-1, 1))  # x(6300,1)  classes
    # return tf.concat([conf, cls, xywh], 1)

What does the ... in x[..., :4] mean?

For my use case, I'm running a TensorFlow Lite model in a vendor's SDK. When inspecting the inference results, I get the shape (1, 25200, 6) from result['StatefulPartitionedCall:0'].shape. Do the results really contain 25200 good detections?

The first sample, from result['StatefulPartitionedCall:0'][0][0], is [ 1 1 1 6 1 127]. I assume that the first 4 elements [ 1 1 1 6 ] are xywh, the 5th element [ 1 ] is the confidence, and the 6th element [127] is the class. Are my assumptions on how to read this correct? I find the class value of 127 hard to believe because I trained this model with only one class.

The vendor's SDK is heavily trimmed down: of the TensorFlow library, only the tensorflow.core section is installed. Because of this, the tf.reshape, tf.cast, and tf.argmax methods are not found. Is there a way to calculate the cls variable using only numpy?

When I look at my model using Netron, I see the following outputs.

[Image: Netron view of the model outputs]

Should I dequantize my tensor by applying the given equation to get more understandable results?

Thank you for your time and words of wisdom.

glenn-jocher commented 1 month ago

@mwickersheim the ... in x[..., :4] is a shorthand for selecting all preceding dimensions. In your case, it means selecting all elements along the first dimension and the first 4 elements along the last dimension.
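A small NumPy sketch of what the ellipsis does. The zero-filled array is only a placeholder with the shape mentioned in the commented code, not real model output:

```python
import numpy as np

# Placeholder tensor: 1 image, 6300 candidates, 85 values per candidate.
x = np.zeros((1, 6300, 85), dtype=np.float32)

# `...` expands to as many full slices (`:`) as are needed
# to cover every dimension before the last one.
boxes_a = x[..., :4]   # equivalent to x[:, :, :4]
boxes_b = x[:, :, :4]

print(boxes_a.shape)                      # (1, 6300, 4)
print(np.array_equal(boxes_a, boxes_b))   # True

# The same expression also works once the batch dimension is dropped.
x2d = x[0]                                # shape (6300, 85)
print(x2d[..., :4].shape)                 # (6300, 4)
```

Because `...` adapts to the number of leading dimensions, the same slicing code works whether or not the batch dimension has been removed.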

Regarding your results, the shape (1, 25200, 6) means there are 25200 candidate detections, most of which are not valid and need to be filtered by confidence. With a single class, each row is [x, y, w, h, confidence, class-0 score], so the 6th element is a class score rather than a class index, and the argmax over the class columns will always be 0. The value 127 is therefore most likely a raw quantized score that has not been dequantized yet, not a class ID.
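The count of 25200 is consistent with a 640x640 input: YOLOv5 predicts 3 anchors per grid cell at strides 8, 16 and 32, so 3 * (80*80 + 40*40 + 20*20) = 25200 candidates (and 6300 for a 320x320 input). A minimal sketch of that arithmetic and of keeping only confident rows follows. It assumes the output has already been converted to floats in [0, 1]; the helper name `num_candidates`, the 0.25 threshold and the two non-zero rows are made up for illustration:

```python
import numpy as np

def num_candidates(img_size, strides=(8, 16, 32), anchors_per_cell=3):
    """Number of candidate boxes for a square input of side img_size."""
    return anchors_per_cell * sum((img_size // s) ** 2 for s in strides)

print(num_candidates(640))  # 25200
print(num_candidates(320))  # 6300

# Placeholder output: all zeros except two hand-written candidate rows.
# Row layout: [x, y, w, h, confidence, class-0 score]
pred = np.zeros((1, 25200, 6), dtype=np.float32)
pred[0, 10] = [0.5, 0.5, 0.2, 0.3, 0.9, 0.8]
pred[0, 200] = [0.1, 0.2, 0.1, 0.1, 0.6, 0.9]

conf_thres = 0.25              # hypothetical threshold
rows = pred[0]                 # drop the batch dimension -> (25200, 6)
keep = rows[:, 4] > conf_thres # boolean mask over candidates
detections = rows[keep]

print(detections.shape)        # (2, 6)
```

The rows that survive the threshold still overlap heavily in practice, so non-maximum suppression is normally applied afterwards.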

To calculate the cls variable using numpy, you can use:

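    import numpy as np

    # Drop the batch dimension: (1, 25200, 6) -> (25200, 6)
    x = result['StatefulPartitionedCall:0'][0]
    xywh = x[..., :4]   # (25200, 4) boxes
    conf = x[..., 4:5]  # (25200, 1) confidences
    # Index of the highest-scoring class per row, as a float column: (25200, 1)
    cls = np.argmax(x[..., 5:], axis=1).reshape(-1, 1).astype(np.float32)
    output = np.concatenate([conf, cls, xywh], axis=1)  # (25200, 6)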

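Dequantizing your tensor should help if your model is quantized, since raw integer outputs are not directly interpretable. Apply the dequantization equation shown for the output tensor in Netron; for TensorFlow Lite it has the form real_value = scale * (quantized_value - zero_point).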
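A minimal NumPy-only sketch of the usual TensorFlow Lite affine dequantization, real_value = scale * (quantized_value - zero_point). The `scale` and `zero_point` values below are hypothetical; the real ones are those listed for the output tensor in Netron. The first row is the sample from the question, the second row is made up:

```python
import numpy as np

def dequantize(q, scale, zero_point):
    """Map quantized integers back to real values (affine quantization)."""
    return scale * (q.astype(np.float32) - zero_point)

# Hypothetical quantization parameters; replace with the values from Netron.
scale = 0.005
zero_point = 0

# Quantized rows: [x, y, w, h, confidence, class-0 score]
q = np.array([[1, 1, 1, 6, 1, 127],
              [100, 120, 40, 60, 180, 190]], dtype=np.uint8)

real = dequantize(q, scale, zero_point)
print(real)

# After dequantization the usual slicing applies.
xywh = real[..., :4]
conf = real[..., 4:5]
cls = np.argmax(real[..., 5:], axis=1).reshape(-1, 1).astype(np.float32)
print(cls.ravel())  # all zeros for a single-class model
```

With only one class column, the argmax is always 0, so the class index carries no information and only the confidence and class score matter for filtering.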