keras-team / keras-cv

Industry-strength Computer Vision workflows with Keras

Inference on ONNX YOLOv8 model #2460

Open cflavsAmbev opened 4 months ago

cflavsAmbev commented 4 months ago

I trained a YOLOv8 XS model and exported it as an ONNX file.

I created the inference session with the code below:

import onnxruntime as rt

# Create an inference session from the exported ONNX file
sess = rt.InferenceSession(MODEL_PATH, providers=rt.get_available_providers())

# Collect the input names and the expected input shape
model_inputs = sess.get_inputs()
input_names = [model_inputs[i].name for i in range(len(model_inputs))]
input_shape = model_inputs[0].shape

# Collect the output names and run inference on a preprocessed image tensor
model_output = sess.get_outputs()
output_names = [model_output[i].name for i in range(len(model_output))]
outputs = sess.run(output_names, {input_names[0]: image.numpy()})
boxes, raw_scores = outputs
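
As a quick sanity check, onnxruntime's get_outputs() entries expose each output's name and shape directly:

# Optional: confirm the exported graph's output names and shapes
for out in sess.get_outputs():
    print(out.name, out.shape)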

When executing the code, the output_names are ['box', 'class']. However, when I check each output, the box output has shape (1, 8400, 64) and raw_scores has shape (1, 8400, 6). Each box entry is an array of 64 values, including negative values. How can I extract the bounding boxes from this output?

Example: array([-0.41838697,  4.6691685 , -0.31398767,  0.903074  , -0.4759045 ,
       -0.1759668 ,  0.13982718,  0.17300718,  0.09568841,  0.25713545,
       -0.48056224, -1.2766149 , -0.28608924, -0.49068266, -0.6215335 ,
       -1.3962162 ,  3.5720341 ,  2.485091  , -0.05451238,  1.2690719 ,
        0.26219204, -0.4092333 , -0.7778678 ,  0.09583969, -1.0101943 ,
       -1.1997509 , -0.7010503 , -0.33682668, -0.84273565, -1.0975788 ,
       -0.46223986, -1.0694335 ,  0.29062057,  1.8742971 ,  2.205299  ,
        0.57432723, -1.205116  ,  1.618118  , -0.07109317, -0.61723953,
       -0.9500371 , -0.41608053, -0.20256181, -1.1494515 , -0.6518638 ,
       -0.19544908, -0.84548193, -1.1186968 ,  0.49770474,  2.1773698 ,
        0.43691462, -0.5621399 ,  1.1421276 , -0.02915113,  0.565164  ,
       -0.08713075, -0.31974483, -0.77772075, -1.0449475 , -0.3073484 ,
       -0.6940311 , -1.1068969 , -0.9517348 , -0.96367306], dtype=float32)
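
For context: those 64 values are the raw per-side distance distributions that decode_regression_to_boxes collapses into 4 box-side distances (4 sides with 16 bins each), so negative values are expected; they are logits that the softmax normalizes away. Below is a minimal NumPy sketch of that collapse, using a hypothetical decode_dfl helper applied to the (1, 8400, 64) box output:

import numpy as np

def decode_dfl(box_output, num_bins=16):
    # (batch, 8400, 64) -> (batch, 8400, 4, 16): one 16-bin distribution per box side
    batch, num_preds, _ = box_output.shape
    dist = box_output.reshape(batch, num_preds, 4, num_bins)
    # Softmax over the bins, then take the expected bin index as the side distance
    exp = np.exp(dist - dist.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)
    return (probs * np.arange(num_bins, dtype=np.float32)).sum(axis=-1)  # (batch, 8400, 4)

The resulting values are still stride-relative side distances; they have to be combined with the anchor points via dist2bbox to get corner coordinates, as worked through in the comments below.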
cflavsAmbev commented 4 months ago

I tried following the two related issues, #1337 and #2298, and implemented the following code:

import tensorflow as tf
from tensorflow import keras

from keras_cv.models.object_detection.yolo_v8.yolo_v8_detector import (
    YOLOV8Detector,
    decode_regression_to_boxes,
    dist2bbox,
    get_anchors,
)

BOX_REGRESSION_CHANNELS = 64

# 'model' is the trained YOLOV8Detector's functional Keras model.
# Reshape the raw box output into per-side distributions: (-1, 4 sides, 16 bins)
preds = model.outputs[0]
model.outputs[0] = tf.reshape(preds, [-1, 4, BOX_REGRESSION_CHANNELS // 4])

# Softmax over the bins, then take the expected bin index per side
model.outputs[0] = tf.linalg.matmul(
    keras.backend.softmax(model.outputs[0], axis=-1),
    keras.backend.arange(BOX_REGRESSION_CHANNELS // 4, dtype="float32")[..., None],
)
model.outputs[0] = tf.squeeze(model.outputs[0], -1)

# Build the anchor grid from the model's input size
anchor_points, stride_tensor = get_anchors(image_shape=model.input_shape[1:3])
stride_tensor = keras.backend.expand_dims(stride_tensor, axis=-1)

# Convert side distances to corner coordinates, scaled back by the strides
model.outputs[0] = dist2bbox(model.outputs[0], anchor_points) * stride_tensor

model = tf.keras.Model(inputs=model.inputs, outputs=model.outputs)
model.summary()

I saved the model as an ONNX file, but when I call sess.run I get this exception:

RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Sub node. Name:'model_7/tf.math.subtract_5/Sub' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/math/element_wise_ops.h:629 onnxruntime::Broadcaster::Broadcaster(gsl::span, gsl::span) largest <= 1 was false. Can broadcast 0 by 0 or 1. 31950 is invalid.

kvlsky commented 4 months ago

@sachinprasadhs do you have any updates on this issue?

christian-plourde commented 2 months ago

I'm having the same issue. I tried the solution here and ran into the same problem:

Error: SessionRun(Msg("Non-zero status code returned while running Sub node. Name:'YOLOv8_1/Sub' Status Message: /home/runner/work/ort-artifacts-staging/ort-artifacts-staging/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.h:666 onnxruntime::Broadcaster::Broadcaster(gsl::span, gsl::span) largest <= 1 was false. Can broadcast 0 by 0 or 1. 26880 is invalid.

I'm using onnxruntime 1.19.0 for inference. The model was produced with tf2onnx v1.16.1 at opset 18. If I lower the opset to 13, I instead get this error:

Error: SessionRun(Msg("Non-zero status code returned while running Add node. Name:'YOLOv8_1/Add' Status Message: /home/runner/work/ort-artifacts-staging/ort-artifacts-staging/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.h:666 onnxruntime::Broadcaster::Broadcaster(gsl::span, gsl::span) largest <= 1 was false. Can broadcast 0 by 0 or 1. 26880 is invalid.
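
For reference, a conversion along these lines can be done with tf2onnx's Python API. This is only a sketch: the input signature and output path are placeholders, and the fixed 1024x1280 size anticipates the fix described in the later comments.

import tensorflow as tf
import tf2onnx

# Hypothetical export: fixed input signature plus the opset under discussion
spec = (tf.TensorSpec((1, 1024, 1280, 3), tf.float32, name="images"),)
onnx_model, _ = tf2onnx.convert.from_keras(
    model, input_signature=spec, opset=18, output_path="yolov8.onnx"
)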

christian-plourde commented 2 months ago

I think I have isolated the problem.

With the solution proposed here, the YOLOV8Detector model is wrapped and the prediction decoding is done after the fact with the helper functions from keras_cv.

The steps are as follows:

  1. Get the anchor points by passing in the width and height of the image:

     anchor_points, stride_tensor = get_anchors(image_shape=model.input_shape[1:3])

     This results in two tensors with the following shapes:

     anchor_points: tf.Tensor([], shape=(0, 2), dtype=float32)
     stride_tensor: tf.Tensor([], shape=(0, 1), dtype=float32)

     which is where the problems begin.

  2. Next, decode the predictions with:

     decoded = decode_regression_to_boxes(regression)

     where regression has the shape:

     <KerasTensor shape=(None, None, 64), dtype=float32, sparse=False, name=keras_tensor_295>

     resulting in decoded with a shape of:

     <KerasTensor shape=(None, None, 4), dtype=float32, sparse=True, name=keras_tensor_300>

     which is the expected shape (4 values for x1, y1, x2, y2).

  3. Next, get the distance to the bounding boxes with:

     boxes = dist2bbox(decoded, anchor_points) * stride_tensor

     This results in the following shape for the boxes:

     <KerasTensor shape=(None, 0, 4), dtype=float32, sparse=True, name=keras_tensor_306>

     which is incorrect; it should have the shape (None, None, 4). Because it doesn't, the runtime session crashes with the error from my last comment.

The boxes end up with this shape because of what happens inside the dist2bbox method:

def mydist2bbox(distance, anchor_points):
    left_top, right_bottom = ops.split(distance, 2, axis=-1)
    # left_top: <KerasTensor shape=(None, None, 2), dtype=float32, sparse=False, name=keras_tensor_301> (makes sense, 2 values for each prediction)
    # right_bottom: <KerasTensor shape=(None, None, 2), dtype=float32, sparse=False, name=keras_tensor_302> (makes sense, 2 values for each prediction)
    x1y1 = anchor_points - left_top
    # x1y1: <KerasTensor shape=(None, 0, 2), dtype=float32, sparse=False, name=keras_tensor_303> (this is wrong: the subtraction against anchor_points, which has the unexpected shape noted above, collapses the prediction axis to 0)
    x2y2 = anchor_points + right_bottom
    # x2y2: <KerasTensor shape=(None, 0, 2), dtype=float32, sparse=False, name=keras_tensor_304> (same problem as x1y1)
    return ops.concatenate((x1y1, x2y2), axis=-1)  # xyxy bbox

This results in the boxes having the shape <KerasTensor shape=(None, 0, 4), dtype=float32, sparse=True, name=keras_tensor_306>, which is not expected. Is there a way to make the shape of the boxes (None, None, 4) as expected?

christian-plourde commented 2 months ago

I figured out the problem. The input shape is variable: (None, None, None, 3), i.e. (batch size, height, width, rgb). When this is passed to get_anchors, it doesn't know how large to make the resulting tensor, so it produces a tensor of shape (0, 2) as I described. If I instead fix the image size (in my case 1280 by 1024), every call to get_anchors produces anchor points of shape (26880, 2), which matches the number of bounding box predictions. The add and subtract operations in the wrapped model can then broadcast properly, eliminating the errors when inferencing. Hopefully this is helpful to someone else.
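
As a check on this explanation, here is a minimal sketch of the two get_anchors calls. It assumes keras_cv's default strides of 8, 16, and 32; note that a 1280x1024 and a 1024x1280 input give the same anchor count, so either orientation matches the numbers above.

from keras_cv.models.object_detection.yolo_v8.yolo_v8_detector import get_anchors

# Variable spatial size: get_anchors cannot size the grid, so it comes back
# empty, which is what breaks the Sub/Add broadcasting at inference time.
anchor_points, stride_tensor = get_anchors(image_shape=(None, None))
print(anchor_points.shape, stride_tensor.shape)  # (0, 2) (0, 1)

# Fixed 1024x1280 input: 128*160 + 64*80 + 32*40 = 26880 anchors across the
# three strides, matching the number of box predictions.
anchor_points, stride_tensor = get_anchors(image_shape=(1024, 1280))
print(anchor_points.shape, stride_tensor.shape)  # (26880, 2) (26880, 1)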