Current Behavior:
The model's predict method does not return the same result when the model is deserialized with keras.models.load_model (.keras) as when it is loaded with tf.saved_model.load (.pb). It appears as if the post-processing step (decode_predictions) is not included in the .pb variant.
This is easy to see in the prediction results: the dict keys differ depending on the serialization mechanism:
Loading with Keras:
Loading with TF:
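Roughly, the comparison looks like this (file paths here are placeholders; the full reproduction is in the gist linked under Steps To Reproduce):

import keras
import numpy as np
import tensorflow as tf

# dummy batch matching the model's 640x640x3 input
images = np.random.uniform(size=(1, 640, 640, 3)).astype("float32")

# native Keras serialization: predict() returns the post-processed dict
keras_model = keras.models.load_model("yolo.keras")  # placeholder path
print(keras_model.predict(images).keys())

# SavedModel export: inspect the default serving signature instead
saved = tf.saved_model.load("yolo_savedmodel")  # placeholder path
serving_fn = saved.signatures["serving_default"]
print(serving_fn.structured_outputs.keys())  # keys differ: no decode_predictions here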
Expected Behavior:
Ideally both would return the same results (including the post-processing), which would simplify usage with TF Serving. Or am I misunderstanding the intended usage of these functions?
Steps To Reproduce:
You can find a minimal sample here: https://gist.github.com/pefuer/f0625dde60e7f49f45ca86592735a9f7
Version:
Anything else:
https://github.com/keras-team/keras-cv/issues/2370#issuecomment-1979335955 suggests using a tf.function to include the decoding step:
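# custom postprocessing function for serving
@tf.function
def inference(input_tensor):
    inputs = {"input_layer_1": input_tensor}
    raw_preds = yolo_keras(inputs)
    preds = yolo_keras.decode_predictions(raw_preds, input_tensor)
    return preds

# inference function for saved model
saved_model_signature = {
    tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY: inference.get_concrete_function(
        input_tensor=tf.TensorSpec(
            shape=[None, 640, 640, 3],
            dtype=tf.float32,
            name="input",
        )
    )
}

# save model
tf.saved_model.save(yolo_keras, "model-sig", signatures=saved_model_signature)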
When reloading this model, the shapes are as expected. When serving the model in a Docker container with:
docker run -p 8500:8500 -p 8501:8501 --mount type=bind,source=/home/tfserving/data/model-sig,target=/models/model -e MODEL_NAME=model --name tfserving -t tensorflow/serving
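The served model is then called, e.g. via the TF Serving REST API on port 8501; the exact client code is not shown here, but a request looks roughly like this sketch:

import json
import numpy as np
import requests

# placeholder request against the REST endpoint exposed by the container above
images = np.random.uniform(size=(1, 640, 640, 3)).astype("float32")
response = requests.post(
    "http://localhost:8501/v1/models/model:predict",
    data=json.dumps({"instances": images.tolist()}),
    headers={"content-type": "application/json"},
)
print(response.json())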
I still get an error when calling the model:
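2024-10-23 09:36:20.741520: I external/org_tensorflow/tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: FAILED_PRECONDITION: Could not find variable pa_fpn_p3p4p5_downsample2_block_pre_0_2_bn/gamma. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/pa_fpn_p3p4p5_downsample2_block_pre_0_2_bn/gamma/N10tensorflow3VarE does not exist.
[[{{function_node __inference_inference_36704}}{{node yolov8_detector_1/pa_fpn_p3p4p5_downsample2_block_pre_0_2_bn_1/Cast_2/ReadVariableOp}}]]

One thing that might be worth trying (not verified here): Keras 3 provides keras.export.ExportArchive for attaching a custom serving endpoint while keeping the model's variables tracked. A sketch, reusing the inference function from above:

import keras
import tensorflow as tf

# untested sketch: export the decoding tf.function via keras.export.ExportArchive
export_archive = keras.export.ExportArchive()
export_archive.track(yolo_keras)  # registers the model's variables with the archive
export_archive.add_endpoint(
    name="serve",
    fn=inference,  # the decoding tf.function defined above
    input_signature=[tf.TensorSpec(shape=[None, 640, 640, 3], dtype=tf.float32)],
)
export_archive.write_out("model-sig")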