tensorflow / serving

A flexible, high-performance serving system for machine learning models
https://www.tensorflow.org/serving
Apache License 2.0

My Demo Bert Model Failed to Serve #2267

Open cceasy opened 1 day ago

cceasy commented 1 day ago

I am trying to use TensorFlow Serving to serve a Keras BERT model, but I run into a problem when predicting through the REST API; the details are below. Can you please help me resolve this problem?

model definition

import os
os.environ["KERAS_BACKEND"] = "tensorflow"  # "jax" or "tensorflow" or "torch"

import tensorflow_datasets as tfds
import keras_nlp

imdb_train, imdb_test = tfds.load(
    "imdb_reviews",
    split=["train", "test"],
    as_supervised=True,
    batch_size=16,
)

import keras
# Load a model.
classifier = keras_nlp.models.BertClassifier.from_preset(
    "bert_tiny_en_uncased",
    num_classes=2,
    activation="softmax",
)
# Compile the model.
classifier.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=keras.optimizers.Adam(5e-5),
    metrics=["sparse_categorical_accuracy"],
    jit_compile=True,
)
# Fine-tune.
classifier.fit(imdb_train.take(250), validation_data=imdb_test.take(250))
# Predict new examples.
classifier.predict(["What an amazing movie!", "A total waste of my time."])
# expected output: array([[0.34156954, 0.65843046], [0.52648497, 0.473515  ]], dtype=float32)

save the model to local path

import tensorflow as tf
import keras_nlp

def preprocess(inputs):
    # Convert input strings to token IDs, padding mask, and segment IDs
    preprocessor = classifier.preprocessor
    encoded = preprocessor(inputs)
    return {
        'token_ids': encoded['token_ids'],
        'padding_mask': encoded['padding_mask'],
        'segment_ids': encoded['segment_ids']
    }

@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.string)])
def serving_fn(inputs):
    preprocessed = preprocess(inputs)
    outputs = classifier(preprocessed)
    return outputs

# Save the model
model_export_path = "/Users/xxx/tf_saved_models/my-keras-bert-model/1"
tf.saved_model.save(
    classifier,
    export_dir=model_export_path,
    signatures={"serving_default": serving_fn}
)

print(f"Model saved to: {model_export_path}")

build the tensorflow serving docker image

FROM tensorflow/serving:latest

COPY my-keras-bert-model /models/my_keras_bert_model

# Set the model name environment variable
ENV MODEL_NAME my_keras_bert_model
# ENV OMP_NUM_THREADS 8
# ENV TF_NUM_INTEROP_THREADS 8
# ENV TF_NUM_INTRAOP_THREADS 8

# Start TensorFlow Serving
CMD tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=/models/${MODEL_NAME}
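After building and starting the container, the model status REST endpoint can be used to confirm the model version actually loaded before sending predictions. A small sketch, assuming the `requests` package is installed on the client side and port 8501 is published from the container:

# Sketch: query TensorFlow Serving's model status endpoint to confirm
# that version 1 of the model is in state "AVAILABLE".
import requests  # assumption: requests is available in the client environment

resp = requests.get("http://localhost:8501/v1/models/my_keras_bert_model")
print(resp.json())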

predict request

POST http://localhost:8501/v1/models/my_keras_bert_model/versions/1:predict
Content-Type: application/json

{"instances": ["What an amazing movie!", "A total waste of my time."]}

predict output (ERROR)

{ "error": "Op type not registered 'TFText>RoundRobinTrim' in binary running on ljh-my-keras-bert-model. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib (e.g. tf.contrib.resampler), accessing should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed." }

cceasy commented 1 day ago

my local versions

Python 3.10.14
tensorflow                                   2.18.0
tensorflow-datasets                          4.9.6
tensorflow-io-gcs-filesystem                 0.37.1
tensorflow-metadata                          1.16.1
tensorflow-text                              2.18.0
keras                                        3.6.0
keras-hub-nightly                            0.16.1.dev202410210343
keras-nlp                                    0.17.0