
My Demo Bert Model Failed to Serve #2267

Open · cceasy opened this issue 3 weeks ago

cceasy commented 3 weeks ago

I am trying to use TensorFlow Serving to serve a Keras BERT model, but predictions through the REST API fail. Details are below. Can you please help me resolve this problem?

predict output (ERROR)

{ "error": "Op type not registered 'TFText>RoundRobinTrim' in binary running on ljh-my-keras-bert-model. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib (e.g. tf.contrib.resampler), accessing should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed." }

my local versions

Python 3.10.14
tensorflow                                   2.18.0
tensorflow-datasets                          4.9.6
tensorflow-io-gcs-filesystem                 0.37.1
tensorflow-metadata                          1.16.1
tensorflow-text                              2.18.0
keras                                        3.6.0
keras-hub-nightly                            0.16.1.dev202410210343
keras-nlp                                    0.17.0

model definition

import os
os.environ["KERAS_BACKEND"] = "tensorflow"  # "jax" or "tensorflow" or "torch"

import tensorflow_datasets as tfds
import keras_nlp

imdb_train, imdb_test = tfds.load(
    "imdb_reviews",
    split=["train", "test"],
    as_supervised=True,
    batch_size=16,
)

import keras
# Load a model.
classifier = keras_nlp.models.BertClassifier.from_preset(
    "bert_tiny_en_uncased",
    num_classes=2,
    activation="softmax",
)
# Compile the model.
classifier.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=keras.optimizers.Adam(5e-5),
    metrics=["sparse_categorical_accuracy"],
    jit_compile=True,
)
# Fine-tune.
classifier.fit(imdb_train.take(250), validation_data=imdb_test.take(250))
# Predict new examples.
classifier.predict(["What an amazing movie!", "A total waste of my time."])
# expected output: array([[0.34156954, 0.65843046], [0.52648497, 0.473515  ]], dtype=float32)

save the model to a local path

import tensorflow as tf
import keras_nlp

def preprocess(inputs):
    # Convert input strings to token IDs, padding mask, and segment IDs
    preprocessor = classifier.preprocessor
    encoded = preprocessor(inputs)
    return {
        'token_ids': encoded['token_ids'],
        'padding_mask': encoded['padding_mask'],
        'segment_ids': encoded['segment_ids']
    }

@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.string)])
def serving_fn(inputs):
    preprocessed = preprocess(inputs)
    outputs = classifier(preprocessed)
    return outputs

# Save the model
model_export_path = "/Users/xxx/tf_saved_models/my-keras-bert-model/1"
tf.saved_model.save(
    classifier,
    export_dir=model_export_path,
    signatures={"serving_default": serving_fn}
)

print(f"Model saved to: {model_export_path}")

build the tensorflow serving docker image

FROM tensorflow/serving:latest

COPY my-keras-bert-model /models/model
RUN ls /models/model

# Set the model environment variables
# ENV OMP_NUM_THREADS 4
# ENV TF_NUM_INTEROP_THREADS 4
# ENV TF_NUM_INTRAOP_THREADS 4

# Start TensorFlow Serving
ENTRYPOINT ["tensorflow_model_server"]
CMD ["--port=8500", "--rest_api_port=8080", "--model_name=model", "--model_base_path=/models/model"]

predict request

POST http://localhost:8080/v1/models/model/versions/1:predict
Content-Type: application/json

{"instances": ["What an amazing movie!", "A total waste of my time."]}

janasangeetha commented 1 week ago

Hi @cceasy, thank you for reporting this. I was able to reproduce the issue. I will check on this internally and update here. Below is the error:

2024-11-12 09:34:40.365027: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
E0000 00:00:1731404080.457354     107 mlir_bridge_pass_util.cc:68] Failed to parse __inference_serving_fn_19270: Op type not registered 'TFText>RoundRobinTrim' in binary running on 58d2778e1319. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib (e.g. `tf.contrib.resampler`), accessing should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
I0000 00:00:1731404080.461934     107 mlir_graph_optimization_pass.cc:401] MLIR V1 optimization pass is not enabled
.
.
2024-11-12 09:34:40.505939: E external/org_tensorflow/tensorflow/core/grappler/optimizers/tfg_optimizer_hook.cc:135] tfg_optimizer{any(tfg-consolidate-attrs,tfg-toposort,tfg-shape-inference{graph-version=0},tfg-prepare-attrs-export)} failed: INVALID_ARGUMENT: Unable to find OpDef for TFText>RoundRobinTrim
    While importing function: __inference_serving_fn_19270
    when importing GraphDef to MLIR module in GrapplerHook

Thank you!
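
An editorial note, not a fix confirmed in this thread: since the failing op is introduced by the in-graph string preprocessor, one workaround is to export the classifier without the preprocessor and tokenize on the client side, so the graph contains no TFText>* ops. The input dtypes and shapes below are assumptions; match whatever classifier.preprocessor actually emits. The alternative is to build a custom tensorflow/serving binary that links the tensorflow-text op kernels.

import tensorflow as tf

# Serving signature over pre-tokenized inputs; dtypes/shapes are assumptions --
# match the output of classifier.preprocessor.
@tf.function(input_signature=[{
    "token_ids": tf.TensorSpec([None, None], tf.int32),
    "padding_mask": tf.TensorSpec([None, None], tf.bool),
    "segment_ids": tf.TensorSpec([None, None], tf.int32),
}])
def serving_fn_pretokenized(inputs):
    return classifier(inputs)

# Hypothetical export path, alongside the original model.
tf.saved_model.save(
    classifier,
    "/Users/xxx/tf_saved_models/my-keras-bert-model-pretokenized/1",
    signatures={"serving_default": serving_fn_pretokenized},
)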