tensorflow / serving

A flexible, high-performance serving system for machine learning models
https://www.tensorflow.org/serving
Apache License 2.0

Could not find variable test_8/embeddings. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. #2250

Closed: la-serene closed this issue 3 weeks ago

la-serene commented 3 weeks ago

System information

Describe the problem

I'm trying to create a simple model consisting of an Embedding layer and an LSTM layer. It works in Google Colab, but when I serve it with Docker, it fails with the error above.

Source code / logs

import tensorflow as tf
from tensorflow.keras.export import ExportArchive
from tensorflow.keras.layers import LSTM, Embedding
# TF_VERSION: 2.17.0

Class definition:

class Test(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.e = Embedding(input_dim=10, output_dim=2)
        self.l = LSTM(10, return_state=True, return_sequences=True)

    def call(self, x):
        embedding = self.e(x)
        res = self.l(embedding)
        return res

    # Serving entry point: int32 token IDs of shape [batch, time]
    @tf.function(input_signature=[tf.TensorSpec(shape=(None, None), dtype=tf.int32)])
    def serving(self, input):
        embedding = self.e(input)
        p = self.l(embedding)
        return p

Tested with the following inputs:

t = Test()
t(tf.constant([[2, 3, 6], [1, 4, 2]]))
t.serving(tf.constant([[2, 3, 6, 8], [1, 4, 2, 5]]))

Saving as SavedModel:

tf.saved_model.save(t, "test", signatures={'serving_default': t.serving})

When I reloaded the model and examined its variables:

t1 = tf.saved_model.load("test")
print(t1.signatures["serving_default"].trainable_variables)

All variables are still present.

Here is the link to the notebook. All the related code is in the "Missing Variable" section.

Exact Steps to Reproduce

I ran this command in PowerShell (PS C:\Users\USERNAME>):

docker run -p 8501:8501 --mount type=bind,source=c:/users/username/downloads/test,target=/models/test -e MODEL_NAME=test -t tensorflow/serving:2.17.0
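TF Serving loads model versions from numbered sub-directories of model_base_path (the log below reads from /models/test/1), so it can be worth checking the host directory before mounting it. The helper below is a hypothetical sketch, not part of TF Serving: it assumes a version is a purely numeric sub-directory containing saved_model.pb or saved_model.pbtxt, and lists the versions that match.

```python
import os
import tempfile
from pathlib import Path


def find_servable_versions(model_base_path):
    """Hypothetical helper: list numeric version dirs that contain a SavedModel.

    Assumes the layout <model_base_path>/<integer version>/saved_model.pb
    (or saved_model.pbtxt); non-numeric sub-directories are skipped.
    """
    versions = []
    for child in Path(model_base_path).iterdir():
        has_model = (child / "saved_model.pb").is_file() or (
            child / "saved_model.pbtxt"
        ).is_file()
        if child.is_dir() and child.name.isdigit() and has_model:
            versions.append(int(child.name))
    return sorted(versions)


# Demo on a throwaway layout that mimics the mounted "test" directory.
demo_base = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_base, "1"))
Path(demo_base, "1", "saved_model.pb").touch()
os.makedirs(os.path.join(demo_base, "variables"))  # not a version: ignored
demo_versions = find_servable_versions(demo_base)
```

On the host from this issue it would be called as find_servable_versions("c:/users/username/downloads/test"); an empty result would suggest the SavedModel sits directly in test rather than in test/1.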

Here is the full log:

2024-09-02 02:28:41.071295: I external/org_tensorflow/tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-09-02 02:28:41.074079: I tensorflow_serving/model_servers/server.cc:77] Building single TensorFlow model file config: model_name: test model_base_path: /models/test
2024-09-02 02:28:41.074504: I tensorflow_serving/model_servers/server_core.cc:474] Adding/updating models.
2024-09-02 02:28:41.074545: I tensorflow_serving/model_servers/server_core.cc:603] (Re-)adding model: test
2024-09-02 02:28:41.322000: I tensorflow_serving/core/basic_manager.cc:740] Successfully reserved resources to load servable {name: test version: 1}
2024-09-02 02:28:41.322072: I tensorflow_serving/core/loader_harness.cc:68] Approving load for servable version {name: test version: 1}
2024-09-02 02:28:41.322125: I tensorflow_serving/core/loader_harness.cc:76] Loading servable version {name: test version: 1}
2024-09-02 02:28:41.327494: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /models/test/1
2024-09-02 02:28:41.331752: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:52] Reading meta graph with tags { serve }
2024-09-02 02:28:41.331803: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:147] Reading SavedModel debug info (if present) from: /models/test/1
2024-09-02 02:28:41.332764: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-02 02:28:41.367035: I external/org_tensorflow/tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-09-02 02:28:41.367825: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:236] Restoring SavedModel bundle.
2024-09-02 02:28:41.412871: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:220] Running initialization op on SavedModel bundle at path: /models/test/1
2024-09-02 02:28:41.423320: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:462] SavedModel load for tags { serve }; Status: success: OK. Took 95775 microseconds.
2024-09-02 02:28:41.425036: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:82] No warmup data file found at /models/test/1/assets.extra/tf_serving_warmup_requests
2024-09-02 02:28:41.547890: I tensorflow_serving/core/loader_harness.cc:97] Successfully loaded servable version {name: test version: 1}
2024-09-02 02:28:41.549663: I tensorflow_serving/model_servers/server_core.cc:495] Finished adding/updating models
2024-09-02 02:28:41.549759: I tensorflow_serving/model_servers/server.cc:121] Using InsecureServerCredentials
2024-09-02 02:28:41.549797: I tensorflow_serving/model_servers/server.cc:388] Profiler service is enabled
2024-09-02 02:28:41.552993: I tensorflow_serving/model_servers/server.cc:423] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2024-09-02 02:28:41.556669: I tensorflow_serving/model_servers/server.cc:444] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 250] NET_LOG: Entering the event loop ...

When I sent a request to the server with curl:

curl -d '{"instances": [[1, 2]]}' -X POST http://localhost:8501/v1/models/test:predict
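For reference, the same request can be built from Python with only the standard library, which sidesteps quoting differences between PowerShell and POSIX shells. This is a sketch: build_predict_request is a hypothetical helper, and its host, port, and model name default to the values used in this issue. The "instances" key is TF Serving's row format, so [[1, 2]] is a single instance, batched into a tensor of shape (1, 2) for the (None, None) signature.

```python
import json
import urllib.request


def build_predict_request(instances, host="localhost", port=8501, model_name="test"):
    """Hypothetical helper: build a POST to TF Serving's REST predict endpoint."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_predict_request([[1, 2]])

# Sending it requires the container from the docker run command to be up:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```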

the server log showed:

2024-09-02 02:25:29.165327: I tensorflow_serving/model_servers/server.cc:444] Exporting HTTP/REST API at: localhost:8501 ...
2024-09-02 02:25:35.124627: I external/org_tensorflow/tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: FAILED_PRECONDITION: Could not find variable test_4/embeddings. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/test_4/embeddings/N10tensorflow3VarE does not exist.
	 [[{{function_node __inference_serving_5353}}{{node embedding_4_1/GatherV2/ReadVariableOp}}]]
2024-09-02 02:25:35.125551: I external/org_tensorflow/tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: ABORTED: Stopping remaining executors.

la-serene commented 3 weeks ago

I encountered a similar problem while working on another project. The issue was fixed when I used the ExportArchive class to save the model and added the model weights to the ExportArchive instance.

export_archive = ExportArchive()
export_archive.track(model)
export_archive.add_endpoint(
    "translate",
    model.translate
)
export_archive.add_variable_collection("my_vars", model.weights)
export_archive.write_out("nmt")

I haven't tried this on the current issue yet, but I suspect the fix is the same. However, I still can't figure out why the model weights were not loaded when saving with tf.saved_model.save(t, "test", signatures={'serving_default': t.serving}), even though they are visible after reloading.