keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Keras3 saved model failed to load variable #19274

Open haohuanw opened 8 months ago

haohuanw commented 8 months ago

I am trying to serve a Keras 3 model as a TF SavedModel, but loading its variables fails when I use saved_model_cli or the TensorFlow C++ APIs.

Following the example from https://keras.io/examples/keras_recipes/tf_serving/ produces the error below when the model is loaded and run:

saved_model_cli run --dir model/1 --tag_set serve --signature_def serving_default --inputs inputs=banana.npy
2024-03-08 22:42:01.428869: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-08 22:42:02.279676: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-08 22:42:03.033488: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14348 MB memory:  -> device: 0, name: NVIDIA RTX A4000, pci bus id: 0000:02:00.0, compute capability: 8.6
WARNING:tensorflow:From /home/hwang/miniconda3/envs/keras-trial/lib/python3.10/site-packages/tensorflow/python/tools/saved_model_cli.py:716: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.saved_model.load` instead.
W0308 22:42:03.034847 140212309307776 deprecation.py:50] From /home/hwang/miniconda3/envs/keras-trial/lib/python3.10/site-packages/tensorflow/python/tools/saved_model_cli.py:716: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.saved_model.load` instead.
INFO:tensorflow:Restoring parameters from model/1/variables/variables
I0308 22:42:03.156150 140212309307776 saver.py:1417] Restoring parameters from model/1/variables/variables
2024-03-08 22:42:03.218811: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-03-08 22:42:04.049429: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: FAILED_PRECONDITION: Could not find variable conv_preds/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/conv_preds/kernel/N10tensorflow3VarE does not exist.
     [[{{function_node __inference_serving_default_2138}}{{node mobilenet_1.00_224_1/conv_preds_1/convolution/ReadVariableOp}}]]
2024-03-08 22:42:04.049508: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: FAILED_PRECONDITION: Could not find variable conv_preds/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/conv_preds/kernel/N10tensorflow3VarE does not exist.
     [[{{function_node __inference_serving_default_2138}}{{node mobilenet_1.00_224_1/conv_preds_1/convolution/ReadVariableOp}}]]
     [[StatefulPartitionedCall/_291]]
2024-03-08 22:42:04.049570: I tensorflow/core/framework/local_rendezvous.cc:422] Local rendezvous recv item cancelled. Key hash: 12450294394106819624
2024-03-08 22:42:04.049627: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: ABORTED: Stopping remaining executors.
Traceback (most recent call last):
  File "/home/hwang/miniconda3/envs/keras-trial/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1402, in _do_call
    return fn(*args)
  File "/home/hwang/miniconda3/envs/keras-trial/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1385, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/home/hwang/miniconda3/envs/keras-trial/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1478, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.FailedPreconditionError: 2 root error(s) found.
  (0) FAILED_PRECONDITION: Could not find variable conv_preds/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/conv_preds/kernel/N10tensorflow3VarE does not exist.
     [[{{function_node __inference_serving_default_2138}}{{node mobilenet_1.00_224_1/conv_preds_1/convolution/ReadVariableOp}}]]
     [[StatefulPartitionedCall/_291]]
  (1) FAILED_PRECONDITION: Could not find variable conv_preds/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/conv_preds/kernel/N10tensorflow3VarE does not exist.
     [[{{function_node __inference_serving_default_2138}}{{node mobilenet_1.00_224_1/conv_preds_1/convolution/ReadVariableOp}}]]
0 successful operations.
0 derived errors ignored.

However, loading the same model through the Python API with tf.saved_model.load works fine.
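For reference, a minimal sketch of what the working Python path looks like. The small tf.Module here is a hypothetical stand-in for the exported model (it is not the MobileNet model from the tutorial), and the temporary directory is only for illustration:

```python
import tempfile

import tensorflow as tf


class PlusOne(tf.Module):
    """Hypothetical stand-in model: adds a stored vector of ones to its input."""

    def __init__(self):
        super().__init__()
        self.constant = tf.Variable(
            tf.ones((20,)), trainable=False, name="constant_one"
        )

    @tf.function(
        input_signature=[tf.TensorSpec((1, 20), tf.float32, name="digits")]
    )
    def serve(self, digits):
        return {"logits": digits + self.constant}


# Save with an explicit serving signature, then reload through the Python API.
export_dir = tempfile.mkdtemp()
module = PlusOne()
tf.saved_model.save(
    module, export_dir, signatures={"serving_default": module.serve}
)

loaded = tf.saved_model.load(export_dir)
serving_fn = loaded.signatures["serving_default"]

# Signature functions take keyword arguments and return a dict of tensors.
result = serving_fn(digits=tf.ones((1, 20)) * 5)
logits = result["logits"].numpy()
print(logits.shape)
```

tf.saved_model.load restores the variables together with the traced function, which is the object-based loading path; saved_model_cli goes through the deprecated session-based loader shown in the log above.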

Here is another, smaller example that reproduces the problem:

import keras
import numpy as np
import tensorflow as tf

class Plus1Layer(keras.layers.Layer):

    def build(self, input_shape):
        self.constant = self.add_weight(
            shape=(input_shape[-1], ),
            initializer="ones",
            trainable=False,
            name="constant_one"
        )

    def call(self, inputs):
        # `inputs` rather than `input`, to avoid shadowing the Python builtin.
        return inputs + self.constant

def get_model(batch_size: int):
    inputs = {"digits": keras.Input(shape=(20, ), name="digits", batch_size=batch_size)}
    outputs = Plus1Layer()(inputs['digits'])
    return inputs, {"logits": outputs}

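def export():
    keras_inputs, keras_outputs = get_model(batch_size=1)
    model = keras.Model(keras_inputs, keras_outputs)
    model.build({"digits": (1, 20)})
    # Sample input consumed by saved_model_cli below.
    outputs = np.ones((1, 20)) * 5
    np.save("digits.npy", outputs)
    # Assumption: the snippet as posted ends here without a save call. The CLI
    # run below reads from export/, so a plain SavedModel save is presumed.
    tf.saved_model.save(model, "export")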

This also results in the same error when run with saved_model_cli:

saved_model_cli run --dir export/ --tag_set serve --signature_def serving_default --inputs inputs=digits.npy[digits]
2024-03-08 22:06:31.514089: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-08 22:06:32.365837: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
WARNING:tensorflow:Input file digits.npy contains a single ndarray. Name key "digits" ignored.
W0308 22:06:32.993591 139745289941376 saved_model_cli.py:949] Input file digits.npy contains a single ndarray. Name key "digits" ignored.
2024-03-08 22:06:33.099608: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14348 MB memory:  -> device: 0, name: NVIDIA RTX A4000, pci bus id: 0000:02:00.0, compute capability: 8.6
WARNING:tensorflow:From /home/hwang/miniconda3/envs/keras-trial/lib/python3.10/site-packages/tensorflow/python/tools/saved_model_cli.py:716: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.saved_model.load` instead.
W0308 22:06:33.101072 139745289941376 deprecation.py:50] From /home/hwang/miniconda3/envs/keras-trial/lib/python3.10/site-packages/tensorflow/python/tools/saved_model_cli.py:716: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.saved_model.load` instead.
INFO:tensorflow:Restoring parameters from export/variables/variables
I0308 22:06:33.114219 139745289941376 saver.py:1417] Restoring parameters from export/variables/variables
2024-03-08 22:06:33.117236: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-03-08 22:06:33.396540: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: FAILED_PRECONDITION: Could not find variable plus1_layer/constant_one. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/plus1_layer/constant_one/N10tensorflow3VarE does not exist.
     [[{{function_node __inference_serving_default_17}}{{node functional_1_1/plus1_layer_1/add/ReadVariableOp}}]]
2024-03-08 22:06:33.396620: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: FAILED_PRECONDITION: Could not find variable plus1_layer/constant_one. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/plus1_layer/constant_one/N10tensorflow3VarE does not exist.
     [[{{function_node __inference_serving_default_17}}{{node functional_1_1/plus1_layer_1/add/ReadVariableOp}}]]
     [[StatefulPartitionedCall/_19]]
2024-03-08 22:06:33.396669: I tensorflow/core/framework/local_rendezvous.cc:422] Local rendezvous recv item cancelled. Key hash: 11999494181103785133
2024-03-08 22:06:33.396690: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: ABORTED: Stopping remaining executors.
Traceback (most recent call last):
  File "/home/hwang/miniconda3/envs/keras-trial/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1402, in _do_call
    return fn(*args)
  File "/home/hwang/miniconda3/envs/keras-trial/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1385, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/home/hwang/miniconda3/envs/keras-trial/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1478, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.FailedPreconditionError: 2 root error(s) found.
  (0) FAILED_PRECONDITION: Could not find variable plus1_layer/constant_one. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/plus1_layer/constant_one/N10tensorflow3VarE does not exist.
     [[{{function_node __inference_serving_default_17}}{{node functional_1_1/plus1_layer_1/add/ReadVariableOp}}]]
     [[StatefulPartitionedCall/_19]]
  (1) FAILED_PRECONDITION: Could not find variable plus1_layer/constant_one. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/plus1_layer/constant_one/N10tensorflow3VarE does not exist.
     [[{{function_node __inference_serving_default_17}}{{node functional_1_1/plus1_layer_1/add/ReadVariableOp}}]]

Keras 3 and TensorFlow versions:

/home/hwang/miniconda3/envs/keras-trial/bin/pip freeze | grep keras
keras==3.0.5
/home/hwang/miniconda3/envs/keras-trial/bin/pip freeze | grep tensorflow
tensorflow==2.16.1
tensorflow-io-gcs-filesystem==0.36.0

haohuanw commented 8 months ago

It looks like exporting with model.export works for me when loading through the C++ API. Would it be possible to update the tutorial to use the recommended export path?
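A minimal sketch of the model.export path, assuming Keras 3 with the TensorFlow backend. The small Dense model and the temporary directory are hypothetical stand-ins, not the code from the tutorial:

```python
import tempfile

import keras
import numpy as np
import tensorflow as tf

# Hypothetical stand-in model; the tutorial uses a MobileNet classifier instead.
inputs = keras.Input(shape=(20,), name="digits")
outputs = keras.layers.Dense(4, name="logits")(inputs)
model = keras.Model(inputs, outputs)

# Keras 3 export path: writes an inference-only SavedModel artifact.
export_dir = tempfile.mkdtemp()
model.export(export_dir)

# The reloaded artifact exposes the forward pass as a `serve` endpoint.
reloaded = tf.saved_model.load(export_dir)
sample = tf.ones((1, 20), dtype=tf.float32)
served = reloaded.serve(sample).numpy()

# Sanity check against the in-memory model.
expected = model(sample).numpy()
print(np.allclose(served, expected, atol=1e-5))
```

Whether this artifact also loads cleanly through saved_model_cli or the C++ API depends on the Keras and TensorFlow versions in use; the comment above reports that it worked with the C++ API on keras==3.0.5 and tensorflow==2.16.1.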

tiennguyenbn-itr commented 5 months ago

I am facing the same problem.