
Error with exporting TF2.2.0 model with tf.lookup.StaticHashTable & LSTM layer for Serving #1719

Closed: spate141 closed this issue 1 year ago

spate141 commented 4 years ago


Related Issue with TF-Serving documentation:

As mentioned in https://github.com/tensorflow/serving/issues/1606, we need better documentation on exporting TF2.x models that involve StaticHashTables to TF-Serving. Simply disabling eager execution works in most cases, but if you're using the new LSTM layer from TF 2.2.0, it won't give you the power of CUDA, since, as mentioned here, a key requirement of the LSTM cuDNN implementation is "7. Eager execution is enabled in the outermost context."
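
For reference, here is a minimal sketch (not from the issue; the arguments simply restate the documented cuDNN requirements) of an LSTM layer that stays on the cuDNN fast path:

import tensorflow as tf

# All of these are the defaults; changing any of them (or disabling eager
# execution in the outermost context) silently falls back to the generic
# (non-cuDNN) kernel.
lstm = tf.keras.layers.LSTM(
    units=128,
    activation='tanh',               # cuDNN requires tanh
    recurrent_activation='sigmoid',  # cuDNN requires sigmoid
    recurrent_dropout=0.0,           # cuDNN requires no recurrent dropout
    unroll=False,                    # cuDNN requires unroll=False
    use_bias=True,                   # cuDNN requires a bias
)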

Related post on tensorflow/tensorflow:

https://github.com/tensorflow/tensorflow/issues/42325

My Issue:

I'm using a StaticHashTable in a Lambda layer after the output layer of my tf.keras model. It's quite simple, actually: I have a text classification model, and I'm adding a simple Lambda layer that takes model.output and converts the model_id to more general labels. I can save this version of the model with model.save(... as H5 format ...) without any issue, and can load it back and use it without any problem.
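
Roughly, the setup looks like this (a hypothetical sketch: the {0: 1000, 1: 2000} mapping mirrors the toy example later in this thread, and model is assumed to be the trained classifier). This saves fine as H5, but it is exactly this construction that later fails the SavedModel export:

import tensorflow as tf

# Build the lookup table that maps internal model ids to the general labels.
keys = tf.constant([0, 1], dtype=tf.int64)
values = tf.constant([1000, 2000], dtype=tf.int64)
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(keys, values), default_value=-1)

# Map the argmax of the classifier's output through the table.
label_layer = tf.keras.layers.Lambda(
    lambda probs: table.lookup(tf.argmax(probs, axis=-1)))(model.output)
wrapped_model = tf.keras.models.Model(inputs=model.input, outputs=label_layer)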

The issue is, when I try to export my TF 2.2.0 model for TF-Serving, I can't figure out how to export it. Here is what I can do with TF 1.x, or with TF 2.x plus tf.compat.v1.disable_eager_execution():

import tensorflow as tf

# TF1-style export: the builder lives under tf.compat.v1 in TF 2.x
saved_model_builder = tf.compat.v1.saved_model.builder

tf.compat.v1.disable_eager_execution()
version = 1
name = 'tmp_model'
export_path = f'/opt/tf_serving/{name}/{version}'
builder = saved_model_builder.SavedModelBuilder(export_path)

model_signature = tf.compat.v1.saved_model.predict_signature_def(
    inputs={
        'input': model.input
    }, 
    outputs={
        'output': model.output
    }
)

with tf.compat.v1.keras.backend.get_session() as sess:
    builder.add_meta_graph_and_variables(
        sess=sess,
        tags=[tf.compat.v1.saved_model.tag_constants.SERVING],
        signature_def_map={
            'predict': model_signature
        },
        # For initializing Hashtables
        main_op=tf.compat.v1.tables_initializer()
    )
    builder.save()

This saves my model in the TF 1.x format for serving, and I can use it without any issue. The thing is, I'm using an LSTM layer and I want to run my model on a GPU. According to the documentation, if I disable eager mode, I can't use the GPU version of the LSTM with TF 2.2. And without going through the code above, I can't save my model for serving according to the TF 2.2 standard with StaticHashTables.

Here is how I'm trying to export my TF 2.2 model, which uses a StaticHashTable in its final layer, and which gives the error below:

class MyModule(tf.Module):

    def __init__(self, model):
        super(MyModule, self).__init__()
        self.model = model

    @tf.function(input_signature=[tf.TensorSpec(shape=(None, 16), dtype=tf.int32, name='input')])
    def predict(self, input):
        result = self.model(input)
        return {"output": result}

version = 1
name = 'tmp_model'
export_path = f'/opt/tf_serving/{name}/{version}'

module = MyModule(model)
tf.saved_model.save(module, export_path, signatures={"predict": module.predict.get_concrete_function()})

Error:

AssertionError: Tried to export a function which references untracked object Tensor("2907:0", shape=(), dtype=resource).
TensorFlow objects (e.g. tf.Variable) captured by functions must be tracked by assigning them to an attribute of a tracked object or assigned to an attribute of the main object directly.

Any suggestions, or am I missing anything about exporting a TF 2.2 model that uses a StaticHashTable in its final Lambda layer for TensorFlow Serving?

Thanks!

rmothukuru commented 4 years ago

@spate141, Can you please provide the complete reproducible code so that we can check it at our end? Thanks!

spate141 commented 4 years ago

@rmothukuru Please find the code from this colab notebook: https://colab.research.google.com/drive/1ch89Veylgg-0FzqGeC4QKDui-a01FKzp?usp=sharing

I've added the following sections:

  1. Train a sample model
  2. Load the sample model & add a sample Lambda layer with a StaticHashTable that simply demonstrates a {0: 1000, 1: 2000} label conversion.
  3. Try to export the above model with TensorFlow 2.2.0 for serving (ERROR HERE)

spate141 commented 4 years ago

Seems like I solved it! I would appreciate it if you could add a version of this to the documentation so others can use it. To make the variables and other elements from outside trackable, we need to write the Lambda layer as a subclassed layer: tf.keras.layers.Layer

class LabelConverter(tf.keras.layers.Layer):

    def __init__(self, **kwargs):
        super(LabelConverter, self).__init__(**kwargs)

        # Implement your StaticHashTable here
        keys = tf.constant([0, 1], dtype=tf.int32)
        values = tf.constant([1000, 2000], dtype=tf.float32)
        table_init = tf.lookup.KeyValueTensorInitializer(keys, values)
        self.table = tf.lookup.StaticHashTable(table_init, -1)

    def build(self, input_shape):
        self.built = True

    def call(self, tensor_input):
        # transform the input: map the label ids through the table, keep the scores
        label_tensor = tf.cast(tensor_input[:, 0], tf.int32)
        score_tensor = tensor_input[:, 1]
        categories_tensor = self.table.lookup(label_tensor)
        return tf.stack((categories_tensor, score_tensor), axis=1)

# adding on top of the already trained keras model
extra_layer = LabelConverter()(model.output)
hash_table_model = tf.keras.models.Model(inputs=model.input, outputs=extra_layer)

version = 1
name = 'tmp_test_serving'
export_path = f'/data/{name}/{version}'
tf.saved_model.save(hash_table_model, export_path)
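
If the export succeeds, a quick sanity check (reusing export_path from above, and assuming the Keras export produces the default signature) is to load the SavedModel back and confirm the signature exists:

# Load the export back and list the available serving signatures.
loaded = tf.saved_model.load(export_path)
print(list(loaded.signatures.keys()))  # expect ['serving_default']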

EDIT: Doesn't work on TF-Serving for some reason!

stefan-falk commented 3 years ago

I am having a similar issue (see stackoverflow) but I am not sure how to solve this in my case.

How can I localize the Tensor "77040:0"? I have no idea where it is coming from. It looks as if it's coming from the Conv2D layer(s) (see the stackoverflow link), but those are tracked. It makes no sense.
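
One way to narrow such a capture down (a hedged sketch; concrete_fn stands for whatever concrete function is being exported, as in the snippets in this thread) is to list the graph's external captures:

# Print every external tensor (variable handles, lookup tables, ...) that the
# traced function closes over; the untracked resource should be among them.
for tensor in concrete_fn.graph.external_captures:
    print(tensor.name, tensor.dtype)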

spate141 commented 2 years ago

This solution is not working with TF 2.5.0 and TF-Serving! I can export the model and can even load it with serving, but during prediction the signature_def causes errors!

naveen-marthala commented 2 years ago

I have tried to export my text-classification model, built and trained using tf.keras as shown below, and I get the same error.

I used TensorFlow 2.7.0 on Ubuntu 18.04 (Google Colab) to train and save the model.

my code:

from typing import ByteString, Dict, List, Union

import tensorflow as tf

class TFModel(tf.Module):
    def __init__(self, model: tf.keras.Model) -> None:
        self.model = model

        # "key_value_tensor_initializer" is an instance of tf.lookup.KeyValueTensorInitializer;
        # it is not built inside this constructor but is an outside global variable
        self.hash_table = tf.lookup.StaticHashTable(initializer=key_value_tensor_initializer,
                                                    default_value=1, name='hash_table')

    def pre_process(self, comment):
        # pre-processes the input payload into the format the model accepts for predictions:
        # lowercases the text with "tf.strings.lower" -> applies 3 regex ops with
        # "tf.strings.regex_replace" one after the other -> splits sentences into words
        # with "tf.strings.split" -> looks those up in the hash table -> pads 0s at the end
        return preprocessed_comment

    def post_process(self, probabilities):
        # converts the model predictions (probabilities) to output classes;
        # returns a RaggedTensor
        return final_labels

    def f(self):
        @tf.function
        def inner(comment: List[str]) -> Dict[str, Union[List[float], List[ByteString], str]]:
            processed_comment = self.pre_process(comment)
            predicted_probs = self.model(processed_comment)
            predicted_labels = self.post_process(predicted_probs)
            return {'predicted_probabilities': predicted_probs,
                    'predicted_labels':predicted_labels,
                    'description': 'prediction probabilities ranges from 0 (hesitant) to 1 (confident).'}
        return inner

    ########## also tried, but to no avail ##########
    # @tf.function(input_signature=[tf.TensorSpec(shape=[1,], dtype=tf.string)])
    # @tf.function
    # def serving_default(self, comment: List[str]) -> Dict[str, Union[List[float], List[ByteString], str]]:
    #     processed_comment = self.pre_process(comment)
    #     predicted_probs = self.model(processed_comment)
    #     predicted_labels = self.post_process(predicted_probs)
    #     return {'predicted_probabilities': predicted_probs,
    #             'predicted_labels':predicted_labels,
    #             'description': 'prediction probabilities ranges from 0 (hesitant) to 1 (confident).'}
    ########################################

##### serialising part

## create an instance
tf_model_wrapper = TFModel(model)
# trying to create concrete_function as mentioned on github issue
concrete_fn = tf_model_wrapper.f().get_concrete_function(comment=tf.TensorSpec([None], tf.string))
## save the model to disk(serialize it)
tf.keras.models.save_model(
    model=tf_model_wrapper.model,
    filepath='/content/complex_nw_v1',
    signatures={'serving_default': concrete_fn})

but I get this error:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-76-889bb0fb80ba> in <module>()
    101     model=tf_model_wrapper.model,
    102     filepath='/content/complex_nw_v1',
--> 103     signatures={"serving_default": concrete_fn})

1 frames
/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
     65     except Exception as e:  # pylint: disable=broad-except
     66       filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67       raise e.with_traceback(filtered_tb) from None
     68     finally:
     69       del filtered_tb

/usr/local/lib/python3.7/dist-packages/tensorflow/python/saved_model/save.py in _map_captures_to_created_tensors(original_captures, resource_map)
    530           "directly.\n\n Trackable Python objects referring to this tensor "
    531           "(from gc.get_referrers, limited to two hops):\n{}".format("\n".join(
--> 532               [repr(obj) for obj in trackable_referrers])))
    533     export_captures.append(mapped_resource)
    534   return export_captures

AssertionError: Tried to export a function which references 'untracked' resource Tensor("308003:0", shape=(), dtype=resource). TensorFlow objects (e.g. tf.Variable) captured by functions must be 'tracked' by assigning them to an attribute of a tracked object or assigned to an attribute of the main object directly.

 Trackable Python objects referring to this tensor (from gc.get_referrers, limited to two hops):
<tensorflow.python.ops.lookup_ops.StaticHashTable object at 0x7f71763e1550>

How do I fix this and save the model properly?

fsonntag commented 2 years ago

Also facing this issue; it seems almost impossible to fix :/

Has anybody made progress since then?

Prayforhanluo commented 2 years ago

Also facing this issue, how to fix?

ConstantinVasilev commented 2 years ago

Facing the same issue, which seems to have been open for 2 years now. Anything new?

stefan-falk commented 2 years ago

To anyone facing this issue, make sure you're not defining trainable layers as class attributes in your sub-classes. It produced a similar error in my case.
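
A minimal sketch of that pitfall (hypothetical model and layer names):

import tensorflow as tf

class BadModel(tf.keras.Model):
    # Class attribute: shared by every instance and NOT tracked by Keras,
    # so its variables can surface as "untracked" resources at export time.
    dense = tf.keras.layers.Dense(4)

    def call(self, inputs):
        return self.dense(inputs)

class GoodModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Instance attribute: tracked, so export can find its variables.
        self.dense = tf.keras.layers.Dense(4)

    def call(self, inputs):
        return self.dense(inputs)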

ConstantinVasilev commented 2 years ago

Hi @stefan-falk, I think the part of the error pointing to the static hash table is quite explicit:

Function name = b'__inference_signature_wrapper_197767'
Captured Tensor = <ResourceHandle(name="hash_table_749c3365-6eaa-4235-bc88-f21ad01980ff"...
type="tensorflow::lookup::LookupInterface"
EdwardCuiPeacock commented 2 years ago

(Quoted @naveen-marthala's comment above in full.)

@naveen-marthala

I may be able to offer a solution, as I recently encountered this error in a similar way. It turns out we need to save the HashTable as one of the model's properties. In this specific example:

## create an instance
tf_model_wrapper = TFModel(model)
# trying to create concrete_function as mentioned on github issue
concrete_fn = tf_model_wrapper.f().get_concrete_function(comment=tf.TensorSpec([None], tf.string))
## save the model to disk(serialize it)
model_to_save = tf_model_wrapper.model
model_to_save.hash_table = tf_model_wrapper.hash_table
tf.keras.models.save_model(
    model=model_to_save,
    filepath='/content/complex_nw_v1',
    signatures={'serving_default': concrete_fn})

The actual name of the attribute probably doesn't matter.
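
As a quick hedged check (path and input dtype assumed from the snippet above), the saved signature can be loaded back and called directly:

# Load the exported model and invoke the serving signature once.
loaded = tf.saved_model.load('/content/complex_nw_v1')
serving_fn = loaded.signatures['serving_default']
print(serving_fn(comment=tf.constant(['an example comment'])))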

singhniraj08 commented 1 year ago

@spate141 / All,

The error AssertionError: Tried to export a function which references 'untracked' resource Tensor("308003:0", shape=(), dtype=resource). can be solved by not defining trainable layers as class attributes in your sub-classes: Keras doesn't track the class, only layer instances, as per commit https://github.com/tensorflow/tensorflow/commit/9d724a8e6034d321e97cdc9972d4d6e7adb3e3ca. You can refer here for a clear explanation.

Also, you can try saving the static HashTable as one of the model's properties, as shown in the comment above.

Thank you!

spate141 commented 1 year ago

Thanks @singhniraj08 for the reply! Closing this issue for now since I'm no longer working on this error, but hopefully someone who encounters something similar in the future will find the help they need here.

adriangay commented 9 months ago

this comment worked for me. Even though tf.lookup.StaticHashTable inherits from TrackableResource, tracing does not seem to be able to track it unless it is explicitly saved.
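
For future readers, a minimal self-contained sketch of that explicit attachment (toy keys/values and path; not from any specific comment above):

import tensorflow as tf

# Attach the table to the object being saved so the exporter tracks it.
module = tf.Module()
module.table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        tf.constant([0, 1], dtype=tf.int32),
        tf.constant([1000, 2000], dtype=tf.int32)),
    default_value=-1)

@tf.function(input_signature=[tf.TensorSpec([None], tf.int32)])
def lookup(ids):
    # The table is reachable from the saved root (module), so it is tracked.
    return {'labels': module.table.lookup(ids)}

module.lookup = lookup
tf.saved_model.save(
    module, '/tmp/hash_table_model',
    signatures={'serving_default': module.lookup.get_concrete_function()})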