[Question] After saving & loading the TFmodel/ Scann/ BruteForce objects with Dict input for User Tower, the loaded model won't work properly #413

xiaoyaoyang closed 2 years ago

xiaoyaoyang commented 2 years ago

About Save & Load BruteForce/Scann and Model object

I am working through this tutorial with an online shopping dataset. My User tower is similar to the one in the tutorial:

class UserModel(tf.keras.Model):
    def call(self, inputs):
        # Take the input dictionary, pass it through each input layer,
        # and concatenate the result.
        return tf.concat([
            tf.reshape(self.normalized_timestamp(inputs["timestamp"]), (-1, 1)),
        ], axis=1)

Where inputs is a dict.

I am able to get the embedding by UserModel()(input_dict) just fine. The issue is when I work with the example in this link: https://www.tensorflow.org/recommenders/examples/efficient_serving, where we want to save the Scann/BF object.

I am able to get the Scann working and able to call it

scann = tfrs.layers.factorized_top_k.ScaNN(model.user_model, num_reordering_candidates=100)
scann.index_from_dataset(
    sku_map.batch(2048).map(lambda x: (x["SKU_KEY"], model.sku_model(x))))
    # (sku_map.batch(2048).map(lambda x: x["SKU_KEY"]) , sku_map.batch(2048).map(model.sku_model) )
scann({'CONTEXT_ID': np.array([[b'263', b'34', b'555', b'44', b'3300']]),
       'USER_ID': np.array([b'sssksksksksksk'])})

and it will return meaningful results. However, if I follow Deploying the approximate model section to save it and load it back, I got an error

ValueError: Could not find matching function to call loaded from the SavedModel. Got:
  Positional arguments (3 total):
    * {'CONTEXT_ID': <tf.Tensor 'queries:0' shape=(1, 5) dtype=string>, 'USER_ID': <tf.Tensor 'queries_1:0' shape=(1,) dtype=string>}
    * None
    * False
  Keyword arguments: {}

Expected these arguments to match one of the following 4 option(s):

Option 1:
  Positional arguments (3 total):
    * {'PRICE': TensorSpec(shape=(None,), dtype=tf.float32, name='PRICE'), 'USER_ID': TensorSpec(shape=(None,), dtype=tf.string, name='USER_ID'), 'TRANS_COUNT': TensorSpec(shape=(None,), dtype=tf.int64, name='TRANS_COUNT'), 'SKU_KEY': TensorSpec(shape=(None, 1), dtype=tf.string, name='SKU_KEY'), 'SKU_DESC': TensorSpec(shape=(None,), dtype=tf.string, name='SKU_DESC'), 'CONTEXT_ID': TensorSpec(shape=(None, 5), dtype=tf.string, name='CONTEXT_ID')}
    * None
    * False
  Keyword arguments: {}

It seems to me that (1) the leading dimension of every shape is None, and (2) the loaded object can no longer recognize the input dict. The same thing happens if I save the whole model and load it back:

my_tf_saved_model = tf.keras.models.load_model(

It would throw similar errors but the model(row) (row is a dict) works fine..

Can't do model.evaluate after replacing factorized_metrics with BruteForce

Another strange finding: with the above setup, if I pass the query model to BruteForce (brute_force = tfrs.layers.factorized_top_k.BruteForce(model.user_model)), then reset factorized_metrics and run model.evaluate (for faster evaluation), I get an error:

    TypeError: Only integers, slices (`:`), ellipsis (`...`), tf.newaxis (`None`) and scalar tf.int32/tf.int64 tensors are valid indices, got 'USER_ID'

It seems it does not like that my User tower takes a dict as input and returns self.user_model(inputs['USER_ID']). However, it works if I do not pass the user model when initializing BruteForce.

Any insights would be appreciated!

almirb commented 2 years ago

You need to check the dtype of the input data fields. For example, change

input_data = {
 'user_id': np.array([8]),
 'user_name': np.array(["Someone"]),
}

to

input_data = {
 'user_id': np.array([8], dtype=np.int32),
 'user_name': np.array(["Someone"]),
}

It depends on what your model is expecting to get.

xiaoyaoyang commented 2 years ago

@almirb Thanks for the reply! I tried with one record (I don't know how to pick a single record, hence this code; please let me know if there is a better way :)))

for row in train_map.batch(1).take(1):

Just to clarify: in my case brute_force(row) works, but loaded(row) throws errors. Before saving and loading, I made sure to call brute_force once, and the rest is simply copy-pasted from this tutorial.

It also works if I DO NOT Specify the Query model when saving Brute Force. so it would be


It feels like it is fine to save just the transformation that starts from the embedding (my_query_model(row) returns the embedding, I think), but if I store the query model that creates the embedding inside brute_force, I get errors.

patrickorlando commented 2 years ago

@xiaoyaoyang When serialising a model, TensorFlow creates a strict function call signature based on tracing the model. Before serialising, you passed in a dict containing:

{'PRICE': TensorSpec(shape=(None,), dtype=tf.float32, name='PRICE'), 'USER_ID': TensorSpec(shape=(None,), dtype=tf.string, name='USER_ID'), 'TRANS_COUNT': TensorSpec(shape=(None,), dtype=tf.int64, name='TRANS_COUNT'), 'SKU_KEY': TensorSpec(shape=(None, 1), dtype=tf.string, name='SKU_KEY'), 'SKU_DESC': TensorSpec(shape=(None,), dtype=tf.string, name='SKU_DESC'), 'CONTEXT_ID': TensorSpec(shape=(None, 5), dtype=tf.string, name='CONTEXT_ID')}

When you serialise your model tensorflow will create a call signature that expects all those inputs, even if your model doesn't use them. So when you call it with just USER_ID, it will fail.

It's best to ensure you only pass the required features into your model during training and evaluation. Alternatively you should be able to resolve this by calling the model once with an example record with only the required features before serialising. This will then result in another call signature that matches the input you expect to pass when serving.
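A minimal sketch of the behaviour described above, using a plain tf.Module rather than a TFRS layer. The UserTower class, the feature values, and the temporary save path are made up for illustration; only tf.function tracing and tf.saved_model.save/load are relied on.

```python
import os
import tempfile

import tensorflow as tf


class UserTower(tf.Module):
    """Hypothetical stand-in for a query tower that only reads USER_ID."""

    @tf.function
    def __call__(self, features):
        # Only USER_ID is used, but every key in `features` becomes part of
        # the traced call signature.
        return tf.strings.length(features["USER_ID"])


# Made-up example records.
full_record = {"USER_ID": tf.constant(["u1"]), "PRICE": tf.constant([9.5])}
query_only = {"USER_ID": tf.constant(["u1"])}


def save_and_reload(module):
    """Save the module to a fresh temporary directory and load it back."""
    path = os.path.join(tempfile.mkdtemp(), "tower")
    tf.saved_model.save(module, path)
    return tf.saved_model.load(path)


# Case 1: traced with the full record only, so the reloaded function has no
# signature that accepts a USER_ID-only dict.
tower = UserTower()
tower(full_record)
loaded = save_and_reload(tower)
try:
    loaded(query_only)
    rejected = False
except (ValueError, TypeError):
    rejected = True

# Case 2: call once with just the query features before saving. This adds a
# second traced signature, which the reloaded function can match.
tower(query_only)
reloaded = save_and_reload(tower)
lengths = reloaded(query_only)
```

In this sketch the first reload rejects the USER_ID-only dict, while the second accepts it because the extra trace was recorded before saving.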

maciejkula commented 2 years ago

As always, @patrickorlando has the right answer. The key here is passing only the features you need (here, only the user features) into your model, not all features.
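One way to keep the extra features out of the query tower is to filter each record down to the keys the tower reads. This is only a sketch: the choice of USER_ID and CONTEXT_ID as the query features is an assumption based on the call shown earlier in the thread, and select_features is a hypothetical helper, not part of TFRS.

```python
# Assumption: the query tower reads only these two features.
QUERY_FEATURES = ("USER_ID", "CONTEXT_ID")


def select_features(record, keys=QUERY_FEATURES):
    """Return a new dict holding only the requested feature keys."""
    return {key: record[key] for key in keys}


# Made-up record carrying both query and candidate features.
full_record = {
    "USER_ID": [b"u1"],
    "CONTEXT_ID": [[b"263", b"34", b"555", b"44", b"3300"]],
    "PRICE": [9.5],
    "SKU_KEY": [[b"sku-1"]],
}

query_record = select_features(full_record)
print(sorted(query_record))  # ['CONTEXT_ID', 'USER_ID']
```

With a tf.data pipeline, the same helper could presumably be applied through dataset.map(select_features), so that the query model is only ever traced with the features it needs.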

maciejkula commented 2 years ago

I'm going to close this, but please re-open if this doesn't solve the issue for you.

xiaoyaoyang commented 2 years ago

@patrickorlando Thanks! I will give it a try... (The inputs of my query_model and candidate_model are indeed different, and the input I feed into those two models contain all information (query model's features + candidate model's features).. )

cory1219 commented 2 years ago

Hello @patrickorlando, can you also explain how to change a tensor's shape from (1,0) in the dict input to (None,) to meet the saved model's input requirement? Thanks!

patrickorlando commented 2 years ago

Hey @cory1219, You don't need to change the shape. If you pass a tensor with shape (n, m) to your model, the call signature will accept shape (None, m).

cory1219 commented 2 years ago

Hi @patrickorlando , thanks for your reply!! But I wonder if my input of the loaded model is like this:

input_data = { 'user_id': np.array([['6']]), 'product_name': np.array([["Apple"]]), }

Why can't the model accept the shape (1,1) for each feature? Isn't it compatible with the shape (None, 1) that the call signature accepts?


patrickorlando commented 2 years ago

Hey @cory1219, It should work. Are you getting an error for the example above? Please post the error message here if you have one and it should be easier to give a specific answer.

cory1219 commented 2 years ago

Hi @patrickorlando

I have attached the error message as below. Can I change the format of input that the model can accept? Thanks!

WARNING:absl:Found untraced functions such as ranking_35_layer_call_fn, ranking_35_layer_call_and_return_conditional_losses, retrieval_33_layer_call_fn, retrieval_33_layer_call_and_return_conditional_losses, ranking_35_layer_call_fn while saving (showing 5 of 15). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: C:\Users\z004f16b\AppData\Local\Temp\tmpjv06r_pc\ranking\assets
INFO:tensorflow:Assets written to: C:\Users\z004f16b\AppData\Local\Temp\tmpjv06r_pc\ranking\assets

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_23596/1485082039.py in <module>
     64 )
     65 
---> 66 save_ranking(model)

~\AppData\Local\Temp/ipykernel_23596/1485082039.py in save_ranking(model)
     52     # Pass a customer id in, get top predicted product back.
     53     print(
---> 54         loaded({
     55             "customer_id": tf.constant(np.array([["0001019648"]])),
     56             "customer_price_group": tf.constant(np.array([["KK-nicht verwenden"]])),

~\Anaconda3\lib\site-packages\tensorflow\python\saved_model\load.py in _call_attribute(instance, *args, **kwargs)
    699 
    700 def _call_attribute(instance, *args, **kwargs):
--> 701   return instance.__call__(*args, **kwargs)
    702 
    703 

~\Anaconda3\lib\site-packages\tensorflow\python\util\traceback_utils.py in error_handler(*args, **kwargs)
    151     except Exception as e:
    152       filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153       raise e.with_traceback(filtered_tb) from None
    154     finally:
    155       del filtered_tb

~\Anaconda3\lib\site-packages\tensorflow\python\saved_model\function_deserialization.py in restored_function_body(*args, **kwargs)
    287           "Option {}:\n {}\n Keyword arguments: {}"
    288           .format(index + 1, _pretty_format_positional(positional), keyword))
--> 289     raise ValueError(
    290         "Could not find matching concrete function to call loaded from the "
    291         f"SavedModel. Got:\n {_pretty_format_positional(args)}\n Keyword "

ValueError: Could not find matching concrete function to call loaded from the SavedModel. Got: Positional arguments (2 total):

Option 1: Positional arguments (2 total):

Option 2: Positional arguments (2 total):

Option 3: Positional arguments (2 total):

Option 4: Positional arguments (2 total):

patrickorlando commented 2 years ago

Hey @cory1219, it looks like you have a rank mismatch. Your model expects tensors of shape (None,), which has rank 1, but you are passing in tensors of shape (1, 1), which has rank 2. Your input tensors should have shape (1,). More concretely, your model expects

{'product_id': ['abc'], ...}

and you are passing

{'product_id': [['abc']], ...}

Hope this helps.
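The rule can be illustrated in plain Python. This is a rough sketch of the shape-matching idea only, not TensorFlow's actual implementation; nested_shape and matches_spec are hypothetical helpers written for this example.

```python
def nested_shape(value):
    """Return the shape of a rectangular nested list, e.g. [['abc']] -> (1, 1)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def matches_spec(shape, spec):
    """True if `shape` fits `spec`, where None in `spec` means any size."""
    if len(shape) != len(spec):  # different rank: never compatible
        return False
    return all(want is None or want == got for got, want in zip(shape, spec))


print(matches_spec(nested_shape(["abc"]), (None,)))    # True: rank 1 fits (None,)
print(matches_spec(nested_shape([["abc"]]), (None,)))  # False: rank 2 vs rank 1
print(matches_spec((3, 5), (None, 5)))                 # True: None accepts any batch size
```

A None dimension accepts any size along that axis, but the number of axes has to match exactly, which is why the extra pair of brackets causes the mismatch.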

cory1219 commented 2 years ago

Hi @patrickorlando,

Thank you so much for your help!! Now the prediction of the loaded ranking model finally works! But when I tried to predict using the loaded retrieval model, it showed a totally different error message. Can you also help me with that? Thank you!! My code:

def save_retrieval(model):

    # Create a BruteForce layer as before for prediction
    index = tfrs.layers.factorized_top_k.BruteForce(model.query_model)
    #index = tfrs.layers.factorized_top_k.ScaNN(model.query_model)
    index.index_from_dataset(
        tf.data.Dataset.zip((items.batch(100), items.batch(100).map(model.candidate_model)))
    )

    # Export the query model.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "retrieval")

        # Save the index.
        tf.saved_model.save(index, path)

        # Load it back; can also be done in TensorFlow Serving.
        loaded = tf.saved_model.load(path)
        time = datetime.datetime.strptime("2021", '%Y')
        time_input = datetime.datetime.timestamp(time)

        # Pass a customer id in, get top predicted product id back.
        scores, titles = loaded({
            "customer_id": np.array(["0001019648"]),
            "customer_price_group": np.array(["KK-nicht verwenden"]),
            "customer_type": np.array(["END-ACCOUNT"]),
            "customer_industry": np.array(["Power Utilities"]),
            "companyname_gu": np.array(["Zweckverband kommunaler Anteilseigner der WEMAG"]),
            "project_flag": np.array([0]),
            "timestamp": np.array([time_input]),
        })

        print(f"Recommendations: {titles[0][:3]}")


The error message shows as follows:

WARNING:tensorflow:Skipping full serialization of Keras layer <tensorflow_recommenders.layers.factorized_top_k.BruteForce object at 0x000001BF160BBAF0>, because it is not built.
WARNING:tensorflow:Skipping full serialization of Keras layer <tensorflow_recommenders.layers.factorized_top_k.BruteForce object at 0x000001BF160BBAF0>, because it is not built.
WARNING:absl:Found untraced functions such as query_with_exclusions while saving (showing 1 of 1). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: C:\Users\z004f16b\AppData\Local\Temp\tmpz47edw4c\retrieval\assets
INFO:tensorflow:Assets written to: C:\Users\z004f16b\AppData\Local\Temp\tmpz47edw4c\retrieval\assets

TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_23596/1368832069.py in <module>
     70 
     71 #save_ranking(model)
---> 72 save_retrieval(model)

~\AppData\Local\Temp/ipykernel_23596/1368832069.py in save_retrieval(model)
     25 
     26     # Pass a customer id in, get top predicted product id back.
---> 27     scores, titles = loaded({
     28         "customer_id": np.array(["0001019648"]),
     29         "customer_price_group": np.array(["KK-nicht verwenden"]),

TypeError: '_UserObject' object is not callable

patrickorlando commented 2 years ago

Glad to help @cory1219.

Before you can serialise the model, you need to build it. This happens automatically when you call the model with data. You just need to call your brute force index with an example record before you save it.

cory1219 commented 2 years ago

Thanks for your reply @patrickorlando

But I would like to confirm what you mean by "call your brute force index with an example record before you save it." Is that equivalent to this line from my code above?

index.index_from_dataset( tf.data.Dataset.zip((items.batch(100), items.batch(100).map(model.candidate_model))) )

patrickorlando commented 2 years ago

When I say call the model, I mean getting a prediction from it. After you index the brute_force layer, you need to run:

scores, identifiers = index(example_record)