tensorflow / recommenders

TensorFlow Recommenders is a library for building recommender system models using TensorFlow.
Apache License 2.0

How to pass multiple inputs into the BruteForce index? #171

Open italodamato opened 3 years ago

italodamato commented 3 years ago

I'm trying to use the index on my model, which requires a dict as input (similar to [this tutorial](https://www.tensorflow.org/recommenders/examples/deep_recommenders#combined_model)). How do I pass a query correctly when I call the index? I'm getting multiple errors.

This is my last attempt:

```python
index = tfrs.layers.factorized_top_k.BruteForce(model.query_model)
# recommends products out of the entire products dataset.
index.index(train.batch(100).map(model.candidate_model),
            train.batch(100).map(lambda x: model.candidate_model({'product_id': x['product_id']})))

test_query = {
    "last_100_product_views": train_df["last_100_product_views"][0],
    "last_100_purchases": train_df["last_100_purchases"][0],
    "last_100_searches": train_df["last_100_searches"][0],
    "user_gender": train_df["user_gender"][0],
    "user_country": train_df["user_country"][0],
}
# Get recommendations.
_, titles = index(dict(train_df[['last_100_product_views',
                                 'last_100_purchases',
                                 'last_100_searches',
                                 'user_gender',
                                 'user_country',]].iloc[0]))

print(f"Recommendations for user 42: {titles[0, :3]}")
```

maciejkula commented 3 years ago

It's really hard to say anything without seeing the errors. Would you mind putting together a Colab that reproduces the issue?

In general:

  1. Make sure that what you pass to index is exactly the same as what you pass to the query model during training.
  2. In particular, when you pass a dict, make sure its entries are tensors or numpy arrays. I don't know what Pandas returns.
italodamato commented 3 years ago

Worked! Thanks a lot! I had to:

  1. Make sure every value of the dict was a tensor (np arrays didn't work).
  2. Add one dimension to each tensor to emulate the batch size (some layers like GlobalAveragePooling1D fail if the input is not 3D)
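
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the thread: the `row` dict stands in for one DataFrame row, and the helper name `to_batched_query` and the feature values are made up for the example.

```python
import tensorflow as tf

# Hypothetical raw features for one user, standing in for one DataFrame row.
row = {
    "last_100_product_views": "p1 p2 p3",
    "user_gender": "F",
    "user_country": "IT",
}

def to_batched_query(features):
    """Convert each value to a tensor and add a leading batch dimension of 1."""
    return {
        name: tf.expand_dims(tf.convert_to_tensor(value), axis=0)
        for name, value in features.items()
    }

query = to_batched_query(row)
for name, tensor in query.items():
    # Each entry is now a string tensor of shape (1,), i.e. a batch of one.
    print(name, tensor.shape, tensor.dtype)
```

The resulting dict has the same keys as the input, with every value a batched tensor, which is the form the query model saw during training.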


italodamato commented 3 years ago

Is there a way to avoid getting back the same candidate multiple times? I know I can filter duplicates out afterward, but I'm wondering why this happens at all. Am I doing something wrong?


italodamato commented 3 years ago

Actually, I'm trying to use this query dict as input for the ScaNN layer, but it raises an error saying it accepts only tensors, not dicts. Any solution?

```python
import tensorflow as tf

# Get a query from the dataset in the appropriate format for brute_force:
# a dict of feature name -> tensor with a leading batch dimension of 1.
def get_test_query(df, df_row):
    return dict(df.loc[df_row, ['last_100_product_views',
                                'last_100_purchases',
                                'last_100_searches',
                                'user_gender',
                                'user_country',
                                ]].map(lambda x: tf.expand_dims(x, axis=0)))
```

EDIT: Solved it with:

```python
_, titles = scann(model.query_model(get_test_query(train_df, row_n)), n_of_recommendations)

print(f"Top recommendations: {titles[0]}")
```

I'm not sure this is the proper way.

Edit2: Actually, I think the proper way is to pass the query model to the ScaNN layer itself.

I'm not sure why I'm not getting any speed benefit compared to brute force, though.

Edit3: I got ScaNN to run faster after I adapted my model so that it could be saved and reloaded successfully.

maciejkula commented 3 years ago

To avoid duplicates you should pass a deduplicated dataset into index, containing each candidate you can recommend only once.

It looks like you're passing in the train dataset, which presumably contains many, many instances of the same candidate.
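
A minimal sketch of building a deduplicated candidate dataset, assuming the candidate identifiers fit in memory. The `interaction_product_ids` array is made-up data standing in for the `product_id` column of the training set; recent TensorFlow releases also provide `tf.data.Dataset.unique()` as an alternative to deduplicating in NumPy.

```python
import numpy as np
import tensorflow as tf

# Hypothetical interaction log: the same product appears many times.
interaction_product_ids = np.array(["p1", "p2", "p1", "p3", "p2", "p1"])

# Keep each candidate exactly once before building the dataset to index.
unique_product_ids = np.unique(interaction_product_ids)
candidates = tf.data.Dataset.from_tensor_slices(unique_product_ids)

candidate_list = [c.numpy().decode("utf-8") for c in candidates]
print(candidate_list)
```

A dataset like `candidates`, batched and mapped through the candidate model, is what would then be passed to `index`, instead of the raw training interactions.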

italodamato commented 3 years ago

Thanks, that makes sense. I'm also wondering whether there's a way to set k, the number of candidates to retrieve, after the model has been loaded. I get an error if I try to change it.

Without k, it works.

With k, I get an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-171-ebc22dbbb800> in <module>()
      6                   ], 1)
      7 
----> 8 _, titles = loaded(query, 10)
      9 print(query)
     10 print('\n')

14 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/function_deserialization.py in restored_function_body(*args, **kwargs)
    255         .format(_pretty_format_positional(args), kwargs,
    256                 len(saved_function.concrete_functions),
--> 257                 "\n\n".join(signature_descriptions)))
    258 
    259   concrete_function_objects = []

ValueError: Could not find matching function to call loaded from the SavedModel. Got:
  Positional arguments (3 total):
    * {'last_100_product_views': <tf.Tensor 'queries:0' shape=(1,) dtype=string>, 'last_100_purchases': <tf.Tensor 'queries_1:0' shape=(1,) dtype=string>, 'last_100_searches': <tf.Tensor 'queries_2:0' shape=(1,) dtype=string>, 'user_gender': <tf.Tensor 'queries_4:0' shape=(1,) dtype=string>, 'user_country': <tf.Tensor 'queries_3:0' shape=(1,) dtype=string>}
    * 10
    * False
  Keyword arguments: {}

Expected these arguments to match one of the following 4 option(s):

Option 1:
  Positional arguments (3 total):
    * {'last_100_purchases': TensorSpec(shape=(None,), dtype=tf.string, name='last_100_purchases'), 'last_100_product_views': TensorSpec(shape=(None,), dtype=tf.string, name='last_100_product_views'), 'user_gender': TensorSpec(shape=(None,), dtype=tf.string, name='user_gender'), 'last_100_searches': TensorSpec(shape=(None,), dtype=tf.string, name='last_100_searches'), 'user_country': TensorSpec(shape=(None,), dtype=tf.string, name='user_country')}
    * None
    * False
  Keyword arguments: {}

Option 2:
  Positional arguments (3 total):
    * {'last_100_purchases': TensorSpec(shape=(None,), dtype=tf.string, name='queries/last_100_purchases'), 'last_100_product_views': TensorSpec(shape=(None,), dtype=tf.string, name='queries/last_100_product_views'), 'user_gender': TensorSpec(shape=(None,), dtype=tf.string, name='queries/user_gender'), 'last_100_searches': TensorSpec(shape=(None,), dtype=tf.string, name='queries/last_100_searches'), 'user_country': TensorSpec(shape=(None,), dtype=tf.string, name='queries/user_country')}
    * None
    * False
  Keyword arguments: {}

Option 3:
  Positional arguments (3 total):
    * {'last_100_purchases': TensorSpec(shape=(None,), dtype=tf.string, name='last_100_purchases'), 'last_100_product_views': TensorSpec(shape=(None,), dtype=tf.string, name='last_100_product_views'), 'user_gender': TensorSpec(shape=(None,), dtype=tf.string, name='user_gender'), 'last_100_searches': TensorSpec(shape=(None,), dtype=tf.string, name='last_100_searches'), 'user_country': TensorSpec(shape=(None,), dtype=tf.string, name='user_country')}
    * None
    * True
  Keyword arguments: {}

Option 4:
  Positional arguments (3 total):
    * {'last_100_purchases': TensorSpec(shape=(None,), dtype=tf.string, name='queries/last_100_purchases'), 'last_100_product_views': TensorSpec(shape=(None,), dtype=tf.string, name='queries/last_100_product_views'), 'user_gender': TensorSpec(shape=(None,), dtype=tf.string, name='queries/user_gender'), 'last_100_searches': TensorSpec(shape=(None,), dtype=tf.string, name='queries/last_100_searches'), 'user_country': TensorSpec(shape=(None,), dtype=tf.string, name='queries/user_country')}
    * None
    * True
  Keyword arguments: {}
maciejkula commented 3 years ago

I think this is just a function of how SavedModels work; you'll need multiple signatures for this to work as you expect.

Have a look at the SavedModel guide for details.
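
One possible workaround, sketched here with plain TensorFlow rather than the TFRS layers, is to export a function whose signature takes `k` as a scalar tensor, so that it is a runtime input instead of a value fixed at tracing time. The `TinyIndex` module, its `top_k` method, and the 2-dimensional embeddings are all hypothetical stand-ins for illustration; this is not how the TFRS index layers are implemented.

```python
import tempfile
import tensorflow as tf

class TinyIndex(tf.Module):
    """Toy brute-force index over fixed candidate embeddings (illustrative only)."""

    def __init__(self, candidates):
        super().__init__()
        self.candidates = tf.Variable(candidates, trainable=False)

    # k is declared as a scalar int32 tensor, so it stays variable after export.
    @tf.function(input_signature=[
        tf.TensorSpec(shape=[None, 2], dtype=tf.float32),
        tf.TensorSpec(shape=[], dtype=tf.int32),
    ])
    def top_k(self, queries, k):
        # Dot-product score of every query against every candidate.
        scores = tf.matmul(queries, self.candidates, transpose_b=True)
        values, indices = tf.math.top_k(scores, k=k)
        return values, indices

index = TinyIndex([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])

export_dir = tempfile.mkdtemp()
tf.saved_model.save(index, export_dir)
loaded = tf.saved_model.load(export_dir)

query = tf.constant([[1.0, 0.2]])
_, top2 = loaded.top_k(query, tf.constant(2))
_, top3 = loaded.top_k(query, tf.constant(3))
print(top2.numpy(), top3.numpy())
```

Because `k` is part of the traced signature as a tensor, the same loaded function serves different values of `k` without needing one exported signature per value.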