index the retrieval model using multiple data

naarkhoo commented 2 years ago

in the manual there is only

# Create a model that takes in raw query features, and
# recommends movies out of the entire movies dataset.
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index_from_dataset(
  tf.data.Dataset.zip((movies.batch(100), movies.batch(100).map(model.movie_model)))
)

which indexes the retrieval model from a dataset. I know the index object also has an index method, so when I try

index.index(movies.batch(100).map(model.movie_model))

I get the following error:

AttributeError: 'MapDataset' object has no attribute 'shape'

which mirrors what is expected in the code here

My input to index, movies.batch(100).map(model.movie_model), is a tensorflow.python.data.ops.dataset_ops.MapDataset, and I am using TF 2.8.0 in a Colab environment.

My actual question is how I can index my retrieval model using multiple inputs, for example 100 movies users have clicked, 100 very recent movies on the market, and 100 movies each user's friends have considered. It seems the input must be a list.

maciejkula commented 2 years ago

Have you tried using the index_from_dataset method?
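
For the multiple-source case asked about above, one possible approach is to concatenate the candidate datasets into a single dataset before indexing. The sketch below is only an illustration under assumptions: the three small datasets and `toy_movie_model` are hypothetical stand-ins for the real candidate sources and `model.movie_model`, and the actual `index_from_dataset` call is left as a comment because it needs the trained model from the question. It also shows how a dataset can be materialized into a tensor, which has the `shape` attribute that the error message says was missing.

```python
import tensorflow as tf

# Hypothetical candidate sources (stand-ins for clicked, recent, and friends' movies).
clicked = tf.data.Dataset.from_tensor_slices(["movie_a", "movie_b"])
recent = tf.data.Dataset.from_tensor_slices(["movie_c", "movie_b"])
friends = tf.data.Dataset.from_tensor_slices(["movie_d"])

# Merge the sources into one dataset and drop duplicate titles.
combined = clicked.concatenate(recent).concatenate(friends).unique()

# Hypothetical stand-in for model.movie_model: hash each title into a fixed random table.
EMBEDDING_DIM = 8
NUM_BUCKETS = 1000
table = tf.random.stateless_normal([NUM_BUCKETS, EMBEDDING_DIM], seed=[1, 2])

def toy_movie_model(titles):
    buckets = tf.strings.to_hash_bucket_fast(titles, NUM_BUCKETS)
    return tf.gather(table, buckets)

# Same (identifiers, embeddings) structure as in the snippet from the manual.
candidates = tf.data.Dataset.zip(
    (combined.batch(100), combined.batch(100).map(toy_movie_model))
)
# With a real model, this dataset could be passed on as in the question:
# index.index_from_dataset(candidates)

# Materializing the batches yields plain tensors, which do have a .shape.
identifiers = tf.concat([ids for ids, _ in candidates], axis=0)
embeddings = tf.concat([emb for _, emb in candidates], axis=0)
print(identifiers.shape, embeddings.shape)
```

Whether deduplication is wanted depends on the use case; `unique()` can be dropped if repeated candidates should be kept.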

naarkhoo commented 2 years ago

Thanks, yes, and that works. But does that mean that in a production setup, where I have preselected 1000 candidates for each user, I should write them to a file, index them, and then rank?

patrickorlando commented 2 years ago

@naarkhoo if you have a bounded number of candidates that is different for each user, then you don't really want a retrieval index. You would just pass your candidates to your model along with the query input and do the matrix multiplication. You can essentially skip the retrieval stage and go straight to the ranking stage.
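
As a rough illustration of the matrix multiplication described above, here is a minimal NumPy sketch. It assumes the query and candidate embeddings have already been computed; the random arrays are hypothetical stand-ins for the outputs of `model.user_model` and `model.movie_model`, and `rank_candidates` is an illustrative helper, not part of TFRS.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBEDDING_DIM = 8

# Hypothetical stand-ins for the outputs of model.user_model and model.movie_model.
user_embedding = rng.normal(size=EMBEDDING_DIM)
candidate_ids = np.array(["movie_a", "movie_b", "movie_c", "movie_d"])
candidate_embeddings = rng.normal(size=(len(candidate_ids), EMBEDDING_DIM))

def rank_candidates(user_embedding, candidate_embeddings, candidate_ids, k):
    """Score one user's preselected candidates by dot product and return the top k."""
    scores = candidate_embeddings @ user_embedding  # shape: (num_candidates,)
    order = np.argsort(-scores)[:k]                 # indices of the highest scores first
    return candidate_ids[order], scores[order]

top_ids, top_scores = rank_candidates(
    user_embedding, candidate_embeddings, candidate_ids, k=3
)
print(top_ids, top_scores)
```

Because each user has a small candidate set of their own, no shared index is built here; the scores are computed directly per user.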

tensorflow / recommenders

index the retrieval model using multiple data #493