tensorflow / recommenders

TensorFlow Recommenders is a library for building recommender system models using TensorFlow.
Apache License 2.0

Does TFRS support item similarity? #302

Open dgoldenberg-audiomack opened 3 years ago

dgoldenberg-audiomack commented 3 years ago

Question: does TFRS support item similarity, i.e. recommending similar / related items? I mean item-to-item recommendations: you specify an item and get back the items most similar to it.

I don't seem to see this in the framework. If it's not available, I'd like this to be an enhancement request to get it added to TFRS.

maciejkula commented 3 years ago

Just like ranking, this is one of the basic goals of TFRS. Have you seen https://www.tensorflow.org/recommenders/examples/basic_retrieval?

maciejkula commented 3 years ago

To add a bit of detail: this is a special case of retrieval described in the tutorials - it's just that your query is an item and not a user.

jafaircl commented 3 years ago

Leaving this comment to help others out as I'm sure I'm not the only one who didn't "get it" straight away:

If you go through the linked tutorial, getting the similar movies is as simple as creating an index that takes the movie model as an input instead of the user model. For example, you can add this to the bottom of your code from the tutorial and get a list of similar movies to Toy Story:

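# Build an index whose query model is the movie tower, so that a movie
# title (rather than a user id) is what gets embedded at query time.
index = tfrs.layers.factorized_top_k.BruteForce(model.movie_model)
# Index every movie in the dataset as a candidate, keyed by its title.
# (Written against the TFRS release of the time; later releases expose
# index_from_dataset for dataset inputs.)
index.index(movies.batch(100).map(model.movie_model), movies)

# Get the titles closest to Toy Story in the embedding space.
_, titles = index(tf.constant(["Toy Story (1995)"]))
print(f"Similar movies to Toy Story (1995): {titles[0, :10]}")
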
dgoldenberg-audiomack commented 3 years ago

@jafaircl @maciejkula Why not just make this explicit in the tutorials? It's a well-known, popular use-case and spelling it out in the docs explicitly would go a long way, IMO.

almirb commented 2 years ago

I'm confused, because the tutorial says we need to retrain a model with "movies" in both the query and candidate towers: "These could be constructed from clicks on product detail pages."

You (@jafaircl) are saying we could reuse the same pre-trained model and build a movie-to-movie index, as in the code above.

Which approach is the correct one? Thanks!

patrickorlando commented 2 years ago

Hey all, there isn't really a straightforward answer here, but hopefully this helps.

Both approaches are valid but might produce different results. Which one to use depends on the problem you are trying to solve.

TFRS is very general. Rather than thinking strictly in terms of users and movies, it's better to think in terms of queries and candidates. The query is an encoding of the current context we want to recommend for, and the candidates are what you want to recommend. The query could be based on the user_id, on user profile information, on previous product purchases, or on free text the user entered in a search bar. When you optimise the two-tower model, the candidate tower embeds the candidates into a d-dimensional space, whilst the query tower learns to encode queries into the same space, such that a given query ends up close to its corresponding candidates.
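To make the query/candidate picture concrete, here is a minimal sketch in plain Python. It assumes dot-product scoring, and every vector in it is invented for illustration (nothing here is learned by or taken from TFRS); `retrieve` is a hypothetical helper, not a library call.

```python
# Toy illustration only: these vectors are invented, not learned by TFRS.

def dot(a, b):
    """Dot product, used here as the query-candidate affinity score."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical candidate-tower outputs in a d = 2 embedding space.
candidate_embeddings = {
    "Toy Story (1995)": [0.9, 0.1],
    "Heat (1995)": [0.1, 0.9],
    "Aladdin (1992)": [0.8, 0.3],
}

# Hypothetical query-tower output for one query. The query could have been
# built from a user id, profile features, a search string, and so on.
query_embedding = [1.0, 0.2]

def retrieve(query, candidates, k):
    """Return the k candidate names with the highest score for the query."""
    ranked = sorted(
        candidates,
        key=lambda name: dot(query, candidates[name]),
        reverse=True,
    )
    return ranked[:k]

top = retrieve(query_embedding, candidate_embeddings, k=2)
print(top)
```

The point is only that retrieval reduces to "embed the query, score it against every candidate embedding, keep the top k"; what the query tower takes as input is up to you.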

After training, two candidates that are both relevant to the same query will sit close together in the embedding space. Such items should be similar, so if you create an index that uses the candidate model as the query model, you are essentially doing a nearest-neighbour search for similar candidates. This might work well for a carousel on a product page, but the next example shows that it is not always what you want.
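As a rough sketch of that nearest-neighbour idea, the snippet below looks up one item's embedding and ranks all other items against it. The embeddings are made up, dot-product scoring is an assumption, and `similar_items` is a hypothetical helper rather than anything provided by TFRS.

```python
# Toy illustration only: invented embeddings standing in for candidate-tower outputs.

def dot(a, b):
    """Dot product used as the similarity score."""
    return sum(x * y for x, y in zip(a, b))

item_embeddings = {
    "phone_a": [1.0, 0.0],
    "phone_b": [0.9, 0.1],
    "phone_case": [0.2, 0.9],
    "charger": [0.1, 1.0],
}

def similar_items(item_id, embeddings, k):
    """Rank every other item by its score against item_id's own embedding."""
    query = embeddings[item_id]
    scores = {
        other: dot(query, vector)
        for other, vector in embeddings.items()
        if other != item_id  # do not recommend the item to itself
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

neighbours = similar_items("phone_a", item_embeddings, k=2)
print(neighbours)
```

With these toy vectors the closest neighbour of a phone is another phone, which is the behaviour the basket example that follows is warning about.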

Let's imagine you are trying to recommend complementary items for a shopping basket. The query is the list of items currently in the basket. Suppose the user has a new phone in their basket. With the similar-items method above you would likely recommend other phones, which is unlikely to be relevant for the user; you would instead expect phone accessories to be recommended. You cannot rely on the method above to do this. In this case you need distinct query and candidate towers, so that the query tower can take a phone as input and encode the query next to phone accessories in the joint embedding space.
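A small sketch of the distinct-towers case, under stated assumptions: each product gets two invented vectors, one for its role as a candidate and one for its role as a basket item, and the basket is encoded by mean pooling. The numbers are hand-picked so that a phone in the basket lands near accessories. A real model would have to learn this from co-purchase data, and mean pooling is just one possible way to combine basket items.

```python
# Toy illustration only: both tables are invented, not trained.

def dot(a, b):
    """Dot product used as the query-candidate score."""
    return sum(x * y for x, y in zip(a, b))

# Candidate tower: where each product sits as something to recommend.
candidate_embeddings = {
    "phone_a": [1.0, 0.0],
    "phone_b": [0.9, 0.1],
    "phone_case": [0.0, 1.0],
    "charger": [0.1, 0.9],
}

# Query tower: where each product sits as something already in the basket.
# Hand-picked so that phones point towards the accessory region.
basket_item_embeddings = {
    "phone_a": [0.1, 1.0],
    "phone_b": [0.2, 0.9],
    "phone_case": [0.5, 0.5],
    "charger": [0.5, 0.5],
}

def encode_basket(basket):
    """Mean-pool the query-side vectors of the items in the basket."""
    vectors = [basket_item_embeddings[item] for item in basket]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def recommend_for_basket(basket, k):
    """Score candidates against the basket query, skipping items already in it."""
    query = encode_basket(basket)
    scores = {
        item: dot(query, vector)
        for item, vector in candidate_embeddings.items()
        if item not in basket
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

complements = recommend_for_basket(["phone_a"], k=2)
print(complements)
```

Because the same product has different vectors on the two sides, a phone as a query no longer has to be close to phones as candidates.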

I hope this is helpful 😁

almirb commented 2 years ago

Hello @patrickorlando!

Thanks for this awesome explanation. There is no "right" or "wrong"; it depends on the task and the desired effect. We're developing a retail recsys with TFRS, and it's time to discuss which "tower config" to use for each page type (home, product, cart...).

Thanks!

garashov commented 2 years ago

Hello @patrickorlando,

Thanks for the explanation, it helped my understanding. But I have a question.

I use a TFRS hybrid model with a user-id query tower and an item-id candidate tower. How could I pass users' purchase history to the query tower?

All I want is to change our model so that we pass it a purchase history and it recommends several items based on that history.

Thanks.

almirb commented 2 years ago

Hi @garashov,

AFAIK, the model takes the purchase history into account during training and will suggest similar products to users with similar profiles and similar purchase histories.

Please correct me if I'm wrong...

garashov commented 2 years ago

Hi @almirb, I think you are right.

All we want is for our model to train on features such as user_id, product_id, timestamp, etc. The configuration of your model varies depending on how you construct the towers.

For now I am using a model built with user and item towers: we give it a user id and it predicts the most appropriate products for that user, or we give it a product id and it recommends the closest products to that product.

This causes problems when we save the model and try to use it for new users. To avoid that, we decided to change the approach: we train the model, then give it a new user's purchase history, and it analyses the items that user bought and recommends from them. I think this means I have to move to an item-to-item type of recommendation, but I am not even sure how to do it.

I would be glad of anyone's help, thanks!

almirb commented 2 years ago

I think you are thinking along the same lines as me. Look at issue #408, where I'm trying to replace user_id with the items currently in the user's cart and then complement the cart with extra items suggested by the model.
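For anyone landing here, a rough sketch of why a query built from items can help with new users. It assumes the new-user problem is that an unseen user id has no learned embedding and falls back to a single uninformative vector (modelled here as zeros), while a query pooled from the item history still carries signal. All vectors and the helpers `encode_by_id`, `encode_by_history` and `recommend` are invented for illustration; this is not TFRS code, and mean pooling is only one possible choice.

```python
# Toy illustration only: invented embeddings, not a trained model.

def dot(a, b):
    """Dot product used as the query-candidate score."""
    return sum(x * y for x, y in zip(a, b))

item_embeddings = {
    "phone_a": [1.0, 0.0],
    "phone_case": [0.8, 0.6],
    "novel": [0.0, 1.0],
    "bookmark": [0.1, 0.9],
}

user_embeddings = {"user_1": [1.0, 0.1]}  # only users seen in training
OOV_USER = [0.0, 0.0]  # assumption: unseen ids collapse to one uninformative vector

def encode_by_id(user_id):
    """Id-based query: works only for users that were in the training data."""
    return user_embeddings.get(user_id, OOV_USER)

def encode_by_history(history):
    """History-based query: mean of the embeddings of known purchased items."""
    known = [item_embeddings[item] for item in history if item in item_embeddings]
    if not known:
        return OOV_USER
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]

def recommend(query, exclude=(), k=1):
    """Top-k items by score, skipping anything in exclude."""
    scores = {
        item: dot(query, vector)
        for item, vector in item_embeddings.items()
        if item not in exclude
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

new_user_history = ["novel"]
by_id = encode_by_id("user_999")               # unseen id, no signal
by_history = encode_by_history(new_user_history)
recs = recommend(by_history, exclude=new_user_history, k=1)
print(by_id, by_history, recs)
```

In this toy setup the id-based query scores every item identically, whereas the history-based query ranks the item nearest to what the new user already bought first.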