tensorflow / recommenders

TensorFlow Recommenders is a library for building recommender system models using TensorFlow.

Correct data formulation when using both User and Item features (question) #125

Closed (stevejpapad closed 3 years ago)

stevejpapad commented 3 years ago

I am trying to build an emotion-aware music recommendation system by adapting the TFRS tutorials to my data, but I am having trouble understanding how the data for the item-side features should be formulated.

More specifically, I treat the "current emotion" values similarly to the tutorial's "timestamps": as a dynamic, contextual user-side feature that may vary with each interaction. Emotions are mapped in the interaction matrix (the tutorial's ratings matrix):

interactions = tf_data.map(lambda x: {
    "user_name": x["user_name"],
    "album_name": x["album_name"],
    "user_emotion": x["user_emotion"]})

The values are preprocessed in the User_Model and used in the query embeddings in the 'compute_loss' function. This part seems to work properly.

My trouble is with adding item features. I have created a tensor dataset for music features (each unique track, its genre, etc.):

albums = tf_album_data.map(lambda x: {
    "album_name": x["album_name"],
    "music_genres": x["music_genres"]})

These are preprocessed in the Music_Model (similar to the tutorial's Movie_Model) and then used for the Candidate_Model in the 'compute_loss' function. With this structure, where the user and music features are kept separate, I get a KeyError: "music_genres". The item features are not recognized.

The error disappears only when I add "music_genres" to the interaction (ratings) matrix, but I cannot verify whether that is the correct formulation. Is the aforementioned architecture okay, or should the user and item features be kept separate, in which case the error may lie elsewhere?

Thank you very much in advance!

maciejkula commented 3 years ago

Your primary training dataset (interactions) should contain all features for both the user and the item for that interaction. For example, in your case, every single row will contain the user name, the user emotion, the album name, and the music genres.

The separate candidate dataset used for evaluation and prediction will contain data on the albums only (so the album name and the genres).
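
As a rough illustration of the layout described above, here is a minimal plain-Python sketch (not code from the tutorials). It joins the album features onto each interaction row so that every training row carries "music_genres", and keeps a separate album-only table for the candidates. The sample values and the helper names `join_item_features` and `to_columns` are hypothetical, and a single genre string per album is a simplifying assumption.

```python
# Hypothetical sample data: column names follow the thread, values are made up.
interactions = [
    {"user_name": "alice", "album_name": "Kind of Blue", "user_emotion": "calm"},
    {"user_name": "bob", "album_name": "Nevermind", "user_emotion": "angry"},
    {"user_name": "alice", "album_name": "Nevermind", "user_emotion": "happy"},
]
albums = [
    {"album_name": "Kind of Blue", "music_genres": "jazz"},
    {"album_name": "Nevermind", "music_genres": "grunge"},
]


def join_item_features(interaction_rows, album_rows, key="album_name"):
    """Return interaction rows extended with the matching album's features."""
    by_key = {album[key]: album for album in album_rows}
    return [{**row, **by_key[row[key]]} for row in interaction_rows]


def to_columns(rows):
    """Convert a list of row dicts into a dict of column lists."""
    return {name: [row[name] for row in rows] for name in rows[0]}


# Training data: every row has the user features AND the item features.
training_rows = join_item_features(interactions, albums)
training_columns = to_columns(training_rows)

# Candidate data: album features only, one row per album.
candidate_columns = to_columns(albums)
```

Assuming TensorFlow is available, each of the two column dictionaries could then be passed to `tf.data.Dataset.from_tensor_slices` to obtain datasets whose elements are feature dictionaries. The original interaction rows have no "music_genres" key, which is consistent with the KeyError reported above.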

Does this make sense?

stevejpapad commented 3 years ago

It makes total sense. Thank you so much for the clarification and the quick response!

Keep up the amazing work!