lyst / lightfm

A Python implementation of LightFM, a hybrid recommendation algorithm.
Apache License 2.0

Learning to rank using explicit preferences / rankings? #269

Closed conortwo closed 6 years ago

conortwo commented 6 years ago

Hi @maciejkula, thank you for your fantastic library and examples; I have found them really helpful in learning about implicit recommender systems. I just have a few questions that you can hopefully weigh in on.

I am an undergraduate CS student currently working on a project which involves eliciting user preferences using a game with a purpose (quite similar to the game described in this paper by Hacker and von Ahn). In short, users are paired up and asked to indicate which of two movies (drawn from the MovieLens dataset) they would prefer, while their partner is tasked with predicting their decision; both players take turns expressing preferences and making predictions. I have started looking at implementing a recommender system (based on the RelativeSVD algorithm described in the paper above) which works on data gathered from the game. I have built one in Python which performs gradient descent on (i, j, k) triplets, each indicating that user i preferred item j to item k in the game, and which therefore tries to ensure item j is rated higher than item k in the (num_users x num_items) predicted score matrix.
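For concreteness, the triplet set-up above can be sketched as a BPR-style SGD update. This is only an illustration of the idea, not the RelativeSVD algorithm from the paper; all names and hyperparameters here are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_items, dim = 100, 50, 16
lr, reg = 0.05, 1e-4

# Randomly initialised latent factors for users and items.
U = rng.normal(scale=0.1, size=(num_users, dim))
V = rng.normal(scale=0.1, size=(num_items, dim))

def sgd_step(i, j, k):
    """One update on a triplet: user i preferred item j to item k."""
    u, vj, vk = U[i].copy(), V[j].copy(), V[k].copy()
    # Score difference; training pushes it to be large and positive.
    x = u @ (vj - vk)
    # Gradient of the logistic (BPR-style) pairwise loss -log(sigmoid(x)).
    g = -1.0 / (1.0 + np.exp(x))
    U[i] -= lr * (g * (vj - vk) + reg * u)
    V[j] -= lr * (g * u + reg * vj)
    V[k] -= lr * (-g * u + reg * vk)

sgd_step(0, 3, 7)  # user 0 preferred item 3 to item 7
```

Repeating such steps over the observed triplets pushes the predicted score of each preferred item above its non-preferred counterpart.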

I came across LightFM and noticed some similarities in this methodology and the implicit recommender built into LightFM which makes use of BPR and WARP loss, and would like to build and evaluate a LightFM model to work on the game data for comparison.

  1. Would it be possible to adapt the LightFM model to work on explicit (positive_item, negative_item) pairings? I assume this would require changes to the underlying Cython responsible for implementing the BPR and WARP loss functions, which may be non-trivial. I also came across Spotlight when researching this, which may offer a slightly friendlier way of customizing the sampling if I wanted to attempt implementing this myself?

Another idea I'm experimenting with is having users build up personal rankings of movies and having a recommender work on these. For example, imagine we have 10 movies with ids 0-9 and the following partial ordering for a user: [1, 4, 5, 6, 8]. Movie 1 has 4 movies ranked below it so it gets a value of 4, movie 4 has 3 below it so its value is 3, and so on down to movie 8 with a score of 0 (along with all other unranked movies). Ideally I would like movies with higher values to appear higher up in the predicted scores, and I'm using an ndcg@k evaluation to check this (with relevance being a scaled version of each movie's interaction value).
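The rank-to-value scheme above can be written as a small helper (names are illustrative):

```python
def ranking_to_values(ranking, num_items):
    """Map a partial ordering of item ids to interaction values:
    the top item gets len(ranking) - 1, the last-ranked item gets 0,
    as do all unranked items."""
    values = [0] * num_items
    for position, item in enumerate(ranking):
        values[item] = len(ranking) - 1 - position
    return values

ranking_to_values([1, 4, 5, 6, 8], num_items=10)
# -> [0, 4, 0, 0, 3, 2, 1, 0, 0, 0]
```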

  2. Will LightFM work well with these sorts of values in the interaction matrix? I notice the linked example which works on Sketchfab models treats interactions as likes, which would be somewhat similar; however, much of the current evaluation seems to be focused on positive examples being non-zero and negative examples being zero (i.e. binary interactions), whereas for this data it's important that larger positive values are ranked above smaller positive values. Based on issue #260 I believed that using sample_weight would improve things, and yet from what I can see it has no discernible effect on the outcome of training the model. Curiously, there seems to be no difference between using these positive interaction values which decrease with rank and binary interactions with 1s indicating which items appear in the user's partial ranking. This makes it difficult to optimize for ndcg@k. I'm wondering if perhaps I'm using the sample_weight argument incorrectly here? For the small example above, movies [1, 4, 5, 6, 8] would have an interaction value of 1 and weights of [4, 3, 2, 1, 0] respectively. The idea is that ideally we would preserve as much of the original partial ranking as possible in our predictions while generating a total ranking with which to make recommendations.

Apologies for the long post, any advice or suggestions with how to handle this data using LightFM (or another library if more suitable for the task) would be appreciated. If anything is unclear feel free to ask and I'll be happy to clarify. Thanks!

maciejkula commented 6 years ago

Thanks for the detailed question!

Constructing explicit hierarchies of preference is actually something I'm quite often asked about. Unfortunately, you are correct in thinking that this would require fairly complicated changes to the Cython code in LightFM: while changing sample_weight is an approximation to this, it does not attempt to preserve the rankings you have. (And, as you say, it may be of limited effectiveness: it's possible that skewing the relative weights even further towards the top movies will work, but I am not fully convinced.)

I think that capturing the relationships you have will require a substantially different model that will be hard to express in LightFM. Instead, I would recommend something more flexible, like Spotlight or wyrm (my new deep learning library, with an example of building a recommender here). I would be happy to provide pointers in both cases.

maciejkula commented 6 years ago

Hope this helped. Let me know if you have any further questions!