lyst / lightfm

A Python implementation of LightFM, a hybrid recommendation algorithm.
Apache License 2.0

How to use LightFM for user-file recommendations #242

Closed by jimmychen623 6 years ago

jimmychen623 commented 6 years ago

Hi, this is not really an issue, just a request for advice. A start-up I am working for is trying to build a recommender system that recommends files to users. We only have implicit feedback (when and how many times a file is accessed by a user). I have a couple of questions about how we can use LightFM for our problem.

  1. If we do not supply features for users/items and just use the traditional matrix factorization, what is the upper limit in sparsity that we can afford in our user-items matrix?
  2. After fitting the model, how can we compare closeness between users and how can we compare closeness between items?
  3. How can we store the model and update it when new users, new items, or new interactions come in?

I'd appreciate any advice regarding this. I am a beginner at building recommendation systems. Thank you!

maciejkula commented 6 years ago

No problem.

  1. This is somewhat difficult to answer, as it also depends on how uniform the distribution is (you may be able to do something with a sparse dataset if it has denser regions). I'd say that something 10 times sparser than Movielens should still be manageable; 100 times sparser may be a challenge.
  2. You can use cosine similarity between the embeddings. This article has an example of how to do it with LightFM.
  3. The most robust answer is that you periodically recompute the model from scratch to handle all three. For just new interactions, you can run additional fitting iterations on the same model with the new data. (You can persist the model by pickling it.) Adding new users/items (this is called fold-in) is somewhat tricky, and not explicitly supported. For models that naturally take in new information (and new users), you should have a look at sequence-based models.
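
Point 2 above can be sketched roughly as follows. This is a minimal illustration rather than the code from the linked article: the helper name `most_similar` and the random stand-in embeddings are assumptions made for the example. With a fitted LightFM model and no side features, the matrix would be `model.item_embeddings` (or `model.user_embeddings` for user-user closeness); with features, `model.get_item_representations(features=...)` returns the biases and embeddings.

```python
import numpy as np


def most_similar(embeddings, target_index, top_n=5):
    """Return (index, cosine similarity) pairs for the rows closest to target_index.

    Hypothetical helper for illustration; not part of LightFM.
    """
    # Normalise every row to unit length so a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)  # guard against all-zero rows
    similarities = unit @ unit[target_index]
    similarities[target_index] = -np.inf  # exclude the target itself
    best = np.argsort(-similarities)[:top_n]
    return [(int(i), float(similarities[i])) for i in best]


# Stand-in for a fitted model's embeddings (assumption: 100 items, 16 dimensions).
rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(100, 16))

# The five items closest to item 3, most similar first.
neighbours = most_similar(item_embeddings, target_index=3, top_n=5)
```

The same function works for users by passing the user embedding matrix instead.
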

jimmychen623 commented 6 years ago

Thank you for the advice. One problem I am trying to tackle is how to generate recommendations for a user that are also similar to a particular item. One idea is to generate a set of recommendations for the user, then generate a set of items similar to the target item, and take the intersection of the two sets, but that seems kind of hacky.

Is there a good way to achieve this?

maciejkula commented 6 years ago

One thing you could do is rank items by a weighted average of the user-item recommendation score and the item-item similarity score.

If you do that, you may want to make sure to normalize the user-item recommendation scores appropriately, for example by transforming them into percentiles. User-item scores out of the LightFM model do not have a guaranteed range or scale (as that is irrelevant for ranking items for any particular user).
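
A minimal sketch of that blending, under some assumptions: the function names and the `weight` parameter are hypothetical, the data is random stand-in data, and the similarity scores are also converted to percentiles so that both terms share a scale (the suggestion above only requires normalizing the user-item scores). With LightFM, `user_scores` could come from `model.predict(user_id, np.arange(n_items))` and the embeddings from `model.item_embeddings`.

```python
import numpy as np


def to_percentiles(scores):
    """Map raw scores to (0, 1] by rank, so the highest score gets 1.0.

    Ties are broken arbitrarily by argsort.
    """
    ranks = np.argsort(np.argsort(scores))  # 0 = lowest score
    return (ranks + 1) / len(scores)


def cosine_to_target(item_embeddings, target_item):
    """Cosine similarity of every item to the target item."""
    norms = np.linalg.norm(item_embeddings, axis=1, keepdims=True)
    unit = item_embeddings / np.maximum(norms, 1e-12)  # guard against zero rows
    return unit @ unit[target_item]


def blended_ranking(user_scores, item_embeddings, target_item, weight=0.5):
    """Item indices ordered by a weighted average of the two percentile scores.

    weight = 1.0 uses only the user-item score; weight = 0.0 uses only similarity.
    """
    user_part = to_percentiles(user_scores)
    sim_part = to_percentiles(cosine_to_target(item_embeddings, target_item))
    blended = weight * user_part + (1.0 - weight) * sim_part
    blended[target_item] = -np.inf  # do not recommend the target item itself
    return np.argsort(-blended)


# Stand-in data (assumption: 100 items, 16-dimensional embeddings).
rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(100, 16))
user_scores = rng.normal(size=100)  # unbounded, like raw model scores

ranking = blended_ranking(user_scores, item_embeddings, target_item=7, weight=0.5)
top_ten = ranking[:10]
```

The `weight` value is something to tune for the application; 0.5 here is only a placeholder.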