lyst / lightfm

A Python implementation of LightFM, a hybrid recommendation algorithm.
Apache License 2.0

Predict using user features #210

Open longlnnm opened 7 years ago

longlnnm commented 7 years ago

Hello, I would like to ask how we can predict using user features instead of user ids. Having user ids means that LightFM needs to have been trained with that user first. However, if I use user features, I can use the training data of users with similar features and recommend similar items.

Both predict and predict_rank require user ids. Is there any way I can use user features to predict instead?

predict(user_ids, item_ids, item_features=None, user_features=None, num_threads=1)
predict_rank(test_interactions, train_interactions=None, item_features=None, user_features=None, num_threads=1)

maciejkula commented 7 years ago

Can you have a look at the updated documentation for user and item features?

The gist of it is that user and item ids are simply indices into rows of their feature matrices: if you supply feature matrices that have features, you are using features, and the user ids are only a means of telling the model which row of the features matrix it should use.
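To make the indexing point concrete, here is a minimal NumPy sketch of the idea. It is not LightFM's actual code: the matrix values, the embedding size, and the helper name `user_representation` are made up for illustration, and in LightFM itself the feature matrices would be SciPy sparse matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature matrix: 3 users (rows) x 4 features (columns).
user_features = np.array([
    [1.0, 0.0, 1.0, 0.0],   # row 0
    [0.0, 1.0, 0.0, 1.0],   # row 1
    [1.0, 0.0, 1.0, 0.0],   # row 2: same features as row 0
])

# One embedding vector per feature (4 features x 2 dimensions), random here.
feature_embeddings = rng.normal(size=(4, 2))

def user_representation(user_id):
    """Look up the row for `user_id` and sum the embeddings of its features."""
    return user_features[user_id] @ feature_embeddings

# The id only selects a row; rows with equal features give equal representations.
same = np.allclose(user_representation(0), user_representation(2))
different = not np.allclose(user_representation(0), user_representation(1))
print(same, different)
```

In this sketch, ids 0 and 2 are interchangeable because their rows hold the same features, which is the sense in which the id is only a row selector.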

dofine commented 7 years ago

If my understanding is right, the docs say it's better to provide a user-by-user identity matrix concatenated with the feature matrix. So the overall user feature matrix would have a unique user id in every row? Say two users, 1 and 2, share the same two features, age and height. Given a new user with the same age and height, what will the model predict? When we feed in the new age and height, following your reply, which row should the new user's features be in? Or does this not matter at all, and any row is OK because their features are exactly the same?

maciejkula commented 7 years ago

  1. It is usually better to provide it, because then the model will be more expressive: it will be able to express preference for every individual user. But sometimes this is counterproductive (data is too sparse) or impossible: you want to always predict for new users for whom you have no historical data (cold start).
  2. Given the same features, predictions will be the same. It doesn't matter which row it's in: all that matters are the features in the row. If you use the identity matrix, this is of course impossible: every row is different.

Hope this helps!
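The two points above can be illustrated with a small NumPy sketch. The metadata columns (`age_30s`, `tall`) and their values are invented for the example; the layout with the identity block first is an assumption, and LightFM itself expects sparse matrices.

```python
import numpy as np

# Hypothetical metadata for 3 users: columns are [age_30s, tall].
metadata = np.array([
    [1.0, 1.0],   # user 0
    [1.0, 1.0],   # user 1: same metadata as user 0
    [0.0, 1.0],   # user 2
])

# Option A: metadata only. Users 0 and 1 are indistinguishable to the model.
features_only = metadata

# Option B: per-user indicator columns (identity) followed by metadata.
with_identity = np.hstack([np.eye(3), metadata])

rows_equal_a = np.array_equal(features_only[0], features_only[1])
rows_equal_b = np.array_equal(with_identity[0], with_identity[1])
print(with_identity.shape)         # (3, 5)
print(rows_equal_a, rows_equal_b)  # True False
```

With option A, identical rows lead to identical predictions; with option B, every row differs in its indicator column, so no two users can share a row.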

longlnnm commented 7 years ago

My problem is more of an implementation problem.

Suppose I have a NEW USER D, and D does not have any user interaction data. I have trained with users A, B, C, who have interactions with items 1 to 9: A1, A2, A3, B4, B5, B6, C7, C8, C9. Now say D has similar user features to A (age, gender, job...). My goal is to recommend NEW ITEMS 11, 12, 13, which have similar features to items 1, 2, 3 from user A. So this is a problem of a new user being recommended new items, based on similar user and item features.

What item_features and user_features do I pass to the predict method if I want to predict for D: predict(D.id, item_ids=?, item_features=?, user_features=?)

Is it correct to make this call: predict(0, item_ids=[11,12,13], item_features=[11's features, 12's features, 13's features], user_features=[D.features])

maciejkula commented 7 years ago

You're close. Remember, user and item ids are indices into their respective feature matrices. In this case, your item matrix has three rows, so you'd need to do:

predict(0, item_ids=[0, 1, 2], item_features=[11's features, 12's features, 13's features], user_features=[D.features])

longlnnm commented 7 years ago

Thank you very much. That explains a lot.
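The row-index convention in the call discussed above can be mimicked with a rough NumPy sketch. This is only an approximation under stated assumptions: the embeddings are random stand-ins for what a trained model would hold, bias terms are left out, the feature values are invented, and `sketch_predict` is a hypothetical function, not LightFM's predict.

```python
import numpy as np

rng = np.random.default_rng(42)
n_user_feats, n_item_feats, dim = 3, 4, 2

# Stand-ins for embeddings a trained model would have learned (random here).
user_feat_emb = rng.normal(size=(n_user_feats, dim))
item_feat_emb = rng.normal(size=(n_item_feats, dim))

# One row for new user D; three rows for new items 11, 12, 13.
user_features = np.array([[1.0, 0.0, 1.0]])   # shape (1, 3)
item_features = np.array([
    [1.0, 0.0, 0.0, 1.0],   # row 0 -> item 11
    [0.0, 1.0, 0.0, 1.0],   # row 1 -> item 12
    [0.0, 0.0, 1.0, 0.0],   # row 2 -> item 13
])

def sketch_predict(user_id, item_ids, item_features, user_features):
    """Score = dot product of the summed feature embeddings (biases omitted)."""
    user_vec = user_features[user_id] @ user_feat_emb
    item_vecs = item_features[item_ids] @ item_feat_emb
    return item_vecs @ user_vec

# Ids are row indices into the matrices passed in, not the original ids 11-13.
scores = sketch_predict(0, [0, 1, 2], item_features, user_features)

# Map row indices back to the real item ids after ranking.
row_to_item = {0: 11, 1: 12, 2: 13}
ranked_items = [row_to_item[int(i)] for i in np.argsort(-scores)]
print(scores.shape, ranked_items)
```

Keeping a separate row-to-item mapping, as in `row_to_item`, is one simple way to translate the ranked row indices back into the ids the application cares about.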

kurokochin commented 6 years ago

I also have questions about this problem:

  1. How do we know new user D is similar to A?
  2. If we want to add new user D to the model, do we need to re-train our model from scratch?

Thanks! :)

mm27368 commented 4 years ago

@maciejkula I have a few questions to ask. Please help me with these.

  1. In movie prediction, when predicting recommendations for a new user: in model.fit(), I pass user_features as the concatenation of an identity matrix and a feature matrix. But to predict for a new user, we should use model.predict(0, np.arange(n_items), user_features=<user feature matrix of shape (1, len(features))>). Here, the user_features passed to model.predict has len(features) columns, while the user_features passed to model.fit() has len(features) columns plus the columns of the identity matrix. Can you please tell me if this is the correct way?

  2. If the user/item features are given like 'age': 1-90, 'location': usa, singapore, italy, etc., how do I convert them into binary values to create a feature matrix?

  3. I need to perform recommendation on very small data (users: 5-6, items: 50-60), and I am not getting good results. Can you please suggest the minimum number of users, items, interactions per user, interactions per item, and user/item features needed for a decent result?
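On question 1, one reading is that column j of the matrix passed to predict should mean the same feature as column j of the matrix used in fit, since each column maps to one learned embedding. Under that assumption, a new user's row would keep the fit-time width and simply leave the identity block at zero. The sketch below shows only the matrix construction in NumPy; the column names and the identity-first layout are assumptions, and the arrays would still need converting to a SciPy sparse matrix before being given to LightFM.

```python
import numpy as np

n_train_users = 3

# Hypothetical metadata columns: [age_30s, lives_in_usa].
train_metadata = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Layout used for fitting: identity block first, then metadata columns.
train_user_features = np.hstack([np.eye(n_train_users), train_metadata])

# A new user has no indicator column of their own, so that block is all zeros;
# the metadata goes in the same column positions as during fitting.
new_user_metadata = np.array([[1.0, 0.0]])
new_user_row = np.hstack([np.zeros((1, n_train_users)), new_user_metadata])

print(train_user_features.shape, new_user_row.shape)  # (3, 5) (1, 5)
```

With this layout, the new user is represented only by the metadata embeddings, which is the cold-start behaviour described earlier in the thread.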

freytheviking commented 4 years ago

You're close. Remember, user and item ids are indices into their respective feature matrices. In this case, your item matrix has three rows, so you'd need to do:

predict(0, item_ids=[0, 1, 2], item_features=[11's features, 12's features, 13's features], user_features=[D.features])

@maciejkula, is there a special reason you used 0 for your user_ids? If this is a cold start problem for users, wouldn't you need to add a new index?

parhamfh commented 3 years ago

@freytheviking, the id must correspond to an index (row) in the user_features (CSR) matrix passed in to predict (which has num_features columns). So if you construct the user_features matrix to contain only one row, id 0 will correctly select that row.

If the id does not correspond to a row you will receive the exception *** Exception: Number of user feature rows does not equal the number of users
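A guard like the following can catch the mismatch before calling predict. It is a hypothetical helper written for this thread, not LightFM's own check, and its error message is made up.

```python
import numpy as np

def check_ids_match_rows(ids, features, name="user"):
    """Raise if any id has no matching row in `features` (hypothetical helper)."""
    ids = np.atleast_1d(np.asarray(ids))
    n_rows = features.shape[0]
    if ids.min() < 0 or ids.max() >= n_rows:
        raise ValueError(
            f"{name} ids must lie in [0, {n_rows - 1}], got max id {ids.max()}"
        )

one_row = np.array([[1.0, 0.0, 1.0]])  # features for a single new user

check_ids_match_rows(0, one_row)       # fine: row 0 exists

try:
    check_ids_match_rows(1, one_row)   # no row 1 in a one-row matrix
    failed = False
except ValueError:
    failed = True
print(failed)
```

The helper only relies on `features.shape[0]`, so it should work equally with a dense array or a SciPy sparse matrix.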

@mm27368 regarding your second point, there is a Dataset helper class for doing that. Have a look at the class documentation and this tutorial
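For intuition about what such a conversion produces, here is a hand-rolled sketch of binarizing 'age' and 'location' into a feature matrix. It does not use the Dataset class; the age buckets, the feature-name format, and the sample users are all invented for illustration.

```python
import numpy as np

# Hypothetical raw user attributes.
users = [
    {"age": 23, "location": "usa"},
    {"age": 47, "location": "italy"},
    {"age": 68, "location": "singapore"},
]

def age_bucket(age):
    """Map a numeric age to a coarse, made-up bucket label."""
    if age < 30:
        return "age:<30"
    if age < 60:
        return "age:30-59"
    return "age:60+"

def feature_names(user):
    """List the binary features that are 'on' for this user."""
    return [age_bucket(user["age"]), "location:" + user["location"]]

# Assign each distinct feature name a column index.
vocabulary = sorted({name for u in users for name in feature_names(u)})
column = {name: j for j, name in enumerate(vocabulary)}

# Binary matrix: one row per user, a 1 in the column of each feature it has.
matrix = np.zeros((len(users), len(vocabulary)), dtype=np.float32)
for i, u in enumerate(users):
    for name in feature_names(u):
        matrix[i, column[name]] = 1.0

print(vocabulary)
print(matrix)
```

Bucketing the age keeps the number of columns small; how coarse the buckets should be is a modelling choice this sketch does not settle.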