lyst / lightfm

A Python implementation of LightFM, a hybrid recommendation algorithm.
Apache License 2.0

Low AUC while using the model with user features. #180

Closed mayank1806 closed 7 years ago

mayank1806 commented 7 years ago

I am trying to get predictions using LightFM. I have user features, item features and user-item interactions as input. When I run the model with only the user-item interactions and item features, the AUC comes out to 0.84, but when I add the user features as well, the AUC drops to 0.58. I am setting the user alpha to 1e-4 and the item alpha to 1e-6; I get the same result with both alphas set to 1e-6.

tfzxyinhao commented 7 years ago

I think you need to train with fit_partial over more epochs, plot the change in accuracy with matplotlib, and then adjust the parameters. There is a tutorial: http://lyst.github.io/lightfm/docs/examples/warp_loss.html

maciejkula commented 7 years ago

Bear in mind that if you don't pass features in, the model assumes one indicator feature per user/item. If you do pass them in, the model takes the features exactly as provided: if your user features matrix does not contain per-user indicator features, the resulting model may well be less expressive.
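To make the point concrete, here is a sketch of what the model sees in each case (matrix sizes are made up):

```python
import scipy.sparse as sp

n_users = 4

# No user_features passed: the model implicitly uses one indicator
# feature per user, i.e. an identity matrix -- every user gets their
# own embedding.
implicit = sp.identity(n_users, format='csr')

# user_features passed with metadata only: the indicators are *replaced*,
# so users sharing the same metadata become indistinguishable to the model.
metadata = sp.csr_matrix([[1., 0.],
                          [1., 0.],   # same features as user 0
                          [0., 1.],
                          [0., 1.]])
```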

mayank1806 commented 7 years ago

Thanks @maciejkula. Does that mean my ideal user feature matrix should be square? My current user feature matrix has around 14000 users and some 15-16 features. Also, is there any way I can calculate the RMSE of my predictions in LightFM?

maciejkula commented 7 years ago

You could make it rectangular with 14015-14016 columns by horizontally stacking your features matrix with an identity matrix.

RMSE isn't really the right metric for evaluating recommender systems. You should use ranking metrics like AUC, MRR, or nDCG.
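The stacking itself is a one-liner with scipy.sparse (column counts taken from the numbers above; the metadata matrix here is random filler):

```python
import scipy.sparse as sp

n_users, n_meta = 14000, 16
metadata = sp.random(n_users, n_meta, density=0.1, format='csr')

# Keep one indicator feature per user *and* the metadata columns.
user_features = sp.hstack([sp.identity(n_users, format='csr'), metadata],
                          format='csr')
# shape is (14000, 14016): indicator columns first, metadata after
```

The resulting matrix can be passed as `user_features` to the model's fit and predict methods.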

mayank1806 commented 7 years ago

Thanks @maciejkula. Even after horizontally stacking my feature matrix with an identity matrix, I am not getting a good AUC on the test set, though I am getting a great AUC of 0.92 on the train set. Now I have three doubts:

1) My ratings are spread between 1 and 5. Is this a problem? Should my model have binary ratings only?
2) Can the scores generated by your predict method be normalized to get ratings between 1 and 5?
3) What kind of train and test sets does this model expect?

mayank1806 commented 7 years ago

There was a problem with my train and test sets. Now I am getting good results: an AUC of 0.94 on train and 0.82 on test. Thanks for building such a wonderful tool. I just wanted to know whether this model is scalable: will it perform well on a very large dataset? Thanks in advance.

maciejkula commented 7 years ago

Great!

The model should perform well with any reasonably sized dataset, easily scaling to 20-40 million interactions.

mayank1806 commented 7 years ago

Hi! I have around 2500 users and 970 movies. I want to store all the recommendation scores in an array. I am using model.predict to do that, but I am getting an error:

    assert len(user_ids) == len(item_ids)
    AssertionError

I understand that the lengths of my arrays are not the same, but what is the way out? How can I get recommendation scores for every user-movie combination for which I have a non-zero interaction in my dataset?

maciejkula commented 7 years ago

You can supply arrays of user and item ids that reflect all the user-item pairs you have in your dataset. If you are using COO matrices, these will be the row and col arrays.

You may want to have a look at the documentation or the docstrings.

mayank1806 commented 7 years ago

Thanks @maciejkula

What I find unique about this model is that it supports fit_partial. This can have very good business implications, since it takes less time to retrain the model when the training set changes only slightly. Can you point me to a good reference for understanding this functionality?

maciejkula commented 7 years ago

You simply pass a new training matrix into fit_partial, and model training will resume on the new data from the state where the earlier training stopped.

mayank1806 commented 7 years ago

From what I understand: suppose I have a model with 2000 users and 1000 movies. A new user joins, so I now have 2001 rows in the user-item interaction matrix as well as in the user features matrix. Should I pass only that one new row (a 1x1000 matrix, which will be all zeros since he is a new user) to fit_partial, or should I append that row to my original matrix and pass the whole thing to the model? I have the same doubt about the user features matrix.

My feeling is that I should pass the new row appended to the original matrix. It would be very kind of you to clear up this confusion.

maciejkula commented 7 years ago

There was some discussion about this here.

The gist of it is that you should preallocate dimensions for however many items/users you expect in the future. You can then pass in matrices which have data on users you had no data on before.

The matrices must always have the same dimensions, since users and items are identified by their row and column indices.

mayank1806 commented 7 years ago

Hi @maciejkula, some questions about the model:

1) How is the sample weight used when calculating the score for a user-item pair? My interaction data is in the form of counts, like viewing duration, number of visits, etc., and I have assigned a score to each interaction based on some maths over those numbers. How does the model take the weight of each interaction into consideration?
2) Suppose a user has watched two movies, M1 and M2, and I have scores for those interactions, say 2 and 1 on a scale of 5. Will the model consider 1 a negative interaction and 2 a positive one?
3) Since you are using a sigmoid function to calculate the scores, they should fall between 0 and 1, but we are getting huge scores. How are those scores produced?

Sorry for the long questions, but I have been using this model for some time and really need to know these things before I draw conclusions from my results. Thanks in advance.

maciejkula commented 7 years ago

1. The sample weight affects the size of the gradient step taken for a given sample during fitting.
2. This is an implicit feedback model (for the BPR and WARP losses): 1s are positives, 0s are negatives. You should not be using ratings.
3. The sigmoid function is not applied during prediction. Only the ranking of the scores matters, not their actual scale.