lyst / lightfm

A Python implementation of LightFM, a hybrid recommendation algorithm.
Apache License 2.0

Question building binary recommender system #552

Open brk030 opened 3 years ago

brk030 commented 3 years ago

Hello,

A similar question has been asked before, but unfortunately the answers did not solve my problem.

First, each row of my data is a unique user ID, and there are about 1,000 columns (one per product), each filled with 1 if the extra was chosen and 0 if not. Looking at the LightFM documentation, I found that the data can be implicit; however, the same page mentions ratings from 1 to 5 for the MovieLens dataset. If I understood everything correctly, it is not a problem that my data is binary, is it?

Second, when splitting the data into train and test sets, I do not fully understand what the model is evaluated on in the test set.

Thank you in advance and best regards, Brk

gcoimbra commented 3 years ago

@brk030 did you find an answer? I have the same question.

SimonCW commented 3 years ago

Hi,

Binary data is totally fine. You can use the dataset.build_interactions() method and ignore the second matrix that is returned. MovieLens contains explicit ratings, but it is often used as an example for implicit-feedback libraries because the dataset is so easily available.

But even implicit data doesn't need to be binary, e.g. when users buy, watch, or listen to a product multiple times. You should use the dataset.build_interactions() method, which returns two sparse COO matrices. The first is your interactions matrix, with 1s for the items a user interacted with. The second holds the weights constructed from non-binary implicit data.
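As a rough sketch of the above for a wide 0/1 table like the one in the question: build_interactions() expects an iterable of (user_id, item_id) pairs, so the table first has to be flattened into pairs. The helper name binary_rows_to_pairs, the user IDs, and the product names below are made up for illustration, and the LightFM part only runs if the library is installed.

```python
# Sketch: turn a wide 0/1 user-by-product table into LightFM interactions.
# The table, user ids and product names are hypothetical illustration data.

def binary_rows_to_pairs(rows, product_names):
    """Yield (user_id, product_name) for every cell equal to 1."""
    for user_id, flags in rows.items():
        for product, flag in zip(product_names, flags):
            if flag == 1:
                yield (user_id, product)

product_names = ["extra_a", "extra_b", "extra_c"]  # hypothetical columns
rows = {                                            # hypothetical users
    "u1": [1, 0, 1],
    "u2": [0, 1, 0],
    "u3": [0, 0, 0],  # a user who chose nothing
}

pairs = list(binary_rows_to_pairs(rows, product_names))

try:
    from lightfm.data import Dataset
except ImportError:
    Dataset = None  # LightFM not installed; the pairs above are still valid

if Dataset is not None:
    dataset = Dataset()
    # Register every user and item id, including users with no interactions.
    dataset.fit(users=rows.keys(), items=product_names)
    # Returns two COO matrices; for binary data the weights can be ignored.
    interactions, _weights = dataset.build_interactions(pairs)
    print(interactions.shape)
```

For non-binary implicit data, the same call should accept (user_id, item_id, weight) triples instead of pairs, in which case the second returned matrix carries those weights.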

gcoimbra commented 3 years ago

Yes, I agree with @SimonCW. I was able to use LightFM to build a performant recommender using binary data. If the author has no further issues, I think this can be closed.