lyst / lightfm

A Python implementation of LightFM, a hybrid recommendation algorithm.

why predict() gives me results not between 0 and 1 while interaction matrix consists of 0/1 only #454

Closed tianpunchh closed 3 years ago

tianpunchh commented 5 years ago

As far as I understand, LightFM takes an interaction matrix that consists of 0s and 1s. Since this is the training target, I would expect predict() to return values mostly within the range 0 to 1, because training is trying to minimize the difference between prediction and target, i.e.

fit: E_user * E_item + bias ==> target 0/1

However, predict() gives me values in a range from roughly -4 to 4 in my case. Is a sigmoid already applied during training, while predict() by default skips the sigmoid activation? I.e.:

fit: sigmoid(E_user * E_item + bias) ==> target 0/1
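For concreteness, this is roughly what I am doing (a minimal sketch with made-up data; in my real code the interaction matrix comes from my own dataset):

```python
import numpy as np
from scipy.sparse import coo_matrix
from lightfm import LightFM

# toy 0/1 interaction matrix: 50 users x 100 items
rng = np.random.default_rng(0)
interactions = coo_matrix((rng.random((50, 100)) < 0.1).astype(np.float32))

# default loss is 'logistic', i.e. the 0/1 values are the training target
model = LightFM(no_components=16)
model.fit(interactions, epochs=10)

# raw scores for user 0 over all items
scores = model.predict(user_ids=0, item_ids=np.arange(100))
print(scores.min(), scores.max())  # well outside [0, 1], e.g. roughly -4 to 4
```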

impaktor commented 5 years ago

It's the order that matters, not the actual value.

I.e., is the order of the items, after sorting by predicted score, correct?
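For example (a rough sketch; `model` is a fitted LightFM model and `n_items` the number of items):

```python
import numpy as np

# scores for one user over all items; only their relative order is meaningful
scores = model.predict(user_ids=3, item_ids=np.arange(n_items))

# top-10 recommendations: highest score first
top_items = np.argsort(-scores)[:10]
```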

tianpunchh commented 5 years ago

I understand that in most recommender-system applications the order of the values is the main concern. However, the actual value is also quite important in many scenarios. For example, you may want to compare two users and understand their relative preferences, or, to deal with a cold-start problem, you may want to go directly into the user or item matrices and do some math there. In any case, the value is a so-called affinity score, and if you present that value the first question will be: what does it stand for, and is it normalized?

Anyway, I think the value itself is important to understand. Since the training target is 0/1, if the model's predicted value were trained directly against 0/1, I believe it would be distributed more or less between 0 and 1, and the value could be interpreted as an expected interaction strength. However, in LightFM it is not. That's why I'm asking how to understand this value: is fit() wrapped in an activation function like a sigmoid that predict() leaves out? For example, if the following is the algorithm behind the scenes, it makes a lot of sense that predict() is not within the range 0 to 1:

fit: sigmoid(E_user * E_item + bias) --> 0/1 interaction matrix
predict: E_user * E_item + bias
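In other words, my assumption is that predict() returns just the raw linear score. If I read the API correctly, something like this (untested sketch) should reproduce predict() from the learned embeddings and biases:

```python
import numpy as np

# biases and latent representations of users and items
user_biases, user_repr = model.get_user_representations()
item_biases, item_repr = model.get_item_representations()

# raw score for (user u, item i): dot product of embeddings plus both biases
u, i = 0, 42
manual_score = user_repr[u] @ item_repr[i] + user_biases[u] + item_biases[i]

# should match the library's own prediction, with no sigmoid applied anywhere
lib_score = model.predict(user_ids=u, item_ids=np.array([i]))[0]
print(manual_score, lib_score)
```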

impaktor commented 5 years ago

Since the training target is 0/1, if the model's predicted value were trained directly against 0/1

This is not the training target if you're using BPR or WARP, which are what give you the best results for ranking.
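The loss is chosen when you construct the model; as far as I know only 'logistic' actually fits the 0/1 values directly, e.g.:

```python
from lightfm import LightFM

# ranking losses: only the relative order of positive vs. negative items is optimized
ranking_model = LightFM(loss='warp')   # or loss='bpr'

# pointwise logistic loss: scores are pushed towards the observed 0/1 via a sigmoid
pointwise_model = LightFM(loss='logistic')
```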

tianpunchh commented 5 years ago

Cool, thank you. I guess I had a poor understanding of those learning-to-rank loss functions. I looked at the original WARP paper, and it seems the loss function is actually defined as

Loss = log((N-1)/X) * (f_u(i') + 1 - f_u(i))

in which N is the total number of items, X is the number of sampling attempts needed to find a violating negative sample, and f_u(i) is the model score for a (user, item) pair. f_u(i) seems to be THE value I am talking about, and you are right: the training tries to find f's that make the ranking as correct as possible.
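My reading of the sampling step, in rough pseudocode (not the actual LightFM implementation; `score_fn` is a hypothetical function returning f_u for a single item):

```python
import numpy as np

def warp_update_loss(score_fn, positive_item, all_items, margin=1.0):
    """Sample negatives until one violates the margin, estimate the rank of the
    positive item as (N - 1) / X, and weight the margin violation by log(rank)."""
    N = len(all_items)
    pos_score = score_fn(positive_item)
    for X in range(1, N):
        neg_score = score_fn(np.random.choice(all_items))
        if neg_score > pos_score - margin:          # violating negative found
            estimated_rank = (N - 1) // X
            return np.log(estimated_rank) * (neg_score + margin - pos_score)
    return 0.0  # no violator found: the positive is already ranked high, no update
```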

Nevertheless, my feeling is that a learning-to-rank algorithm optimizes only the ranking and has nothing to do with the original interaction strength. This is a pity, because in the end f_u(i) is hard to interpret on its own and loses its direct relationship to interaction strength. Any comment here?

impaktor commented 5 years ago

Any comment here?

Not sure, but I think the Neural Collaborative Filtering paper (pdf / code) might be usable for that purpose.

Good luck!

maciejkula commented 5 years ago

These are not meant to be interpretable: few model predictions are.

You can incorporate the strength of interactions through sample weights. This has the effect of increasing the magnitude of the gradients for stronger interactions, leading the model to fit them more closely than weaker interactions.
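For example (sketch: `interactions` is the 0/1 COO matrix and `counts` holds the raw interaction strengths in the same order as `interactions.data`):

```python
from scipy.sparse import coo_matrix
from lightfm import LightFM

# same sparsity pattern as the interactions, but the data carry the strengths
# (e.g. play counts or ratings); stronger interactions get larger gradients
weights = coo_matrix((counts, (interactions.row, interactions.col)),
                     shape=interactions.shape)

model = LightFM(loss='warp')
model.fit(interactions, sample_weight=weights, epochs=10)
```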

ZhCoding commented 5 years ago

If we only get a ranking rather than a score between 0 and 1, how can we calculate AUC scores, which need the probabilities of the predictions?
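For example, is it valid to feed the raw scores straight into an AUC routine (assuming sklearn's roc_auc_score accepts unnormalized decision scores; `model`, `test_interactions`, and the user index `u` are assumed to exist), or do they need to be turned into probabilities first? Something like:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# ground truth for user u: which test items were actually interacted with (0/1)
y_true = np.asarray(test_interactions.tocsr()[u].todense()).ravel()

# raw LightFM scores for the same items (not probabilities)
y_score = model.predict(user_ids=u, item_ids=np.arange(test_interactions.shape[1]))

print(roc_auc_score(y_true, y_score))  # AUC depends only on the ranking
```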

SimonCW commented 3 years ago

I’m closing this issue because it has been inactive for a long time. If you still encounter the problem, please open a new issue.

Thank you!