Closed: tianpunchh closed this issue 3 years ago
It's the order that matters, not the actual value.
That is, is the ordering of the items after sorting by predicted score correct?
I understand that in most recommender-system applications the ordering of the values is the main concern. However, the actual value is also quite important in many usage scenarios. For example, you may want to compare two users and understand their relative preferences. And if you want to deal with a cold-start problem, you may want to go directly into the user or item matrix and do some math there. In any case, the value is a so-called affinity score, and if you present that value, the first question will be: what does it stand for, and is it normalized?
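To make the "math on the matrices" point concrete, here is a toy sketch (made-up data; it relies on the model's public user_embeddings, item_embeddings, user_biases and item_biases attributes) that reconstructs predict()'s raw score by hand:

```python
import numpy as np
from scipy.sparse import coo_matrix
from lightfm import LightFM

# Toy 0/1 interactions, made up for illustration.
interactions = coo_matrix(np.array([
    [1, 0, 1],
    [0, 1, 0],
], dtype=np.float32))

model = LightFM(loss='warp', no_components=8)
model.fit(interactions, epochs=10)

u, i = 0, 2  # an arbitrary (user, item) pair

# Raw affinity score rebuilt from the latent matrices.
manual = (model.user_embeddings[u] @ model.item_embeddings[i]
          + model.user_biases[u] + model.item_biases[i])
print(manual, model.predict(u, np.array([i]))[0])  # the two should agree
```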
Anyway, I think the value itself is important to understand. Since the training target is 0/1, if the model's predictions were trained directly against 0/1, I believe they should be distributed more or less between 0 and 1, and the value could then be interpreted as an expected interaction strength. However, that is not the case in lightfm, which is why I am asking how to understand this value. Is fit() wrapped in an activation function like a sigmoid while predict() removes it? For example, if the following is the algorithm behind the scenes, it makes a lot of sense that predict() is not within the range 0 to 1:
fit: sigmoid(E_user*E_item + bias) --> 0/1 interaction matrix
predict: E_user*E_item + bias
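If that hypothesis holds for the logistic loss, mapping the raw scores back through a sigmoid should land them in (0, 1). A minimal sketch with made-up data (for BPR/WARP the scores only encode an ordering, so this mapping would have no probabilistic meaning):

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.special import expit  # numerically stable sigmoid
from lightfm import LightFM

# Made-up 0/1 interaction matrix.
interactions = coo_matrix(np.array([
    [1, 0, 1],
    [0, 1, 0],
], dtype=np.float32))

model = LightFM(loss='logistic')
model.fit(interactions, epochs=10)

raw = model.predict(np.array([0, 0, 0]), np.array([0, 1, 2]))
print(raw)         # unbounded raw scores
print(expit(raw))  # squashed into (0, 1)
```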
Since the training target is 0/1, if the model's predictions were trained directly against 0/1
This is not the training target if you're using BPR or WARP, which is what gives you the best results for ranking order.
Cool, thank you. I guess I had a poor understanding of those learning-to-rank loss functions. I looked at the original WARP paper, and it seems the loss function is actually defined as
Loss = log((N - 1) / X) * (f_u(i') + 1 - f_u(i))
in which N is the total number of items, X is the number of sampling attempts until a violating negative sample is found, and f is the score of a (user, item) pair. The value f_u(i) seems to be THE value I am talking about, and you are right: the loss tries to find optimal f's that make the ranking as correct as possible.
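For concreteness, here is my rough reading of a single WARP step as an illustrative Python sketch (simplified, with hypothetical names; not lightfm's actual implementation):

```python
import numpy as np

def warp_step(f_u, positive, n_items, max_trials=100, rng=None):
    """One WARP update for a single user; f_u[i] holds the current score
    f_u(i) for every item. Illustrative only: a real implementation would
    also skip the user's known positives when sampling negatives."""
    if rng is None:
        rng = np.random.default_rng()
    for trials in range(1, max_trials + 1):  # trials plays the role of X
        negative = int(rng.integers(n_items))
        if f_u[negative] + 1 > f_u[positive]:  # margin violated
            est_rank = max((n_items - 1) // trials, 1)  # floor((N - 1) / X)
            return np.log(est_rank) * (f_u[negative] + 1 - f_u[positive])
    return 0.0  # no violating negative found within the budget
```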
Nevertheless, my feeling is that a learning-to-rank algorithm optimizes only the ranking and has nothing to do with the original interaction strength. This is a pity, because f_u(i) ends up hard to interpret on its own and loses any direct relationship with interaction strength. Any comment here?
These are not meant to be interpretable: few model predictions are.
You can incorporate the strength of interactions through sample weights. This has the effect of increasing the magnitude of the gradients for stronger interactions, leading the model to fit them more closely than interactions of smaller strength.
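A minimal sketch with made-up weights; fit() accepts a sample_weight matrix with the same sparsity pattern as the interactions (not available for the k-OS loss):

```python
import numpy as np
from scipy.sparse import coo_matrix
from lightfm import LightFM

# Toy 0/1 interactions for 3 users and 4 items.
interactions = coo_matrix(np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
], dtype=np.float32))

# Same sparsity pattern; larger weights mark stronger interactions,
# e.g. derived from play counts or ratings (values here are made up).
sample_weight = coo_matrix(np.array([
    [3.0, 0, 1.0, 0],
    [0, 5.0, 0, 0],
    [1.0, 2.0, 0, 4.0],
], dtype=np.float32))

model = LightFM(loss='warp')
model.fit(interactions, sample_weight=sample_weight, epochs=10)
```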
If we just get a rank rather than a score between 0 and 1, how can we calculate AUC scores, which need the probabilities of the predictions?
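(For what it's worth, AUC depends only on the ordering of the scores, since it is invariant to any monotone transformation, so the raw predict() output is enough and no probabilities are needed. A sketch with made-up data using lightfm's built-in evaluation helper:)

```python
import numpy as np
from scipy.sparse import coo_matrix
from lightfm import LightFM
from lightfm.evaluation import auc_score

# Made-up interactions: 30 users, 40 items, roughly 20% density.
rng = np.random.default_rng(0)
interactions = coo_matrix((rng.random((30, 40)) < 0.2).astype(np.float32))

model = LightFM(loss='warp')
model.fit(interactions, epochs=10)

# auc_score works directly on raw scores; it returns one AUC per user.
print(auc_score(model, interactions).mean())
```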
I’m closing this issue because it has been inactive for a long time. If you still encounter the problem, please open a new issue.
Thank you!
As far as I understand, lightfm takes an interaction matrix consisting of 0s and 1s. Since this is the training target, I would expect predict() to produce results mostly within the range 0 to 1, because training tries to minimize the difference between prediction and target, i.e.
fit: E_user*E_item + bias ==> target 0/1
However, predict() gives me a range from, in my case, about -4 to 4. Is a sigmoid normalization already included during training, while predict() by default does not apply the sigmoid? I mean:
fit: sigmoid(E_user*E_item + bias) ==> target 0/1