tensorflow / recommenders

TensorFlow Recommenders is a library for building recommender system models using TensorFlow.
Apache License 2.0

Questions on loss functions and ltr #250

Open lspataro26 opened 3 years ago

lspataro26 commented 3 years ago

Hi everybody, first of all thank you for the great library, and @maciejkula in particular for your amazing previous work (LightFM, Spotlight, etc.). I have a few questions about loss functions and learning to rank (LTR) that confuse me a bit.

I am going to start from some assumptions that you are of course welcome to rebut:

Questions: I know that LightFM's WARP loss is not applicable in the mini-batch setting (I have heard of WMRB, an extension of WARP designed for that setting), and I wanted to understand how the library's current loss works and whether it generally works better than WARP.
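For reference, the WARP idea mentioned above can be sketched in a few lines of NumPy: for each positive item, keep sampling negatives until one violates the margin, then weight the hinge penalty by a rank estimate derived from how many samples that took. This is only an illustrative sketch, not LightFM's actual implementation (which folds the weighting into the gradient update, among other details); the function name and the `margin`/`max_trials` parameters are my own.

```python
import numpy as np

def warp_loss_single(pos_score, neg_scores, margin=1.0, max_trials=None):
    """WARP-style loss sketch for one positive item.

    Samples negatives in random order until one violates the margin,
    then scales the hinge violation by log(estimated rank). The rank
    is estimated from the number of trials needed to find a violator.
    """
    n_items = len(neg_scores)
    max_trials = max_trials or n_items
    order = np.random.permutation(n_items)
    for trials, j in enumerate(order[:max_trials], start=1):
        violation = margin - pos_score + neg_scores[j]
        if violation > 0:
            # Few trials needed => positive is probably ranked low =>
            # larger rank estimate => larger weight on the violation.
            rank_estimate = n_items // trials
            return np.log(rank_estimate) * violation
    return 0.0  # no violating negative found within the budget
```

The sequential sampling is exactly what makes WARP awkward in a mini-batch/accelerator setting: the number of score evaluations per example is data-dependent rather than fixed.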

I have always wondered what the relationship is between learning-to-rank losses and recommender losses. For example, in LTR there are losses that try to optimise the ranking metric directly, such as LambdaLoss (TensorFlow Ranking) or YetiRank (CatBoost, even though it was developed for a tree-based algorithm), whereas for recommenders we have the various BPR, WARP, etc. Are LTR losses applicable in two-tower recommenders (and recommender losses applicable in standard LTR search)? If so, why does there seem to be a distinction between rank-aware losses for recommenders and for LTR?

Thank you so much and looking forward to your reply!

maciejkula commented 3 years ago

LTR losses are applicable in two-tower recommenders. You could try many of the losses from TensorFlow Ranking in the TFRS Retrieval task, and experiment!

My expectation is that the results wouldn't be dramatically better than the default softmax loss used by the Retrieval task. In the minibatch setting softmax is really hard to beat, and I am somewhat skeptical of the value of dedicated ranking losses.
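The default Retrieval loss referred to here can be sketched as in-batch sampled softmax: score every query in the batch against every candidate in the batch, and treat the diagonal (each query's own item) as the positive label. A minimal NumPy sketch, omitting TFRS details such as temperature, candidate-sampling-probability correction, and accidental-hit removal:

```python
import numpy as np

def in_batch_softmax_loss(query_emb, item_emb):
    """In-batch softmax loss sketch (shapes: both (B, D)).

    Every other item in the batch serves as a negative for each query;
    the matching row is the positive. Returns mean cross-entropy.
    """
    logits = query_emb @ item_emb.T                 # (B, B) score matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal as the positive labels.
    return -np.mean(np.diag(log_probs))
```

Because the negatives come for free from the batch itself, each step contrasts every positive against B-1 negatives at once, which is a large part of why softmax is hard to beat in this setting.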

Have a look at this paper for an overview; they find that softmax/normalized softmax often beats dedicated metric learning approaches.

lspataro26 commented 3 years ago

Thank you @maciejkula for your answer and for the resource you shared; it makes sense! What about the Ranking part of the model? As far as I can see from the tutorials, there are examples of using RMSE to predict ratings, but ratings are often a proxy for the real target, which is to produce a ranked list of relevant items. In addition, we often have implicit data (binary or not) where we might not have negatives. In such cases, would you train the DNN with the same softmax loss (approaching the problem as a classification task), or would you suggest another loss specifically designed for ranking? Thank you!
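For context, the pairwise BPR-style option mentioned earlier in the thread is one classical answer for implicit feedback without explicit negatives: sample an unobserved item as the negative and push the observed item's score above it. A hedged NumPy sketch (the function name is my own; real implementations add regularisation and smarter negative sampling):

```python
import numpy as np

def bpr_loss(pos_scores, neg_scores):
    """BPR / pairwise-logistic loss sketch for implicit feedback.

    For each (positive, sampled-negative) score pair, penalises the
    degree to which the negative outranks the positive:
        -log(sigmoid(pos - neg)) = log(1 + exp(-(pos - neg)))
    """
    return float(np.mean(np.log1p(np.exp(-(pos_scores - neg_scores)))))
```

Note that this still requires sampling negatives, which is exactly the difficulty raised above when only positive interactions are observed.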

fdnavarropecci commented 3 years ago

@lspataro26 , did you end up trying a different loss for the Ranking part of the model? I'm facing a similar situation. Thanks!

lspataro26 commented 3 years ago

@fdnavarropecci No, I am sorry, I didn't. My question was more for learning purposes; I haven't had the chance to work on that yet :)