tensorflow / recommenders

TensorFlow Recommenders is a library for building recommender system models using TensorFlow.
Apache License 2.0

How do we know that better evaluation metrics for Retrieval or Rank translate into better actual recommendations on the MovieLens dataset? #691

Open houghtonweihu opened 1 year ago

houghtonweihu commented 1 year ago

In the TensorFlow Recommenders tutorials, top_k_categorical_accuracy is used to evaluate Retrieval, and MSE to evaluate Rank. Are there examples showing that better values of these metrics translate into better movie recommendations on the MovieLens dataset?
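To make explicit what the two metrics measure, here is a minimal NumPy sketch. It is not the library's implementation; the function names and the toy scores and ratings are made up for illustration. Top-k accuracy only checks whether the held-out item appears among the k highest-scored candidates, and MSE only checks how close predicted ratings are to the observed ones.

```python
import numpy as np


def top_k_accuracy(scores, true_idx, k):
    """Fraction of queries whose held-out item is among the k top-scored candidates.

    scores:   (num_queries, num_candidates) array of model scores (toy data here).
    true_idx: (num_queries,) index of the item each user actually interacted with.
    """
    top_k = np.argsort(-scores, axis=1)[:, :k]          # best k candidates per query
    hits = (top_k == true_idx[:, None]).any(axis=1)     # is the true item among them?
    return float(hits.mean())


def mse(true_ratings, predicted_ratings):
    """Mean squared error between observed and predicted ratings."""
    diff = np.asarray(true_ratings) - np.asarray(predicted_ratings)
    return float(np.mean(diff ** 2))


# Hypothetical scores for 3 users over 4 candidate movies.
scores = np.array([
    [0.9, 0.1, 0.3, 0.2],
    [0.2, 0.8, 0.5, 0.1],
    [0.4, 0.3, 0.2, 0.6],
])
true_idx = np.array([0, 2, 1])

top1 = top_k_accuracy(scores, true_idx, k=1)
top2 = top_k_accuracy(scores, true_idx, k=2)
top3 = top_k_accuracy(scores, true_idx, k=3)

# Hypothetical observed vs. predicted ratings for the ranking stage.
rating_error = mse([4.0, 3.0, 5.0], [3.5, 3.0, 4.0])

print(top1, top2, top3, rating_error)
```

Neither function looks at the recommendation list a user would actually see, which is why the question of whether they track recommendation quality is a fair one.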

houghtonweihu commented 1 year ago

Retrieval is trained with in-batch negative sampling, which treats other users' positive samples in the same batch as the current user's negatives, so it only approximates the true negatives. Rank is trained with MSE to predict ratings. Neither metric directly measures the quality of the movie recommendations, yet the recommendations are what really matter. Is it possible for the training metrics to improve while the actual movie recommendations get worse?
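For the rating-prediction case, a toy counterexample suggests the answer can be yes. The sketch below uses invented ratings and two hypothetical models, A and B. Model B has a much lower MSE than model A, but it orders the movies worse, so its top recommendations are weaker. This is only an illustration on made-up numbers, not a result from MovieLens.

```python
import numpy as np

# Invented true ratings for five movies from one user (movie 0 is the favorite).
true_ratings = np.array([5.0, 4.0, 3.0, 2.0, 1.0])

# Hypothetical model A: badly calibrated scale, but the order is perfect.
pred_a = np.array([9.0, 7.0, 5.0, 3.0, 1.0])
# Hypothetical model B: values close to the truth, but the top of the order is wrong.
pred_b = np.array([3.0, 3.5, 4.0, 2.5, 1.5])


def mse(y_true, y_pred):
    """Mean squared error between true and predicted ratings."""
    return float(np.mean((y_true - y_pred) ** 2))


def top_k_items(pred, k):
    """Indices of the k movies with the highest predicted rating."""
    return np.argsort(-pred)[:k]


def mean_true_rating_of_top_k(y_true, pred, k):
    """Average true rating of the k recommended movies (a simple quality proxy)."""
    return float(y_true[top_k_items(pred, k)].mean())


mse_a = mse(true_ratings, pred_a)
mse_b = mse(true_ratings, pred_b)
top1_a = int(top_k_items(pred_a, 1)[0])
top1_b = int(top_k_items(pred_b, 1)[0])
quality_a = mean_true_rating_of_top_k(true_ratings, pred_a, k=2)
quality_b = mean_true_rating_of_top_k(true_ratings, pred_b, k=2)

print("MSE:", mse_a, mse_b)
print("top-1 movie:", top1_a, top1_b)
print("mean true rating of top-2:", quality_a, quality_b)
```

A ranking-aware check such as the mean true rating of the top-k list, or a metric like NDCG computed on held-out data, would expose this kind of divergence, whereas MSE alone would not.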