houghtonweihu opened 1 year ago
We know that the retrieval model is trained with in-batch negative sampling, which treats other users' positive items in the same batch as the current user's negatives, so these negatives are only an approximation of true negatives. The ranking model is trained with MSE to predict ratings. Neither metric directly measures the quality of the movie recommendations, yet the recommendations are what actually matter. Is it possible for the training metrics to improve while the actual movie recommendations get worse?
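For concreteness, here is a minimal NumPy sketch of what in-batch negative sampling computes; this is my own illustration with random embeddings, not TFRS's actual implementation:

```python
import numpy as np

# Sketch of in-batch softmax loss: each query's positive item is scored
# against the other positives in the same batch, which act as approximate
# negatives. Random embeddings, illustration only.
rng = np.random.default_rng(0)
batch_size, dim = 4, 8
query_emb = rng.normal(size=(batch_size, dim))  # user/query tower output
item_emb = rng.normal(size=(batch_size, dim))   # item tower output

# Score matrix: entry (i, j) scores query i against item j.
scores = query_emb @ item_emb.T                 # (batch_size, batch_size)

# The true item for query i is item i, so the label matrix is the identity;
# off-diagonal entries are the in-batch negatives.
labels = np.eye(batch_size)

# Softmax cross-entropy per row, with the usual max-shift for stability.
shifted = scores - scores.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
loss = -(labels * log_probs).sum(axis=1).mean()
print(loss)
```

The approximation the question refers to is visible here: the negatives are whatever items happen to share the batch, not items the current user has actually rejected.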
In the TensorFlow Recommenders tutorials, top_k_categorical_accuracy is used to evaluate the retrieval model, and MSE to evaluate the ranking model. Are there examples showing that better values of these evaluation metrics translate into better movie recommendations on the MovieLens dataset?
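To make the retrieval metric concrete, here is a tiny sketch of top-k categorical accuracy: the fraction of queries whose true item appears among the k highest-scored candidates. The scores and labels below are made up for illustration:

```python
import numpy as np

scores = np.array([
    [0.9, 0.1, 0.0],   # query 0's scores over 3 candidate items
    [0.2, 0.8, 0.5],   # query 1
    [0.1, 0.3, 0.6],   # query 2
])
true_item = np.array([0, 0, 2])  # index of each query's true item
k = 2

# Indices of the k top-scoring candidates per query.
top_k = np.argsort(-scores, axis=1)[:, :k]

# A query counts as a hit if its true item is among those k.
hits = (top_k == true_item[:, None]).any(axis=1)
accuracy = hits.mean()
print(accuracy)  # 2 of 3 queries hit -> 0.666...
```

This measures whether held-out positives are ranked highly, which is related to, but not the same as, users actually liking the top-k list served in production.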