tmozgach / movie_rec_sys

Movie Recommendations Using the Deep Learning Approach

Evaluation #9

Open tmozgach opened 5 years ago

tmozgach commented 5 years ago

Paper: https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/download/11824/11581

Let's use the hit rate (HR) to compare our models:

HR = sum over all users u of |(top-k movies predicted by the model for u) ∩ (k movies from u's test data)| / (number of movies in u's test data, i.e. k)
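
For concreteness, here is a minimal Python sketch of that HR computation, assuming each user's top-k predictions and held-out test movies are available as sets of movie ids (all names below are illustrative, not from this repo):

```python
def hit_rate(predicted, test):
    """predicted[u] and test[u] are sets of top-k movie ids for user u."""
    total = 0.0
    for user, test_movies in test.items():
        if not test_movies:
            continue
        hits = len(predicted.get(user, set()) & test_movies)
        total += hits / len(test_movies)  # per-user hit fraction
    return total  # divide by len(test) for a per-user average instead of a sum

# Example: one user, 2 of the 3 test movies recovered -> 0.666...
# hit_rate({1: {10, 20, 30}}, {1: {20, 30, 40}})
```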

HimikoTachibana commented 5 years ago

I think those metrics look fine.

mhzhang commented 5 years ago

This metric is really bad for models that first compute a rating for each movie and then recommend the top-k movies based on those ratings (Collaborative Filtering models). At least for my model, the RMSE between the predicted and actual ratings is reasonable, but the accuracy measured using HR is 2/200104.
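
For reference, the RMSE mentioned here is the usual root-mean-squared error between predicted and actual ratings; a minimal sketch, with illustrative names:

```python
import math

def rmse(predicted_ratings, actual_ratings):
    """Root-mean-squared error over parallel lists of ratings."""
    errors = [(p - a) ** 2 for p, a in zip(predicted_ratings, actual_ratings)]
    return math.sqrt(sum(errors) / len(errors))

# rmse([3.8, 2.1, 4.9], [4.0, 2.0, 5.0])  ->  ~0.141
```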

mhzhang commented 5 years ago

For Collaborative Filtering methods, if we really want an accuracy score, we could instead look at the ratio: Correctly Predicted Ratings / All Test Ratings.
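
One possible reading of this ratio, sketched in Python; the comment does not define when a predicted rating counts as "correct", so the tolerance below is my assumption:

```python
def rating_accuracy(predicted_ratings, actual_ratings, tolerance=0.5):
    """Fraction of test ratings predicted within `tolerance` of the truth.
    The 0.5-star tolerance is an assumption, not defined in this thread."""
    correct = sum(1 for p, a in zip(predicted_ratings, actual_ratings)
                  if abs(p - a) <= tolerance)
    return correct / len(actual_ratings)
```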

tmozgach commented 5 years ago

I read that for top-N items people usually use Precision, Recall, and F1 to COMPARE models; we don't need metrics to show that a model is good in absolute terms, just to compare. The formula you wrote is the same as the one I wrote above.
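
For reference, the standard per-user Precision, Recall, and F1 for a top-N list, as a minimal sketch with illustrative names:

```python
def precision_recall_f1(recommended, relevant):
    """recommended: the model's top-N movie ids for a user;
    relevant: the user's held-out test movie ids."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1
```

Note that with the number of test movies in the denominator, the per-user HR defined above is exactly recall@k, which is presumably why the two formulas look the same.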

mhzhang commented 5 years ago

The formulas share the same idea, but yours uses "correctly predicted movies", while mine uses "correctly predicted ratings". These two are very different. We should not try to compare models intended for different purposes; that tells us nothing about the models themselves. Instead we should divide the evaluation into two scopes: one for comparing rating-based models and one for comparing movie-based models. In the end maybe we can say something like "rating-based models are not suitable for predicting actual movies, because we tried and the performance was bad". This can be part of our findings but should not be our main focus.

tmozgach commented 5 years ago

But in the beginning, we set a goal for our models: predict the top N movies. Are you doing completely different things? XD


mhzhang commented 5 years ago

Sure, we started with that goal, but while doing the project we discovered that some of the models we picked do not work at all for predicting the top-k movies. We could not have known this before we started; that is the whole point of the project, to learn something. In our case we learned that it may not be a good idea to use rating prediction to predict movies. As I have said, this finding can be part of our conclusion.