recommenders / rival

RiVal recommender system evaluation toolkit
rival.recommenders.net
Apache License 2.0

Ranking metric data model not generic #87

Closed aleSuglia closed 9 years ago

aleSuglia commented 9 years ago

Good morning. In order to evaluate my content-based recommender systems, I've decided to use your evaluation framework. In the Javadoc I've seen that there is a generic DataModel which can store all the information about the training and test sets, so I can easily construct the model according to the data that I have.

In my experimental protocol I'll use ranking metrics, and I've seen that RiVal supports them well. According to the documentation, for metrics like Precision and Recall I must use a DataModel&lt;Long, Long&gt; for the training data and the ground truth.

My question is: is there a specific reason to prefer a concrete DataModel over a generic one in the metrics' constructors?

Unfortunately, my dataset uses string identifiers, so I cannot satisfy the type restrictions in the metrics' constructors.
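
(A minimal, self-contained sketch of the underlying point, not RiVal code: a ranking metric such as precision@n only relies on id equality, so nothing in its logic requires Long identifiers. All names here are purely illustrative.)

```java
import java.util.*;

// Minimal sketch (not RiVal code): precision@n only uses equals/hashCode
// of the id type, so it works for String ids just as well as for Long ids.
public final class PrecisionSketch {

    /** Precision@n for one user: relevant items found in the top-n, divided by n. */
    static <I> double precisionAtN(List<I> ranking, Set<I> relevant, int n) {
        int hits = 0;
        for (I item : ranking.subList(0, Math.min(n, ranking.size()))) {
            if (relevant.contains(item)) {
                hits++;
            }
        }
        return hits / (double) n;
    }

    public static void main(String[] args) {
        // String ids, as in a content-based dataset like the one in this issue.
        List<String> ranking = Arrays.asList("doc-3", "doc-7", "doc-1", "doc-9");
        Set<String> relevant = new HashSet<>(Arrays.asList("doc-7", "doc-9"));
        System.out.println(precisionAtN(ranking, relevant, 3)); // prints 0.333...
    }
}
```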

P.S. I've implemented my proposal; you can find it in my last commit: https://github.com/aleSuglia/rival/commit/e0b7cd7c948fcc4aba616487fdac0b38f1284bac. It passes the tests you've written for the evaluation module, so I think everything is OK.

Thank you in advance for your answer. Alessandro Suglia

abellogin commented 9 years ago

Hi Alessandro, thank you for your comments. Yes, indeed the evaluation metrics should not depend on a specific DataModel, so your proposal is very appropriate.

I think the reason why it wasn't generic until now is that we were planning to do a whole generalization refactoring, not only for evaluation.

Cheers, Alejandro

aleSuglia commented 9 years ago

You're welcome. I really appreciate your work, because I think that using a single, universal platform for evaluation is the best way to obtain good results in the field of recommender systems.

Do you think I can use the code I've written to run my experiments without any problems? I will only use the precision and recall metrics, to do top-n evaluation of my algorithms.

abellogin commented 9 years ago

Thanks for your words!

Yes, I think these metrics should work. If you want to contribute a test case where item ids are Strings, that would be great.
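
(As a rough illustration of what such a test might exercise, assuming the generified Precision&lt;U, I&gt; from the commit linked above. DataModel.addPreference, the Precision constructor, compute(), and getValueAt() mirror RiVal's existing Long-keyed API, but the generic signatures here are assumptions, not the released API.)

```java
import net.recommenders.rival.core.DataModel;
import net.recommenders.rival.evaluation.metric.ranking.Precision;

// Rough sketch of a String-keyed Precision check, assuming the generified
// metrics from the proposed changes; not verified against the released API.
public class StringIdPrecisionSketch {
    public static void main(String[] args) {
        DataModel<String, String> predictions = new DataModel<>();
        DataModel<String, String> test = new DataModel<>();

        // Recommender output: two scored items for one user.
        predictions.addPreference("userA", "itemX", 3.0);
        predictions.addPreference("userA", "itemY", 2.0);
        // Ground truth: only itemX is relevant.
        test.addPreference("userA", "itemX", 1.0);

        Precision<String, String> precision =
                new Precision<>(predictions, test, 1.0, new int[]{1, 2});
        precision.compute();
        System.out.println(precision.getValueAt(1)); // expected 1.0 (itemX ranks first)
        System.out.println(precision.getValueAt(2)); // expected 0.5 (one hit in top 2)
    }
}
```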

aleSuglia commented 9 years ago

I'll do it in the next few days, since I'll be running my experiments anyway. It's a pleasure for me to contribute to this project. Thank you!

EDIT: I've implemented them in my last commit: https://github.com/aleSuglia/rival/commit/bf77fc0fd38045144c3e797156b15faa8238066d. I've implemented a String-based data model for the Precision test; the code would be identical for all the other metrics, so duplicating it seems pointless.

If there is something else to add I will be happy to help you.

alansaid commented 9 years ago

Hi @aleSuglia, great to see this work being done. When you feel you're done with your additions, please open a pull request and we'll try to merge them into RiVal.

abellogin commented 9 years ago

Fixed by PR #88