maciejkula opened this issue 7 years ago
Hello @maciejkula,
First, thank you very much for implementing Spotlight. I am planning to use it in my further studies.
I'd like to contribute by implementing AUC. Here is my plan for the implementation, with some questions:
We can create a precision-recall curve (the axes are precision and recall) or a ROC curve (the axes are true positive rate and false positive rate); see Ref1. I think the precision-recall curve is fine. What is your opinion?
We will use different k values (i.e. different numbers of recommended items) to produce the different points of the curve.
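A minimal sketch of how the points could be produced (the helper name `precision_recall_at_ks` and the input format are my own assumptions, not Spotlight's API):

```python
import numpy as np

def precision_recall_at_ks(ranked_items, relevant_items, k_values):
    # Hypothetical helper: one (recall, precision) point per cutoff k,
    # for a single user's ranked recommendation list.
    relevant = set(relevant_items)
    recalls, precisions = [], []
    for k in k_values:
        hits = len(relevant.intersection(ranked_items[:k]))
        precisions.append(hits / k)
        recalls.append(hits / len(relevant))
    return np.array(recalls), np.array(precisions)

# Example: items ranked by predicted score for one user.
recall, precision = precision_recall_at_ks(
    ranked_items=[7, 3, 9, 1, 5],
    relevant_items=[3, 5],
    k_values=[1, 2, 3, 4, 5],
)
```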
The new evaluation metric will return a single result, since AUC reduces the curve to one number. I see that results (e.g. precision and recall) in Spotlight are generally arrays rather than single values. Do you think the AUC metric should return an array or a single result?
We can calculate the area under the curve using either the trapezoidal rule or Simpson's rule. By default, the metric will use the trapezoidal rule; Simpson's rule will be optional.
Trapezoidal rule: https://docs.scipy.org/doc/numpy/reference/generated/numpy.trapz.html
Simpson's rule: https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.simps.html#scipy.integrate.simps
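A minimal sketch of the integration step using the two functions linked above (the example arrays are made up; recall is the x axis and must be in ascending order):

```python
import numpy as np
from scipy.integrate import simps

recall = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
precision = np.array([1.0, 0.8, 0.6, 0.5, 0.4])

auc_trapezoid = np.trapz(precision, recall)  # trapezoidal rule (the default)
auc_simpson = simps(precision, recall)       # Simpson's rule (the option)
```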
Do you think the plan is OK for implementation? Please let me know your comments.
Ref1: Recommender Systems Handbook, 2nd edition, Section 8.3.2.2, "Measuring Usage Prediction".
PS: the curve needs to reach the x=1 and y=1 values, since this metric is generally used to compare multiple algorithms.
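One hedged way to guarantee those endpoints, assuming ROC-style axes where a complete curve runs from (0, 0) to (1, 1), is to pad the measured points explicitly before integrating (`pad_roc_curve` is a hypothetical name):

```python
import numpy as np

def pad_roc_curve(fpr, tpr):
    # Hypothetical helper: anchor the curve at (0, 0) and (1, 1) so that
    # areas are comparable across algorithms.
    fpr = np.concatenate(([0.0], np.asarray(fpr), [1.0]))
    tpr = np.concatenate(([0.0], np.asarray(tpr), [1.0]))
    return fpr, tpr
```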
@mkarakaya, you should contribute your idea! Here are my thoughts, having helped out on Spotlight evaluation metrics in the past:
Also, have you considered a confusion matrix, or at least the F1 score?
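For reference, F1 can be computed elementwise from the same per-k precision and recall arrays (a sketch; the `where` clause guards against the 0/0 case for users with no hits):

```python
import numpy as np

def f1_score(precision, recall):
    # Harmonic mean of precision and recall, elementwise over the k values.
    precision = np.asarray(precision, dtype=float)
    recall = np.asarray(recall, dtype=float)
    denom = precision + recall
    return np.divide(2 * precision * recall, denom,
                     out=np.zeros_like(denom), where=denom > 0)
```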
Good luck!