maciejkula opened this issue 7 years ago
Hello @maciejkula,
First, thank you very much for implementing Spotlight. I am planning to use it in my further studies.
I'd like to contribute by implementing AUC. Here is my plan for the implementation, with some questions:
We can create a precision-recall curve (the axes are precision and recall) or a ROC curve (the axes are true positive rate and false positive rate); see Ref1. I think the precision-recall curve is fine. What is your opinion?
We will use different k values (i.e. different numbers of recommended items) to produce the different points of the curve.
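A minimal sketch of how the points could be produced (the helper name `precision_recall_at_ks` and the input format are my own assumptions, not Spotlight's API):

```python
import numpy as np

def precision_recall_at_ks(ranked_items, relevant_items, k_values):
    # Hypothetical helper: one (recall, precision) point per cutoff k,
    # for a single user's ranked recommendation list.
    relevant = set(relevant_items)
    recalls, precisions = [], []
    for k in k_values:
        hits = len(relevant.intersection(ranked_items[:k]))
        precisions.append(hits / k)
        recalls.append(hits / len(relevant))
    return np.array(recalls), np.array(precisions)

# Example: items ranked by predicted score for one user.
recall, precision = precision_recall_at_ks(
    ranked_items=[7, 3, 9, 1, 5],
    relevant_items=[3, 5],
    k_values=[1, 2, 3, 4, 5],
)
```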
The new evaluation metric will return a single result, since AUC reduces the curve to one number. I see that results (e.g. precision and recall) in Spotlight are generally arrays rather than single values. Do you think the AUC metric should return an array or a single result?
We can calculate the area under the curve using either the trapezoidal rule or Simpson's rule. By default, the metric will use the trapezoidal rule; Simpson's rule will be optional.
Trapezoidal rule: https://docs.scipy.org/doc/numpy/reference/generated/numpy.trapz.html
Simpson's rule: https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.simps.html#scipy.integrate.simps
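A minimal sketch of the integration step using the two functions linked above (the example arrays are made up; recall is the x axis and must be in ascending order):

```python
import numpy as np
from scipy.integrate import simps

recall = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
precision = np.array([1.0, 0.8, 0.6, 0.5, 0.4])

auc_trapezoid = np.trapz(precision, recall)  # trapezoidal rule (the default)
auc_simpson = simps(precision, recall)       # Simpson's rule (the option)
```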
Do you think the plan is OK for implementation? Please let me know your comments.
Ref1: Recommender Systems Handbook, 2nd edition, Section 8.3.2.2, "Measuring Usage Prediction".
PS: the curve needs to reach the x=1 and y=1 values, since this metric is generally used to compare multiple algorithms.
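One hedged way to guarantee those endpoints, assuming ROC-style axes where a complete curve runs from (0, 0) to (1, 1), is to pad the measured points explicitly before integrating (`pad_roc_curve` is a hypothetical name):

```python
import numpy as np

def pad_roc_curve(fpr, tpr):
    # Hypothetical helper: anchor the curve at (0, 0) and (1, 1) so that
    # areas are comparable across algorithms.
    fpr = np.concatenate(([0.0], np.asarray(fpr), [1.0]))
    tpr = np.concatenate(([0.0], np.asarray(tpr), [1.0]))
    return fpr, tpr
```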
@mkarakaya, you should contribute your idea! Here are my thoughts, having helped out on Spotlight evaluation metrics in the past:
Also, have you considered a confusion matrix, or at least the F1 score?
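For reference, F1 can be computed elementwise from the same per-k precision and recall arrays (a sketch; the `where` clause guards against the 0/0 case for users with no hits):

```python
import numpy as np

def f1_score(precision, recall):
    # Harmonic mean of precision and recall, elementwise over the k values.
    precision = np.asarray(precision, dtype=float)
    recall = np.asarray(recall, dtype=float)
    denom = precision + recall
    return np.divide(2 * precision * recall, denom,
                     out=np.zeros_like(denom), where=denom > 0)
```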
Good luck!