A recommender systems handbook

ndey96 commented 6 years ago

Read the relevant sections of the recommender systems handbook

MatthewMcLeod commented 6 years ago

Chapter 8: Evaluating Recommender Systems

Three types of recommender system evaluation: rating prediction, usage prediction, and ranking prediction.

Rating Prediction: Predict the rating a user will give an item. Metrics: RMSE, MSE, weighted MSE, etc. All follow a cost function built on the squared error, like Sum((predicted_rating - actual_rating)^2).
Usage Prediction: Assumes that an item the user has not consumed would not be consumed even if it were recommended. Take a set of user actions, hide a portion, and predict whether the user would interact with the hidden items. This creates the precision-recall tradeoff, which can be analyzed with precision-recall and ROC curves. Use the F-measure to balance precision and recall.
Ranking Prediction: Predict the order in which to present recommendations to the user. 2 methods for evaluating. Reference Based: Establish a reference ranking and compute 'accuracy' against it using rank correlations such as Spearman's rho or Kendall's tau (look em up if you want). Establishing the reference list is tricky since it must be inferred. Can do funky set theory things. Utility Based: Assign a power-law weighting to recommendations by position and maximize a utility function. Easier to understand.
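A rough sketch of one metric per evaluation type, in plain Python. The function names are made up for illustration, and the `1 / rank**alpha` weighting is just one assumed form of the power-law weighting; the handbook may define it differently. `kendall_tau` assumes both rankings contain the same items (at least two) with no ties.

```python
import math

def rmse(predicted, actual):
    """Rating prediction: root mean squared error over paired ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def precision_recall_f1(recommended, hidden):
    """Usage prediction: compare recommendations with the hidden interactions."""
    recommended, hidden = set(recommended), set(hidden)
    hits = len(recommended & hidden)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(hidden) if hidden else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

def kendall_tau(reference, predicted):
    """Reference-based ranking: Kendall's tau between two orderings (no ties)."""
    pos = {item: i for i, item in enumerate(predicted)}
    concordant = discordant = 0
    for i in range(len(reference)):
        for j in range(i + 1, len(reference)):
            if pos[reference[i]] < pos[reference[j]]:
                concordant += 1
            else:
                discordant += 1
    n_pairs = len(reference) * (len(reference) - 1) / 2
    return (concordant - discordant) / n_pairs

def discounted_utility(ranked, relevant, alpha=1.0):
    """Utility-based ranking: a hit at 1-based rank r earns 1 / r**alpha (assumed weighting)."""
    return sum(1 / (r ** alpha)
               for r, item in enumerate(ranked, start=1) if item in relevant)
```

For example, `precision_recall_f1(["a", "b", "c", "d"], ["a", "b", "x"])` has two hits out of four recommendations and three hidden items, so precision is 0.5 and recall is 2/3.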

MatthewMcLeod commented 6 years ago

Chapter 13: Music Recommendations

Unique Because:

Smaller amount of time to consume media
Consumed multiple times (unlike other domains)
More layers of abstraction (genres, artists, albums, etc.)
Implicit feedback is a lot more common (not common to rate songs).
Songs may keep playing because they are good or user walked away. Bad songs are stopped.
Due to the sparsity of the user-song relationship, content-based techniques generally perform better.
Preferences are more influenced by environmental states
Need serial recommendations (to play back to back). Each recommendation affects the quality of the next one.

MatthewMcLeod commented 6 years ago

Chapter 26: Novelty and Diversity in Recommendations

Diversity can help in situations where there is uncertainty (don't put your eggs in one basket)

Metrics for Diversity

Average Intra-List Distance: The average distance between all pairs of items in a set of recommendations. A distance function needs to be defined. It may or may not be the distance function used in content-based filtering.
Global Long-Tail Novelty: Popularity is the probability that a random person would know of the item. Thus, the novelty of a set of recommendations is evaluated by aggregating the inverse popularity of its items (e.g. 1 - popularity, or -log popularity), so less well-known items score higher.
User-Specific Unexpectedness: Similar to Average Intra-List Distance, but instead of looking at the distances between recommendations within the set, compute the distance between each recommended item and every piece of content the user has interacted with. The unexpectedness of each recommended item is the sum of its distances to everything the user has consumed.
Inter-Recommendation Diversity: Rather than a user-specific evaluation of recommendation diversity, evaluate diversity based on the aggregate recommendations across many users. Simplest: look at the number of unique items suggested across all users. More sophisticated: calculate the Gini coefficient of the distribution of content recommended (look up wealth inequality to understand Gini coefficients).
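The four metrics above can be sketched in a few lines of Python. This is only an illustration under assumptions: `dist` is any caller-supplied distance function, `popularity` is a hypothetical dict mapping each item to a probability in (0, 1], novelty uses -log2(popularity) as one possible inverse-popularity form, and unexpectedness is averaged rather than summed so that lists of different sizes stay comparable.

```python
import math
from itertools import combinations

def average_intra_list_distance(items, dist):
    """Mean distance over all unordered pairs in one recommendation list."""
    pairs = list(combinations(items, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

def long_tail_novelty(items, popularity):
    """Mean of -log2(popularity); rarer items contribute more."""
    return sum(-math.log2(popularity[i]) for i in items) / len(items)

def unexpectedness(items, history, dist):
    """Mean distance between each recommended item and each item in the user's history."""
    total = sum(dist(i, h) for i in items for h in history)
    return total / (len(items) * len(history))

def gini(counts):
    """Gini coefficient of per-item recommendation counts.

    0 means every item is recommended equally often; values near 1 mean
    recommendations are concentrated on a few items.
    """
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * k - n - 1) * x for k, x in enumerate(xs, start=1))
    return weighted / (n * total)
```

With items represented as single numbers and `dist = lambda a, b: abs(a - b)`, the list `[0, 1, 3]` has pairwise distances 1, 3 and 2, so its average intra-list distance is 2.0.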

MatthewMcLeod commented 6 years ago

Proposed Definition of Diversity and Novelty:

Novelty: The unexpectedness of a single piece of content (ex: Global Long-Tail Novelty evaluates novelty).
Diversity: The intra-differences and inter-differences between sets of recommended content (ex: Average Intra-List Distance and Inter-Recommendation Diversity evaluate diversity).

ndey96 commented 6 years ago

Closing since @MatthewMcLeod read this

MatthewMcLeod commented 6 years ago

Diversity and Novelty Enhancements

Approaches fall into two categories.

Rerank/reorder recommendations to improve diversity metrics (post processing).
Embed diversity metrics into actual objective and cost functions.

There is usually a trade-off between diversity and accuracy. One option is to reorder only items that the recommender scores as equally likely (the same "chance"), which improves diversity within those ties. This preserves accuracy but is less flexible for increasing diversity.
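A minimal sketch of the accuracy-preserving reorder, assuming "same chance" means exactly equal scores (a real system would probably bucket nearby scores). The function name and the greedy farthest-first rule are my own choices for illustration, not something taken from the chapter.

```python
from itertools import groupby

def rerank_within_ties(scored_items, dist):
    """Reorder only items that share a score, leaving the score ordering intact.

    scored_items: list of (item, score) pairs.
    dist: function returning a distance between two items.
    """
    ordered = sorted(scored_items, key=lambda pair: pair[1], reverse=True)
    result = []
    for _, group in groupby(ordered, key=lambda pair: pair[1]):
        candidates = [item for item, _ in group]
        while candidates:
            # Greedily take the candidate farthest (in total) from everything chosen so far.
            best = max(candidates, key=lambda c: sum(dist(c, r) for r in result))
            result.append(best)
            candidates.remove(best)
    return result
```

Because items never move across score levels, any accuracy metric that depends only on scores is unchanged; the cost is that diversity can only improve inside each tie group.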

Clustering Method: Cluster the user's actions into a number of clusters, and then, rather than making recommendations off of the entire user history, make recommendations for each cluster.

Fusion-Based Methods: Aggregate the outputs of different recommendation systems. The aggregate is hopefully more diverse.
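One simple way to aggregate is round-robin interleaving of the ranked lists, sketched below. This is an assumed fusion strategy picked for illustration (the chapter covers others), and the function name is hypothetical. Items are assumed never to be `None`, since `None` is used as the padding value.

```python
from itertools import zip_longest

def round_robin_fusion(*ranked_lists, k=None):
    """Interleave several ranked lists, skipping items that were already taken."""
    merged, seen = [], set()
    for tier in zip_longest(*ranked_lists):  # shorter lists are padded with None
        for item in tier:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged[:k] if k is not None else merged
```

Each recommender contributes its top pick before any recommender contributes a second one, so a single system cannot dominate the head of the fused list.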

In user studies, users can be led to perceive recommendations as more diverse simply by reordering the set that they see.

ndey96 / rec-sys

A recommender systems handbook #5