They also display the predicted rating for each movie
Business Value
effective catalog size (ECS) measures how spread out viewing is across the items in the catalog
personalization grows ECS, spreading viewing across far more of the catalog (sketch below)
personalization reduces churn and saves Netflix more than $1B per year
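A minimal sketch of computing the ECS mentioned above, assuming the popularity-sorted weighted-sum form from the paper's appendix (ECS = 2·Σ i·p_i − 1, where p_i is the share of hours of the i-th most-watched title); the function name and toy data are illustrative, not from the paper:

```python
def effective_catalog_size(hours_per_title):
    """hours_per_title: viewing hours per title, in any order."""
    hours = list(hours_per_title)
    total = sum(hours)
    if total == 0:
        return 0.0
    # Shares of total hours, sorted so the most-watched title comes first
    shares = sorted((h / total for h in hours), reverse=True)
    # ECS = 1 if a single title gets all viewing; ECS = N for uniform viewing over N titles
    return 2 * sum(i * p for i, p in enumerate(shares, start=1)) - 1

# Toy checks
print(effective_catalog_size([10, 10, 10, 10]))  # 4.0 (uniform over 4 titles)
print(effective_catalog_size([100, 0, 0, 0]))    # 1.0 (all viewing on one title)
```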
Improving Algorithms
Ultimately, revenue is proportional to the number of members, which depends on the rate of member acquisition, the churn rate, and the rate at which former members rejoin
“we think that maximizing revenue through product changes is fairly equivalent to maximizing the value that our members derive from our service”
“we have observed that improving engagement—the time that our members spend viewing Netflix content—is strongly correlated with improving retention”
when A/B testing a new RS, there are 3 main metrics
we see members engaging more with the part of the product that was changed (a local engagement metrics win), more with the Netflix product overall (an overall engagement win), and higher retention rates (a clear overall win)
if local engagement metrics don’t improve, but global metrics do, the test is usually repeated
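A toy sketch of that decision logic over the 3 metrics; the function and outcome labels are hypothetical illustrations of the notes above, not Netflix tooling:

```python
def classify_ab_test(local_engagement_win: bool,
                     overall_engagement_win: bool,
                     retention_win: bool) -> str:
    """Classify an A/B test by local engagement, overall engagement, and retention."""
    if local_engagement_win and overall_engagement_win and retention_win:
        return "clear overall win -> roll out"
    if overall_engagement_win and not local_engagement_win:
        # Global metrics improved without a local win; per the notes, the test is usually repeated
        return "ambiguous -> repeat the test"
    if local_engagement_win:
        return "local win only -> investigate further"
    return "no win -> keep control"

print(classify_ab_test(True, True, True))
```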
Offline Evaluation
use past data to evaluate different algorithms
use rankings and viewing decisions that were actually made in the past
DRAWBACK: offline evaluation assumes that users would have behaved the same way under the new algorithm as they did under the old one
They have found offline evaluation not to be highly predictive of A/B test success
Rarely use offline evaluation
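A minimal sketch of one common offline-evaluation setup (recall@k against held-out plays); this is an assumed illustration of "use past data to evaluate different algorithms", not the paper's exact protocol, and all names are hypothetical:

```python
from typing import Dict, List, Set

def recall_at_k(ranked_titles: List[str], held_out_plays: Set[str], k: int = 10) -> float:
    """Fraction of a member's held-out plays that appear in the top k of a ranking."""
    if not held_out_plays:
        return 0.0
    hits = sum(1 for title in ranked_titles[:k] if title in held_out_plays)
    return hits / len(held_out_plays)

def offline_eval(rankings: Dict[str, List[str]],
                 future_plays: Dict[str, Set[str]], k: int = 10) -> float:
    """Average recall@k over members: rankings come from data before a cutoff date,
    future_plays are what each member actually watched afterwards."""
    scores = [recall_at_k(rankings[m], future_plays.get(m, set()), k) for m in rankings]
    return sum(scores) / len(scores) if scores else 0.0

# Toy usage
rankings = {"member_1": ["title_a", "title_b", "title_c"]}
future = {"member_1": {"title_b", "title_z"}}
print(offline_eval(rankings, future, k=2))  # 0.5
```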
Key Open Problems
Better experimentation protocols
want better offline experimentation that is more predictive of A/B test outcomes
Balance between recommending movies and TV shows
TV shows have much longer runtimes; a movie is one-session viewing but generates more novel plays per hour
Global algorithm challenges
consider what languages a user understands
Controlling for presentation bias (OUR PROBLEM)
strong positive feedback loop where videos that members engage highly with are recommended to many members, leading to high engagement with those videos and so on
most of their statistical models do not take this feedback loop into account
In Netflix’s opinion, it is very likely that algorithms accounting for the videos that were actually recommended to users, in addition to the outcome of each recommendation, will solve this problem
subproblem: finding clusters of users that respond similarly to different recommendations
subproblem: finding effective ways to introduce randomness into recommendations and learn better models
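A hedged sketch of that second subproblem: inject a little randomness into the slate and log the propensity of each impression, so a later model can reweight outcomes (e.g., by inverse propensity) and correct for presentation bias. Epsilon-greedy slates are an assumed illustration, not Netflix's actual method:

```python
import random
from typing import Dict, List, Optional, Tuple

def build_slate(scores: Dict[str, float], slate_size: int, epsilon: float = 0.1,
                rng: Optional[random.Random] = None) -> List[Tuple[str, float]]:
    """Return (title, propensity) pairs: each slot shows the top-scored remaining
    title with probability 1 - epsilon, or a uniformly random remaining title otherwise."""
    rng = rng or random.Random(0)
    remaining = dict(scores)
    slate: List[Tuple[str, float]] = []
    for _ in range(min(slate_size, len(remaining))):
        top = max(remaining, key=remaining.get)
        if rng.random() < epsilon:
            title = rng.choice(list(remaining))   # explore: uniform over remaining titles
        else:
            title = top                           # exploit: highest-scored remaining title
        # Per-slot probability that this particular title was shown in this slot
        propensity = (1 - epsilon) * (title == top) + epsilon / len(remaining)
        slate.append((title, propensity))
        del remaining[title]
    return slate

# Logged propensities let a downstream model weight each observed click by 1 / propensity,
# so rarely-shown titles are not automatically treated as unpopular.
print(build_slate({"title_a": 0.9, "title_b": 0.5, "title_c": 0.2}, slate_size=2))
```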
Page construction
find some way to personalize the very layout of the page
Member cold-starting
hard to give accurate recommendations for new users with little data
churn rates are highest for new users (< 1 month)
Account Sharing
how to handle viewing data when multiple household members watch on the same profile
how to personalize best for multiple users at once
Find the best evidence to support each recommendation
maybe some users care about awards, certain directors, genres, etc
http://delivery.acm.org/10.1145/2850000/2843948/a13-gomez-uribe.pdf