They also display the predicted rating for each movie
Business Value
effective catalog size (ECS) measures how spread out viewing is across the items in the catalog
personalization grows ECS, spreading viewing across far more of the catalog (sketch below)
personalization reduces churn and saves Netflix more than $1B per year
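A minimal sketch of computing the ECS mentioned above, assuming the popularity-sorted weighted-sum form from the paper's appendix (ECS = 2·Σ i·p_i − 1, where p_i is the share of hours of the i-th most-watched title); the function name and toy data are illustrative, not from the paper:

```python
def effective_catalog_size(hours_per_title):
    """hours_per_title: viewing hours per title, in any order."""
    hours = list(hours_per_title)
    total = sum(hours)
    if total == 0:
        return 0.0
    # Shares of total hours, sorted so the most-watched title comes first
    shares = sorted((h / total for h in hours), reverse=True)
    # ECS = 1 if a single title gets all viewing; ECS = N for uniform viewing over N titles
    return 2 * sum(i * p for i, p in enumerate(shares, start=1)) - 1

# Toy checks
print(effective_catalog_size([10, 10, 10, 10]))  # 4.0 (uniform over 4 titles)
print(effective_catalog_size([100, 0, 0, 0]))    # 1.0 (all viewing on one title)
```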
Improving Algorithms
Ultimately, revenue is proportional to the number of members, which depends on the rate of member acquisition, the churn rate, and the rate at which former members rejoin
“we think that maximizing revenue through product changes is fairly equivalent to maximizing the value that our members derive from our service”
“we have observed that improving engagement—the time that our members spend viewing Netflix content—is strongly correlated with improving retention”
when A/B testing a new RS, there are 3 main metrics
we see members engaging more with the part of the product that was changed (a local engagement metrics win), more with the Netflix product overall (an overall engagement win), and higher retention rates (a clear overall win)
if local engagement metrics don’t improve, but global metrics do, the test is usually repeated
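A toy sketch of that decision logic over the 3 metrics; the function and outcome labels are hypothetical illustrations of the notes above, not Netflix tooling:

```python
def classify_ab_test(local_engagement_win: bool,
                     overall_engagement_win: bool,
                     retention_win: bool) -> str:
    """Classify an A/B test by local engagement, overall engagement, and retention."""
    if local_engagement_win and overall_engagement_win and retention_win:
        return "clear overall win -> roll out"
    if overall_engagement_win and not local_engagement_win:
        # Global metrics improved without a local win; per the notes, the test is usually repeated
        return "ambiguous -> repeat the test"
    if local_engagement_win:
        return "local win only -> investigate further"
    return "no win -> keep control"

print(classify_ab_test(True, True, True))
```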
Offline Evaluation
use past data to evaluate different algorithms
use rankings and viewing decisions that were actually made in the past
DRAWBACK: offline evaluation assumes that users would have behaved the same way under the new algorithm as they did under the old one
They have found offline evaluation not to be highly predictive of A/B test success
Rarely use offline evaluation
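A minimal sketch of one common offline-evaluation setup (recall@k against held-out plays); this is an assumed illustration of "use past data to evaluate different algorithms", not the paper's exact protocol, and all names are hypothetical:

```python
from typing import Dict, List, Set

def recall_at_k(ranked_titles: List[str], held_out_plays: Set[str], k: int = 10) -> float:
    """Fraction of a member's held-out plays that appear in the top k of a ranking."""
    if not held_out_plays:
        return 0.0
    hits = sum(1 for title in ranked_titles[:k] if title in held_out_plays)
    return hits / len(held_out_plays)

def offline_eval(rankings: Dict[str, List[str]],
                 future_plays: Dict[str, Set[str]], k: int = 10) -> float:
    """Average recall@k over members: rankings come from data before a cutoff date,
    future_plays are what each member actually watched afterwards."""
    scores = [recall_at_k(rankings[m], future_plays.get(m, set()), k) for m in rankings]
    return sum(scores) / len(scores) if scores else 0.0

# Toy usage
rankings = {"member_1": ["title_a", "title_b", "title_c"]}
future = {"member_1": {"title_b", "title_z"}}
print(offline_eval(rankings, future, k=2))  # 0.5
```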
Key Open Problems
Better experimentation protocols
want better offline experimentation that is more predictive of A/B test outcomes
Balance between recommending movies and TV shows
TV shows have much longer runtimes; a movie is one-session viewing but generates more novel plays per hour
Global algorithm challenges
consider what languages a user understands
Controlling for presentation bias (OUR PROBLEM)
strong positive feedback loop where videos that members engage highly with are recommended to many members, leading to high engagement with those videos and so on
most of their statistical models do not take this feedback loop into account
In Netflix’s opinion, it is very likely that algorithms accounting for the videos that were actually recommended to users, in addition to the outcome of each recommendation, will solve this problem
subproblem: finding clusters of users that respond similarly to different recommendations
subproblem: finding effective ways to introduce randomness into recommendations and learn better models
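A hedged sketch of that second subproblem: inject a little randomness into the slate and log the propensity of each impression, so a later model can reweight outcomes (e.g., by inverse propensity) and correct for presentation bias. Epsilon-greedy slates are an assumed illustration, not Netflix's actual method:

```python
import random
from typing import Dict, List, Optional, Tuple

def build_slate(scores: Dict[str, float], slate_size: int, epsilon: float = 0.1,
                rng: Optional[random.Random] = None) -> List[Tuple[str, float]]:
    """Return (title, propensity) pairs: each slot shows the top-scored remaining
    title with probability 1 - epsilon, or a uniformly random remaining title otherwise."""
    rng = rng or random.Random(0)
    remaining = dict(scores)
    slate: List[Tuple[str, float]] = []
    for _ in range(min(slate_size, len(remaining))):
        top = max(remaining, key=remaining.get)
        if rng.random() < epsilon:
            title = rng.choice(list(remaining))   # explore: uniform over remaining titles
        else:
            title = top                           # exploit: highest-scored remaining title
        # Per-slot probability that this particular title was shown in this slot
        propensity = (1 - epsilon) * (title == top) + epsilon / len(remaining)
        slate.append((title, propensity))
        del remaining[title]
    return slate

# Logged propensities let a downstream model weight each observed click by 1 / propensity,
# so rarely-shown titles are not automatically treated as unpopular.
print(build_slate({"title_a": 0.9, "title_b": 0.5, "title_c": 0.2}, slate_size=2))
```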
Page construction
find some way to personalize the very layout of the page
Member cold-starting
hard to give accurate recommendations for new users with little data
churn rates are highest for new users (< 1 month)
Account Sharing
how to handle viewing data when multiple household members watch on the same profile
how to personalize best for multiple users at once
Find the best evidence to support each recommendation
maybe some users care about awards, certain directors, genres, etc
http://delivery.acm.org/10.1145/2850000/2843948/a13-gomez-uribe.pdf