
Designing Machine Learning Systems #19


Production has different requirements from research, which is one reason successful research projects are not always used in production.

There are two types of tasks: [1] tasks where a small improvement in performance can result in a huge boost in revenue or cost savings, and [2] tasks where a small improvement might not be noticeable to users. For the second type, if a simple model can do a reasonable job, a complex model must perform significantly better to justify the added complexity.

Latency refers to the time it takes from receiving a query to returning the result.
Throughput refers to how many queries are processed within a specific period of time.
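
As a rough illustration, here is a minimal Python sketch that measures both for a serving function; `serve_fn` and `queries` are hypothetical stand-ins for your own model endpoint and workload:

```python
import time

def measure(serve_fn, queries):
    """Time each query sequentially, then report latency percentiles
    and overall throughput."""
    latencies = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        serve_fn(q)  # placeholder for your model call
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    latencies.sort()
    p50 = latencies[len(latencies) // 2]
    p99 = latencies[min(int(len(latencies) * 0.99), len(latencies) - 1)]
    throughput = len(queries) / elapsed  # queries per second
    return p50, p99, throughput
```

Note that when queries are processed one at a time, higher throughput implies lower average latency; once requests are batched, the two trade off against each other.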

Data

In production, data, if available, is a lot more messy. It’s noisy, possibly unstructured, constantly shifting. It’s likely biased, and you likely don’t know how it’s biased. Labels, if there are any, might be sparse, imbalanced, or incorrect. Changing project or business requirements might require updating some or all of your existing labels. If you work with users’ data, you’ll also have to worry about privacy and regulatory concerns.
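
As a hedged sketch, a first-pass audit of such data for missing values and label imbalance might look like this (the DataFrame `df` and `label_col` are assumptions):

```python
import pandas as pd

def audit(df: pd.DataFrame, label_col: str) -> None:
    """First-pass data-quality audit: missing values and label imbalance."""
    print("rows:", len(df))
    print("missing-value ratio per column:")
    print(df.isna().mean().sort_values(ascending=False).head())
    print("label distribution:")
    print(df[label_col].value_counts(normalize=True, dropna=False))
```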

Fairness

There’s no equivalent state of the art for fairness metrics.
Examples:

- A zip code embodies biases about one’s socioeconomic background.
- A resume might be ranked low because of the spelling of your name.
- A mortgage might get a higher interest rate because it relies partially on credit scores.

ML algorithms don’t predict the future; they encode the past, thus perpetuating the biases in the data and more. When ML algorithms are deployed at scale, they can discriminate against people at scale.
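
Since there is no single agreed-upon fairness metric, here is one hedged illustration: demographic parity compares positive-prediction rates across groups. The loan-approval predictions and group labels below are invented:

```python
import numpy as np

def demographic_parity_difference(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Difference in positive-prediction rates between two groups;
    0.0 means both groups receive positive predictions at the same rate."""
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return abs(rate_a - rate_b)

# e.g., loan approvals split by a zip-code-derived group
preds = np.array([1, 0, 1, 1, 0, 0, 1, 0])
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_difference(preds, groups))  # 0.5
```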

Interpretability

While most of us are comfortable with using a microwave without understanding how it works, many don’t feel the same way about AI yet, especially if that AI makes important decisions about their lives.
For most ML use cases in the industry, interpretability isn’t optional; it’s a requirement.

First, interpretability is important for users, both business leaders and end users, to understand why a decision is made so that they can trust a model and detect potential biases mentioned previously.
Second, it’s important for developers to be able to debug and improve a model.

Discussion

Productionizing ML

Machine Learning Systems Versus Traditional Software

In SWE, there’s an underlying assumption that code and data are separated. In fact, we want to keep things as modular and separate as possible.

On the contrary, ML systems are part code, part data, and part artifacts created from the two. The trend in the last decade shows that applications developed with the most/best data win. Instead of focusing on improving ML algorithms, most companies will focus on improving their data. Because data can change quickly, ML applications need to be adaptive to the changing environment, which might require faster development and deployment cycles.

In traditional SWE, you only need to focus on testing and versioning your code. With ML, we have to test and version our data too, and that’s the hard part.
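
As a minimal sketch of one crude way to version data, you can fingerprint the dataset and log the hash alongside the code's git commit; dedicated tools such as DVC do this properly, and the file name here is hypothetical:

```python
import hashlib

def dataset_version(path: str) -> str:
    """Fingerprint a dataset file so a training run can record exactly
    which data it saw."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()[:12]

# e.g., log dataset_version("train.csv") next to the git commit hash
```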

Short Summary

This part highlighted the differences between ML in research and ML in production. The differences include stakeholder involvement, computational priorities, the properties of the data used, the gravity of fairness issues, and the requirements for interpretability. This section is most helpful to those coming to ML production from academia. We also discussed how ML systems differ from traditional software systems, which motivated the need for this book.

ML systems are complex, consisting of many different components. Data scientists and ML engineers working with ML systems in production will likely find that focusing only on the ML algorithms part is far from enough. It’s important to know about other aspects of the system, including the data stack, deployment, monitoring, maintenance, infrastructure, etc. This book takes a system approach to developing ML systems, which means that we’ll consider all components of a system holistically instead of just looking at ML algorithms.


Chapter 2. Introduction to Machine Learning Systems Design

Before developing an ML system, we must understand why the system is needed: [1] understand the business objectives, [2] translate them into ML objectives to guide the development of ML models, and [3] set out some requirements, then iterate on the system design to meet them.

To do so, you first need to frame your problem as a task that ML can solve.

Business and ML Objectives

Focus on business metrics, not on hacking ML metrics. The sole purpose of businesses, according to the Nobel Prize-winning economist Milton Friedman, is to maximize profits for shareholders.

For an ML project to succeed within a business organization, it’s crucial to tie the performance of an ML system to the overall business performance.

Map ML models’ performance to business metrics.

Custom business metrics: for example, Netflix uses take-rate (the number of quality plays divided by the number of recommendations a user sees), alongside total streaming hours and subscription rate.

The effect of an ML project on business objectives can be hard to reason about: a better model can make customers happier so they spend more money, but it can also solve their problem faster, so they spend less time and money on the service.

To get a definite answer on how ML metrics influence business metrics, experiments such as A/B testing are needed.
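
A minimal sketch of such an experiment, using a two-proportion z-test on conversion counts from two model variants; the counts are invented:

```python
from math import sqrt
from statistics import NormalDist

def ab_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-proportion z-test: did variant B (new model) convert
    at a different rate than variant A (old model)?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)           # pooled rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # standard error
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return z, p_value

# e.g., 480/10,000 conversions on the old model vs. 540/10,000 on the new
print(ab_test(480, 10_000, 540, 10_000))
```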

Reliability

The system should continue to perform the correct function at the desired level of performance even in the face of adversity (hardware or software faults, and even human error).

Scalability

An ML system should be able to scale as traffic, data, and the number of models grow. We’ll discuss the resource scaling aspect in the section “Distributed Training”, the section “Model optimization”, and the section “Resource Management”. We’ll discuss the artifact management aspect in the section “Experiment Tracking and Versioning” and the section “Development Environment”.

Maintainability

There are many people who will work on an ML system: ML engineers, DevOps engineers, and subject matter experts (SMEs). They might come from very different backgrounds, use very different programming languages and tools, and own different parts of the process.

It’s important to structure your workloads and set up your infrastructure in such a way that different contributors can work using tools that they are comfortable with, instead of one group of contributors forcing their tools onto other groups. Code should be documented. Code, data, and artifacts should be versioned. Models should be sufficiently reproducible so that even when the original authors are not around, other contributors can have sufficient contexts to build on their work. When a problem occurs, different contributors should be able to work together to identify the problem and implement a solution without finger-pointing.
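
One small, concrete piece of that reproducibility is pinning randomness. A minimal sketch; frameworks such as PyTorch or TensorFlow add their own seed calls:

```python
import os
import random

import numpy as np

def set_seed(seed: int = 42) -> None:
    """Pin the common sources of randomness so other contributors
    can reproduce a training run."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
```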

We’ll go more into this in the section “Team Structure”.

Adaptability

To adapt to shifting data distributions and business requirements, the system should have some capacity for both discovering aspects for performance improvement and allowing updates without service interruption.

Because ML systems are part code, part data, and data can change quickly, ML systems need to be able to evolve quickly. This is tightly linked to maintainability. We’ll discuss changing data distributions in the section “Data Distribution Shifts”, and how to continually update your model with new data in the section “Continual Learning”.
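
As a hedged sketch of detecting a shift on a single numeric feature, a two-sample Kolmogorov-Smirnov test is one common choice; the synthetic data below simulates a drifted feature:

```python
import numpy as np
from scipy.stats import ks_2samp

def drifted(train_feature: np.ndarray, live_feature: np.ndarray,
            alpha: float = 0.01) -> bool:
    """Two-sample Kolmogorov-Smirnov test on one feature: a small
    p-value suggests the live distribution has moved away from what
    the model was trained on."""
    _stat, p_value = ks_2samp(train_feature, live_feature)
    return p_value < alpha

rng = np.random.default_rng(0)
print(drifted(rng.normal(0, 1, 5_000), rng.normal(0.3, 1, 5_000)))  # True
```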

Iterative Process

[Figure: the iterative process of developing an ML system]

Step 1. Project scoping

A project starts with scoping the project, laying out goals, objectives, and constraints. Stakeholders should be identified and involved. Resources should be estimated and allocated. We already discussed different stakeholders and some of the foci for ML projects in production in Chapter 1. We also already discussed how to scope an ML project in the context of a business earlier in this chapter. We’ll discuss how to organize teams to ensure the success of an ML project in Chapter 11.

Step 2. Data engineering

A vast majority of ML models today learn from data, so developing ML models starts with engineering data. In Chapter 3, we’ll discuss the fundamentals of data engineering, which covers handling data from different sources and formats. With access to raw data, we’ll want to curate training data out of it by sampling and generating labels, which is discussed in Chapter 4.
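
As one hedged example of sampling when curating training data, stratified sampling keeps rare labels represented; `df` and `label_col` are hypothetical:

```python
import pandas as pd

def stratified_sample(df: pd.DataFrame, label_col: str, frac: float) -> pd.DataFrame:
    """Sample the same fraction from each label so rare classes aren't
    lost when curating training data from raw logs."""
    return df.groupby(label_col, group_keys=False).sample(frac=frac, random_state=0)
```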

Step 3. ML model development

With the initial set of training data, we’ll need to extract features and develop initial models leveraging these features. This is the stage that requires the most ML knowledge and is most often covered in ML courses. In Chapter 5, we’ll discuss feature engineering. In Chapter 6, we’ll discuss model selection, training, and evaluation.
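
A minimal, self-contained sketch of this stage, starting from a simple baseline evaluated on held-out data, with scikit-learn's bundled dataset as a stand-in for real features:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Start from a simple, well-understood baseline before reaching for
# anything more complex.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=5_000).fit(X_tr, y_tr)
print("F1:", f1_score(y_te, model.predict(X_te)))
```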

Step 4. Deployment

After a model is developed, it needs to be made accessible to users. Developing an ML system is like writing—you will never reach the point when your system is done. But you do reach the point when you have to put your system out there. We’ll discuss different ways to deploy an ML model in Chapter 7.
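
One common way to make a model accessible is behind an HTTP endpoint. A minimal sketch with FastAPI, where `model.pkl` and the flat feature list are assumptions:

```python
# save as serve.py and run with: uvicorn serve:app
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
with open("model.pkl", "rb") as f:  # hypothetical trained model
    model = pickle.load(f)

class Features(BaseModel):
    values: list[float]

@app.post("/predict")
def predict(features: Features):
    return {"prediction": model.predict([features.values]).tolist()}
```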

Step 5. Monitoring and continual learning

Once in production, models need to be monitored for performance decay and maintained to be adaptive to changing environments and changing requirements. This step will be discussed in Chapters 8 and 9.
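
When ground-truth labels arrive late or never, monitoring often falls back on the prediction distribution itself. A hedged sketch; the baseline rate and tolerance are made-up parameters:

```python
from collections import deque

class PredictionMonitor:
    """Track the recent positive-prediction rate; a large swing away
    from the training-time rate is a cheap proxy for performance decay."""

    def __init__(self, baseline_rate: float, window: int = 1_000,
                 tolerance: float = 0.1):
        self.baseline = baseline_rate
        self.tolerance = tolerance
        self.recent = deque(maxlen=window)

    def record(self, prediction: int) -> bool:
        """Record one binary prediction; return True if an alert
        should fire."""
        self.recent.append(prediction)
        rate = sum(self.recent) / len(self.recent)
        return abs(rate - self.baseline) > self.tolerance
```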

Step 6. Business analysis

Model performance needs to be evaluated against business goals and analyzed to generate business insights. These insights can then be used to eliminate unproductive projects or scope out new projects. This step is closely related to the first step.

Framing ML Problems

Starting from the initial business objective, you need to discover how to frame the problem as an ML problem.

Types of ML Tasks

[Figure: types of ML tasks]
