Contents
0:06 Talk Introduction
1:45 Parallelization use cases
3:15 Use Case 1: No dependencies across data or analysis
4:19 Use Case 2: Model scoring on a per-record basis
4:57 Parallelization Anti-Example: ML model learning and training
6:01 Multithreading is not the same as multiprocessing
8:48 Key differences
9:42 Cores, CPUs, and computer memory
12:06 Use top to monitor processes
13:02 Multiprocessing suits more use cases and is used by joblib
14:28 Example ML workflow
16:18 Example pre-processing: function vs joblib
18:57 Joblib hyperparameter tuning: job and chunk size
20:39 Writing a wrapper function for joblib
21:18 Calling joblib and the number of physical cores
22:22 joblib.Parallel and joblib.delayed
23:48 Results: timing
25:05 Results: data
25:57 Brief overview of GBM classifier hyperparameter tuning
26:50 Joblib passes large NumPy arrays by reference and avoids data duplication
27:52 Avoid writing to overlapping segments in memory
28:08 Avoid multiprocessing calls to external servers
28:36 Other tips and tricks: first see how runtime scales, avoid crashing jobs by increasing the number of tasks, and be aware that complex records can cause CPU spikes
29:39 Resources
Video URL: https://www.youtube.com/watch?v=bzdMHXDusOQ&list=WL&index=335
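The chapters from 18:57 to 23:48 (chunk size, a wrapper function, the number of physical cores, `joblib.Parallel` and `joblib.delayed`) fit together roughly as in the sketch below. This is not the speaker's code: the toy per-record task (counting whitespace-separated tokens) and the helper names `preprocess_record`, `process_chunk`, `make_chunks` and `parallel_preprocess` are hypothetical, chosen only to show the pattern. It assumes a recent joblib release in which `joblib.cpu_count` accepts `only_physical_cores`.

```python
from joblib import Parallel, cpu_count, delayed


def preprocess_record(record: str) -> int:
    # Hypothetical per-record step. No record depends on any other record,
    # which is what makes this work safe to split across processes.
    return len(record.strip().split())


def process_chunk(chunk: list) -> list:
    # Hypothetical wrapper function: each task handles a whole chunk, so the
    # per-task overhead (pickling, inter-process communication) is paid once
    # per chunk instead of once per record.
    return [preprocess_record(r) for r in chunk]


def make_chunks(records: list, chunk_size: int) -> list:
    # Consecutive, non-overlapping slices of at most chunk_size records.
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def parallel_preprocess(records: list, chunk_size: int = 1000, n_jobs=None) -> list:
    if n_jobs is None:
        # Default to physical cores only, ignoring hyperthreaded logical cores.
        n_jobs = cpu_count(only_physical_cores=True)
    chunks = make_chunks(records, chunk_size)
    # delayed(process_chunk)(chunk) records the call without running it;
    # Parallel runs the calls in worker processes (loky backend by default)
    # and returns their results in the same order as the input chunks.
    chunk_results = Parallel(n_jobs=n_jobs)(
        delayed(process_chunk)(chunk) for chunk in chunks
    )
    # Flatten the per-chunk lists back into one list aligned with `records`.
    return [value for result in chunk_results for value in result]


if __name__ == "__main__":
    records = [f"record {i} " + "token " * (i % 5) for i in range(10_000)]
    counts = parallel_preprocess(records, chunk_size=2_000, n_jobs=2)
    print(len(counts), counts[:5])
```

Chunk size is the knob to tune here. Very small chunks spend most of their time on dispatch overhead, and very large chunks leave some cores idle near the end of the run.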