0:03 Speaker Introduction
0:20 Presentation Agenda
1:24 Dimensionality Reduction (DR), a brief overview
3:30 DR techniques are used as part of data preparation
4:50 Overview of the Springleaf data used throughout the presentation
5:51 Downsides of the “Kitchen Sink” approach of using the entire data set
8:57 Different DR techniques
10:02 Percent Missing Values
12:50 Amount of Variation (AOV)
13:47 Pairwise correlation
17:37 Multicollinearity
20:14 Principal Component Analysis (PCA)
23:46 Cluster analysis
26:19 Correlation with the target
27:17 Forward/Backward/Stepwise selection
30:58 LASSO
32:01 Tree-based
32:41 Proposed workflow of some DR techniques
33:09 AUC comparison of some DR techniques on Springleaf data
34:11 Summary of DR goals
34:53 Thank you; contact information
35:13 Why might iterative stepwise selection drop an important feature?
36:43 How do LASSO and Logit (with and without stepwise selection) differ on the AUC curve plot?
38:45 How sensitive are the results to the order of the workflow?
Video URL: https://www.youtube.com/watch?v=ioXKxulmwVQ