by Bradley Boehmke
:spiral_calendar: January 27 and 28, 2020
:alarm_clock: 09:00 - 17:00
:hotel: Ballroom Level, Imperial A
:writing_hand: rstd.io/conf
This two-day workshop introduces the essential concepts of building deep learning models with TensorFlow and Keras via R. Throughout the workshop you will gain an intuitive understanding of the architectures and engines that make up deep learning models, apply a variety of deep learning algorithms (e.g., MLPs, CNNs, RNNs, LSTMs, collaborative filtering), understand when and how to tune the various hyperparameters, and learn to interpret model results. You will work through practical applications covering tasks such as computer vision, natural language processing, product recommendation, and more. Leaving this workshop, you should have a firm grasp of deep learning and be able to implement a systematic approach for producing high-quality modeling results.
Is this workshop for you? If you answer "yes" to these three questions, then this workshop is likely a good fit:
1. Are you relatively new to the field of deep learning and neural networks but eager to learn? Or maybe you have applied a basic feedforward neural network but aren’t familiar with the other deep learning architectures?
2. Are you an experienced R user comfortable with the tidyverse, creating functions, and applying control (e.g., if, ifelse) and iteration (e.g., for, while) statements?
3. Are you familiar with the machine learning process, such as data splitting, feature engineering, resampling procedures (e.g., k-fold cross validation), hyperparameter tuning, and model validation?
This workshop will provide some review of these topics, but coming in with some exposure will help you stay focused on the deep learning details rather than the general modeling procedure.
I make a few assumptions about your programming skills and machine learning familiarity (the second and third questions above). Below are my assumptions and some resources to read through to make sure you are properly prepared.
Assumptions | Resource |
---|---|
You should be familiar with the tidyverse, control flow, and writing functions | R for Data Science |
You should be familiar with the basic concepts of machine learning | Ch. 1 HOMLR |
You should be familiar with the machine learning modeling process | Ch. 2 HOMLR |
You should be familiar with the feature engineering process | Ch. 3 HOMLR |
You will require several packages and datasets throughout this workshop. If you are attending the workshop, these will be preinstalled for you, so you do not need to worry about your OS differing from mine. After you leave the workshop, the first notebook below will allow you to reproduce the work you did there. At the conference workshop, we will all use the RStudio Cloud platform; the second notebook below will get you set up so that we can hit the ground running on day 1!
Description | Resource |
---|---|
Pre-installing necessary packages and datasets (already pre-installed for workshop!) | Instructions, Source Code |
Setting up RStudio Cloud environment | Instructions |
This workshop is notebook-focused. Consequently, most of our time will be spent in R notebooks; however, I will also jump to slides to explain certain concepts in further detail. Throughout the notebooks, you will see ℹ️ icons that will hyperlink to relevant slides (or additional resources).
Time | Activity | Notebook | Source Code | Other |
---|---|---|---|---|
09:00 - 09:30 | Introduction | | | Slides |
09:30 - 10:30 | Deep learning ingredients | Notebook | .Rmd | Slides |
10:30 - 11:00 | Coffee break | | | |
11:00 - 12:30 | Deep learning recipe | | | |
| | Training your model | Notebook | .Rmd | |
| | Mini-project: Predicting Ames, IA home sales prices | Notebook | .Rmd | Solution |
12:30 - 13:30 | Lunch break | | | |
13:30 - 15:00 | Computer vision & CNNs | | | |
| | MNIST revisited | Notebook | .Rmd | Slides |
| | Cats vs dogs | Notebook | .Rmd | Slides |
| | Transfer learning | Notebook | .Rmd | Slides |
15:00 - 15:30 | Coffee break | | | |
15:30 - 17:00 | Project: Classifying natural images | Notebook | .Rmd | Solution |
Time | Activity | Notebook | Source Code | Other |
---|---|---|---|---|
09:00 - 10:30 | Word embeddings | | | |
| | The original IMDB | Notebook | .Rmd | Slides |
| | Pre-trained embeddings | Notebook | .Rmd | Slides |
| | Mini-project: Amazon reviews | Notebook | .Rmd | Solution |
10:30 - 11:00 | Coffee break | | | |
11:00 - 12:30 | Collaborative filtering | Notebook | .Rmd | Excel file |
12:30 - 13:30 | Lunch break | | | |
13:30 - 15:00 | RNNs & LSTMs | | | |
| | IMDB revisited | Notebook | .Rmd | Slides |
| | Mini-project: Non-IMDB reviews | Notebook | .Rmd | Solution |
15:00 - 15:30 | Coffee break | | | |
15:30 - 17:00 | Wrap up | | | |
| | Project: Detecting Duplicate Quora Questions | Notebook | .Rmd | Solution |
| | Final words of wisdom | | | Slides |
Activity | Notebook | Source Code |
---|---|---|
Improving generalization with k-fold cross validation | Notebook | .Rmd |
Performing a grid search | Notebook | .Rmd |
Linear regression with stochastic gradient descent | Notebook | .Rmd |
Diagnosing model performance with learning curves | Notebook | .Rmd |
Save your models for later with serialization | Notebook | .Rmd |
Visualizing what CNNs learn | Notebook | .Rmd |
Brad Boehmke is a Director of Data Science at 84.51°, where he wears both software developer and machine learning engineer hats. His team focuses on developing algorithmic processes, solutions, and tools that enable 84.51° and its data scientists to efficiently extract insights from data and provide solution alternatives to decision-makers. He is a visiting professor at the University of Cincinnati, author of the Hands-On Machine Learning with R and Data Wrangling with R books, creator of multiple public and private enterprise R packages, and developer of various data science educational content. You can learn more about his work, and connect with him, at bradleyboehmke.github.io.
Rick Scavetta
Omayma Said
Doug Ashton
Daniel Rodriguez
This work is licensed under a Creative Commons Attribution 4.0 International License.