Deep Learning with Keras and TensorFlow in R

rstudio::conf 2020

by Bradley Boehmke

:spiral_calendar: January 27 and 28, 2020
:alarm_clock: 09:00 - 17:00
:hotel: Ballroom Level, Imperial A
:writing_hand: rstd.io/conf

Overview

This two-day workshop introduces the essential concepts of building deep learning models with TensorFlow and Keras via R. Throughout this workshop you will gain an intuitive understanding of the architectures and engines that make up deep learning models, apply a variety of deep learning algorithms (e.g. MLPs, CNNs, RNNs, LSTMs, collaborative filtering), understand when and how to tune the various hyperparameters, and be able to interpret model results. You will have the opportunity to work through practical applications covering a variety of tasks such as computer vision, natural language processing, product recommendation, and more. Leaving this workshop, you should have a firm grasp of deep learning and be able to implement a systematic approach for producing high-quality modeling results.

Is this course for me?

If you answer "yes" to these three questions, then this workshop is likely a good fit:

1. Are you relatively new to the field of deep learning and neural networks but eager to learn? Or maybe you have applied a basic feedforward neural network but aren’t familiar with the other deep learning frameworks?
2. Are you an experienced R user comfortable with the tidyverse, creating functions, and applying control (e.g. if, ifelse) and iteration (e.g. for, while) statements?
3. Are you familiar with the machine learning process, such as data splitting, feature engineering, resampling procedures (e.g. k-fold cross validation), hyperparameter tuning, and model validation?

This workshop will provide some review of these topics, but coming in with some exposure will help you stay focused on the deep learning details rather than the general modeling procedure.

Prework

I make a few assumptions about your programming skills and machine learning familiarity (items #2-3 in the previous section). Below are those assumptions, along with resources to read through to make sure you are properly prepared.

Assumptions	Resource
You should be familiar with the tidyverse, control flow, and writing functions	R for Data Science
You should be familiar with the basic concepts of machine learning	Ch. 1 HOMLR
You should be familiar with the machine learning modeling process	Ch. 2 HOMLR
You should be familiar with the feature engineering process	Ch. 3 HOMLR

You will require several packages and datasets throughout this workshop. If you are attending the workshop, these will be pre-installed for you, so you do not need to worry about your OS differing from mine. After you leave the workshop, the first notebook below will allow you to reproduce the work you did there. At the conference workshop, we will all use the RStudio Cloud platform; the second notebook below will get you set up so that we can hit the ground running on day 1!

Description	Resource
Pre-installing necessary packages and datasets (already pre-installed for workshop!)	Instructions Source Code
Setting up RStudio Cloud environment	Instructions

Schedule

This workshop is notebook-focused. Consequently, most of our time will be spent in R notebooks; however, I will also jump to slides to explain certain concepts in further detail. Throughout the notebooks, you will see ℹ️ icons that will hyperlink to relevant slides (or additional resources).

Day 1

Time	Activity	Notebook	Source Code	Other
09:00 - 09:30	Introduction			Slides
09:30 - 10:30	Deep learning ingredients	Notebook	.Rmd	Slides
10:30 - 11:00	Coffee break
11:00 - 12:30	Deep learning recipe
	Training your model	Notebook	.Rmd
	Mini-project: Predicting Ames, IA home sales prices	Notebook	.Rmd	Solution
12:30 - 13:30	Lunch break
13:30 - 15:00	Computer vision & CNNs
	MNIST revisited	Notebook	.Rmd	Slides
	Cats vs dogs	Notebook	.Rmd	Slides
	Transfer learning	Notebook	.Rmd	Slides
15:00 - 15:30	Coffee break
15:30 - 17:00	Project: Classifying natural images	Notebook	.Rmd	Solution

Day 2

Time	Activity	Notebook	Source Code	Other
09:00 - 10:30	Word embeddings
	The original IMDB	Notebook	.Rmd	Slides
	Pre-trained embeddings	Notebook	.Rmd	Slides
	Mini-project: Amazon reviews	Notebook	.Rmd	Solution
10:30 - 11:00	Coffee break
11:00 - 12:30	Collaborative filtering	Notebook	.Rmd	Excel file
12:30 - 13:30	Lunch break
13:30 - 15:00	RNNs & LSTMs
	IMDB revisited	Notebook	.Rmd	Slides
	Mini-project: Non-IMDB reviews	Notebook	.Rmd	Solution
15:00 - 15:30	Coffee break
15:30 - 17:00	Wrap up
	Project: Detecting Duplicate Quora Questions	Notebook	.Rmd	Solution
	Final words of wisdom			Slides

Extras

Activity	Notebook	Source Code
Improving generalization with k-fold cross validation	Notebook	.Rmd
Performing a grid search	Notebook	.Rmd
Linear regression with stochastic gradient descent	Notebook	.Rmd
Diagnosing model performance with learning curves	Notebook	.Rmd
Save your models for later with serialization	Notebook	.Rmd
Visualizing what CNNs learn	Notebook	.Rmd

Instructor

Brad Boehmke is a Director of Data Science at 84.51°, where he wears both software developer and machine learning engineer hats. His team focuses on developing algorithmic processes, solutions, and tools that enable 84.51° and its data scientists to efficiently extract insights from data and provide solution alternatives to decision-makers. He is a visiting professor at the University of Cincinnati, author of the books Hands-On Machine Learning with R and Data Wrangling with R, creator of multiple public and private enterprise R packages, and developer of various data science educational content. You can learn more about his work, and connect with him, at bradleyboehmke.github.io.