00:00 Welcome
02:15 What is 'Bus Number'?
03:00 Follow along - Tutorial GitHub Repo
03:30 Fixing nbdime installation issue
07:45 Who needs reproducible data science?
08:46 Meet the Bjørn persona
10:16 Meet the Mark persona
11:25 Meet the Annie persona
12:35 The case for reproducibility: we all need it
13:10 How do you spend your 'Data Science' time?
15:46 Use the right tools for the job
16:18 Tool #1: Revision control (git and GitHub/GitLab/Bitbucket)
18:00 Tool #2: Language (Python 3.6+)
20:13 Tool #3: Virtual Environments and package managers (conda)
21:35 Tool #4: Frameworks (scikit-learn, joblib)
22:32 Tool #5: IDE (Jupyter notebook)
22:57 Tool #6: Scripting (Makefiles)
23:15 Tool #7: Templates (cookiecutter)
24:10 Getting started with the Jupyter notebooks
25:10 Jupyter Notebook naming conventions
27:06 Tip: Tag your code when you do something significant
27:39 Configuring a new cookiecutter-easydata project
28:23 Licences
29:10 Creating a development environment
29:53 Data science is a DAG
31:20 The reproducible flow diagram
34:10 Break
36:00 Q&A: Why Makefiles over shell scripts?
40:04 Raw Data is Read-Only
43:18 Munging Bjørn's phoneme data
44:44 Tip: Don't hardcode paths. Use pathlib
47:55 Building a 'RawDataset' object
53:36 Building a 'Dataset' object
1:01:05 The workflow module
1:08:33 Where does the 'src' module come from?
1:11:15 The next steps
Video URL: https://www.youtube.com/watch?v=x7gukmVdAxw