Up your Bus Number - A Reproducible Data Science Workflow - Kjell Wooding, Amy Wooding

Video URL : https://www.youtube.com/watch?v=x7gukmVdAxw

Contents

00:00 Welcome
02:15 What is 'Bus Number'?
03:00 Follow along - Tutorial Github Repo
03:30 Fixing nbdime installation issue
07:45 Who needs reproducible data science?
08:46 Meet the Bjørn persona
10:16 Meet the Mark persona
11:25 Meet the Annie persona
12:35 The case for reproducibility : we all need it
13:10 How do you spend your 'Data Science' time?
15:46 Use the right tools for the job
16:18 Tool #1 : Revision control (git and github/gitlab/bitbucket)
18:00 Tool #2 : Language (Python 3.6+)
20:13 Tool #3 : Virtual Environments and package managers (conda)
21:35 Tool #4 : Frameworks (Scikit-learn, joblib)
22:32 Tool #5 : IDE (Jupyter notebook)
22:57 Tool #6 : Scripting (Makefiles)
23:15 Tool #7 : Templates (cookiecutter)
24:10 Getting started with the Jupyter notebooks
25:10 Jupyter Notebook naming conventions
27:06 Tip: Tag your code when you do something significant
27:39 Configuring a new cookiecutter-easydata project
29:10 Creating a development environment
28:23 Licences
29:53 Data science is a DAG
31:20 The reproducible flow diagram
34:10 Break
36:00 Q&A : Why Makefiles over Shell scripts?
40:04 Raw Data is Read-Only
43:18 Munging Bjørn's phoneme data
44:44 Tip : Don't hardcode paths. Use pathlib
47:55 Building a 'RawDataset' object
53:36 Building a 'Dataset' object
1:01:05 The workflow module
1:08:33 Where does the 'src' module come from?
1:11:15 The next steps
