DAT_SF_16

Course materials for General Assembly's Data Science course in San Francisco (7/29/15 - 10/14/15).

Logistics

Course Description

Foundational course in data science, including machine learning theory, case studies and real-world examples, an introduction to various modeling techniques, and other tools for making predictions and decisions from data. Students will gain practical computational experience by running machine learning algorithms and learning how to choose the best and most representative models for making predictions. Students will use Python throughout the course.

Required Setup

Completion Requirements

To receive a General Assembly Certificate in Data Science upon completing the course, students must:

Assignments, milestones and feedback throughout the course are designed to prepare students to deliver a quality course project.

Course Outline

The weekly schedules for lecture content, lab content, and homework assignments are subject to change according to the needs & preferences of the class.

Course Schedule

Week | Monday | Wednesday
--- | --- | ---
UNIT 1: DATA | |
1 | | 7/29: Introduction to Data Science, Git Setup
2 | 8/3: Data Format, Access & Transformation + Python Review | 8/5: Cleaning and Exploring Data + Linear Algebra Review
UNIT 2: MACHINE LEARNING | |
3 | 8/10: Introduction to Machine Learning, Classification with K-Nearest Neighbors | 8/12: Cross-Validation and Naïve Bayes
4 | 8/17: Regression and Regularization | 8/19: Logistic Regression
5 | 8/24: Imbalanced Classes and Evaluation Metrics | 8/26: Advanced Classifiers
6 | 8/31: Ensemble Techniques | 9/2: Review of Classification and Regression
UNIT 3: APPLICATIONS | |
7 | 9/7: Labor Day (NO CLASS) | 9/9: K-Means Clustering and Unsupervised Learning
8 | 9/14: Dimensionality Reduction | 9/16: Recommendation Systems
9 | 9/21: Neural Networks & Deep Learning | 9/23: Natural Language Processing and Text Mining
10 | 9/28: Time Series Analysis |
UNIT 4: AT SCALE | |
10 | | 9/30: Database Technologies
11 | 10/5: MapReduce | 10/7: Parallel and Distributed Computing
12 | 10/12: Final Project Working Session | 10/14: Final Project Presentations

Homework Schedule

HW | Topics | Dataset | Assigned | Due | Review Due
--- | --- | --- | --- | --- | ---
1 | GitHub Setup | | 7/29 | 8/3 | 8/5
2 | Data Exploration | | 8/5 | 8/10 | 8/12
3 | Classification, KNN + Naïve Bayes | Pima Indians | 8/12 | 8/17 | 8/19
4 | Classification, Cross-Validation | | 8/19 | 8/24 | 8/26
5 | Classification, Evaluation | | 8/26 | 8/31 | 9/2
Midterm | | | 8/31 | 9/9 | 9/11
6 | Clustering & Dimensionality Reduction | | 9/9 | 9/14 | 9/16
7 | RecSys + NLP | | 9/16 | 9/21 | 9/23
8 | Networks | | 9/23 | 9/28 | 9/30
9 | Time Series + AWS | | 9/30 | 10/5 | 10/7

Final Project Milestones

FP | Deliverable | Due
--- | --- | ---
1 | Title & Data Sources | 8/19
2 | Elevator Pitch | 9/2
3 | Draft Analysis | 9/23
4 | Final Project | 10/14

Office Hours

Instructor | Times Available | Method
--- | --- | ---
Justin | Sundays, 3:00 - 6:00 PM | In person (at GA), Slack, Hangouts by appointment
Francesco | Mondays & Wednesdays | Slack (quickest response) or Hangouts by appointment

Slack

You've all been invited to Slack for chat during class and throughout the day. Please treat it as the primary way to contact other students. Justin will be on Slack during class to answer questions. All instructors will be available on Slack during their office hours (listed above).

Resources

Working in the terminal

Statistical Learning Theory

Algorithms

Python