Course materials for General Assembly's Data Science course in San Francisco (7/29/15 - 10/14/15).
A foundational course in data science covering machine learning theory, case studies and real-world examples, an introduction to various modeling techniques, and other tools for making predictions and decisions about data. Students gain practical computational experience by running machine learning algorithms and learning how to choose the best and most representative data models for prediction. The course uses Python throughout.
In order to receive a General Assembly Certificate in Data Science, upon completion of the course, students must:
Assignments, milestones and feedback throughout the course are designed to prepare students to deliver a quality course project.
The weekly schedules for lecture content, lab content, and homework assignments are subject to change according to the needs & preferences of the class.
Week | Monday | Wednesday |
---|---|---|
UNIT 1 | DATA | |
1 | | 7/29: Introduction to Data Science, Git setup |
2 | 8/3: Data Format, Access & Transformation + Python review | 8/5: Cleaning and exploring data + Linear Algebra review |
UNIT 2 | MACHINE LEARNING | |
3 | 8/10: Introduction to Machine learning, Classification with K-Nearest Neighbors | 8/12: Cross Validation and Naïve Bayes |
4 | 8/17: Regression and Regularization | 8/19: Logistic Regression |
5 | 8/24: Imbalanced Classes and Evaluation Metrics | 8/26: Advanced Classifiers |
6 | 8/31: Ensemble Techniques | 9/2: Review of classification and regression |
UNIT 3 | APPLICATIONS | |
7 | 9/7: Labor Day (NO CLASS) | 9/9: K-Means Clustering and Unsupervised learning |
8 | 9/14: Dimensionality Reduction | 9/16: Recommendation systems |
9 | 9/21: Neural Networks & Deep learning | 9/23: Natural Language Processing and Text Mining |
10 | 9/28: Time Series Analysis | |
UNIT 4 | AT SCALE | |
10 | | 9/30: Database Technologies |
11 | 10/5: Map Reduce | 10/7: Parallel and distributed computing, Data Products |
12 | 10/12: Final project working session | 10/14: Final project presentations |
HW | Topics | Dataset | Assigned | Due | Review Due |
---|---|---|---|---|---|
1 | GitHub setup | | 7/29 | 8/3 | 8/5 |
2 | Data Exploration | | 8/5 | 8/10 | 8/12 |
3 | Classification, KNN + Naïve Bayes | Pima Indians | 8/12 | 8/17 | 8/19 |
4 | Classification, Cross Validation | | 8/19 | 8/24 | 8/26 |
5 | Classification, Evaluation | | 8/26 | 8/31 | 9/2 |
Midterm | ------------- | | 8/31 | 9/9 | 9/11 |
6 | Clustering & Dim Reduction | | 9/9 | 9/14 | 9/16 |
7 | RecSys + NLP | | 9/16 | 9/21 | 9/23 |
8 | Networks | | 9/23 | 9/28 | 9/30 |
9 | Time Series + AWS | | 9/30 | 10/5 | 10/7 |
FP | Deliverable | Due |
---|---|---|
1 | Title & Data Sources | 8/19 |
2 | Elevator Pitch | 9/2 |
3 | Draft Analysis | 9/23 |
8 | Final Project Due | 10/14 |
Instructor | Times | Contact method |
---|---|---|
Justin | 3:00 - 6:00 PM Sundays | in person (at GA), slack, hangouts by appointment |
Francesco | Monday & Wednesday | slack (quickest response) or hangouts by appointment |
You've all been invited to use Slack for chat during class and throughout the day. Please consider this the primary way to contact other students. Justin will be in Slack during class to handle questions. All instructors will be available on Slack during office hours (listed above).