matthewcarbone / Bootcamp

A collection of tutorials and resources for data science and machine learning
BSD 3-Clause "New" or "Revised" License

Introduction to NumPy, tabular data and visualization #2

Open matthewcarbone opened 4 months ago

matthewcarbone commented 4 months ago


This is the second day's content, which focuses on NumPy, the core numerical scientific computing library on which much of the Python ML ecosystem is built. It also covers tabular data analysis with Pandas and data visualization with Matplotlib.

Learning objective

Students will finish this module with a working understanding of NumPy and Pandas, the premier Python libraries for numerical scientific computing and tabular data analysis, respectively. Specifically, students will understand how to manipulate data in array form, which will require a rudimentary understanding (or review) of algebra, and how to visualize it.
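As a rough illustration of what "manipulating data in array form" could look like, here is a minimal sketch. The array values and variable names are made up for this example and are not part of the planned course material.

```python
import numpy as np

# A small 2D array: 3 samples (rows) by 2 features (columns). Values are invented.
data = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])

# Vectorized arithmetic applies element-wise, with no explicit Python loop.
scaled = data * 2.0

# Reduction along an axis: axis=0 collapses the rows, giving one mean per column.
column_means = data.mean(axis=0)

# Broadcasting: the shape-(2,) vector of means is subtracted from every row.
centered = data - column_means

# Matrix product: (2, 3) @ (3, 2) gives a (2, 2) result.
gram = centered.T @ centered

print(column_means)
print(gram)
```

The matrix product at the end is one place where the algebra review mentioned above would come in.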

Content to cover

A note on the difference between the standard library and other libraries

Introduction to NumPy

Basic operations in NumPy

Advanced operations in NumPy

Introduction to Pandas

Creating Pandas objects

Data manipulation

Introduction to data analysis

Introduction to Matplotlib

Advanced plotting
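To give a feel for the Pandas items in the list above (creating objects, data manipulation, a first analysis), here is a minimal sketch. The table, its column names, and its values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Creating a Pandas object: a DataFrame built from a dict of columns.
# All names and numbers here are invented for the example.
df = pd.DataFrame({
    "genre": ["rock", "jazz", "rock", "jazz"],
    "duration_s": [210, 305, 180, 275],
    "popularity": [60, 40, 80, 50],
})

# Data manipulation: derive a new column from an existing one.
df["duration_min"] = df["duration_s"] / 60

# Boolean-mask filtering: keep rows with popularity of at least 50.
popular = df[df["popularity"] >= 50]

# A first analysis step (split-apply-combine): mean popularity per genre.
mean_popularity = df.groupby("genre")["popularity"].mean()

print(popular)
print(mean_popularity)
```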

Capstone

Students will play the role of data scientists at a company, tasked with presenting an analysis of a dataset to management. Students should find a dataset on Hugging Face, Kaggle, or another open-access data platform, then download and analyze it. Emphasis should be placed on visualizing the data (remember, management doesn't have time to read a bunch of text or tabular data; they want to see informative figures!). For example, the Spotify Tracks Dataset on Hugging Face is a good place to start.
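One possible shape for a capstone figure is sketched below. It uses a small synthetic table rather than a real download, so the column names (`genre`, `popularity`) and the values are assumptions and do not describe any actual dataset's schema.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for a downloaded dataset; names and values are invented.
df = pd.DataFrame({
    "genre": ["rock", "jazz", "rock", "jazz"],
    "popularity": [60, 40, 80, 50],
})

# Summarize first, then plot the summary: one bar per genre.
summary = df.groupby("genre")["popularity"].mean().sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(list(summary.index), summary.values)

# Labeled axes and a title let the figure stand on its own in a presentation.
ax.set_xlabel("Genre")
ax.set_ylabel("Mean popularity")
ax.set_title("Mean popularity by genre (synthetic data)")
fig.tight_layout()
```

In an interactive session, dropping the `matplotlib.use("Agg")` line and calling `plt.show()` would display the figure; `fig.savefig(...)` would write it to a file for slides.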