Open oldoc63 opened 1 year ago
As seen, we have two quantitative variables (rent and size_sqft) and one categorical variable (borough). The pandas library offers a handy method, .describe(), for displaying some of the most common summary statistics for the columns in a DataFrame. By default, the result only includes numeric columns, but we can pass include='all' to the method to display categorical ones as well:
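A minimal sketch of that call, assuming a small stand-in DataFrame named rentals. The column names come from the text above, but the rows are invented placeholders, not values from the StreetEasy data:

```python
import pandas as pd

# Hypothetical stand-in for the rentals data; the values are made up
# for illustration and are not taken from the StreetEasy dataset.
rentals = pd.DataFrame({
    "rent": [2550, 11500, 3000, 4500, 1795],
    "size_sqft": [480, 2000, 1000, 916, 500],
    "borough": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Queens"],
})

# Default behaviour: only the numeric columns are summarized.
numeric_summary = rentals.describe()
print(numeric_summary)

# include='all' adds the categorical column to the summary as well.
full_summary = rentals.describe(include="all")
print(full_summary)
```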
This is a great way to get an overview of all the variables in a dataset. Notice how different statistics are displayed depending on the variable type.
In script.py, we've imported a dataset containing information on the budget and earnings of movies from various genres into a DataFrame called movies.
Start by inspecting the first 5 rows of movies using the .head() method and print the result.
How many quantitative and categorical variables do you see?
Use the .describe() method to display the summary statistics for movies and print the result. Make sure to show statistics for all columns in the DataFrame.
What kind of metrics are displayed for quantitative columns versus categorical columns?
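One way the two steps could look, as a sketch only. In script.py the movies DataFrame is already loaded, so the stand-in below (its column names and rows are assumptions made up for illustration) exists only to make the snippet run on its own:

```python
import pandas as pd

# Hypothetical stand-in for the `movies` DataFrame that script.py provides;
# column names and values here are placeholders, not the real dataset.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E", "Movie F"],
    "genre": ["Comedy", "Drama", "Comedy", "Action", "Drama", "Comedy"],
    "production_budget": [10_000_000, 25_000_000, 5_000_000,
                          90_000_000, 12_000_000, 8_000_000],
    "domestic_gross": [30_000_000, 20_000_000, 15_000_000,
                       200_000_000, 9_000_000, 22_000_000],
})

# Step 1: inspect the first 5 rows.
first_rows = movies.head()
print(first_rows)

# Step 2: summary statistics for every column, not just the numeric ones.
summary = movies.describe(include="all")
print(summary)
```

In the printed summary, the numeric columns fill rows such as mean, std, and the quartiles, while the text columns fill unique, top, and freq; the rows that do not apply to a column show NaN.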
Introduction
Before diving into formal analysis with a dataset, it is often helpful to perform some initial investigations of the data through exploratory data analysis (EDA) to get a better sense of what you will be working with. Basic summary statistics and visualizations are important components of EDA as they allow us to condense a large amount of information into a small set of numbers or graphics that can be easily interpreted.
This lesson focuses on univariate summaries, where we explore each variable separately. This is useful for answering questions about each individual feature. Variables can typically be classified as quantitative (i.e., numeric) or categorical (i.e., discrete). Depending on a variable's type, we may want to choose different summary metrics and visuals.
Let's say we have the following dataset on New York City rental listings imported into a pandas DataFrame (subsetted from the StreetEasy dataset):
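The original listing is not reproduced here, so the following is only a sketch of what such a DataFrame might look like. The column names rent, size_sqft, and borough are the ones this lesson refers to; the rows are invented placeholders rather than actual StreetEasy records:

```python
import pandas as pd

# Placeholder rows for illustration only; not real StreetEasy listings.
rentals = pd.DataFrame({
    "rent": [2550, 11500, 3000, 4500, 1795],
    "size_sqft": [480, 2000, 1000, 916, 500],
    "borough": ["Manhattan", "Manhattan", "Queens", "Brooklyn", "Queens"],
})

# Preview the first rows of the data.
print(rentals.head())

# Split columns by type: numeric columns are treated as quantitative,
# everything else as categorical.
quantitative_cols = list(rentals.select_dtypes(include="number").columns)
categorical_cols = list(rentals.select_dtypes(exclude="number").columns)
print(quantitative_cols)
print(categorical_cols)
```

Treating every non-numeric column as categorical is a simplification; a numeric code such as a ZIP code would need to be reclassified by hand.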