oldoc63 opened 1 year ago
Use the pandas read_csv() function to load spotify_data.csv into a variable called spotify_data.
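This is a minimal sketch of the loading step. The real spotify_data.csv is not included here, so the example reads a small, made-up CSV from memory through io.StringIO. The column names and values are hypothetical. In the project you would pass the file path directly, as in pd.read_csv("spotify_data.csv").

```python
import io

import pandas as pd

# Hypothetical stand-in for spotify_data.csv: a few made-up rows with a tempo column.
csv_text = """song,tempo
A,120.0
B,128.5
C,174.0
"""

# read_csv accepts a file path or any file-like object.
spotify_data = pd.read_csv(io.StringIO(csv_text))
print(spotify_data.shape)
```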
We will focus on the tempo variable. This column gives the beats per minute (bpm) of each song in spotify_data.csv. The dataset contains other columns as well, but for now we are going to ignore them.
Create a variable called song_tempos that contains the tempo column data.
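Selecting one column by name returns a pandas Series. The sketch below builds a small, made-up DataFrame that stands in for spotify_data, so it runs without the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for the loaded dataset (values are made up).
spotify_data = pd.DataFrame({
    "song": ["A", "B", "C", "D"],
    "tempo": [120.0, 128.5, 174.0, 95.0],
})

# Bracket indexing with a single column name returns a Series of tempos (bpm).
song_tempos = spotify_data["tempo"]
print(type(song_tempos).__name__, len(song_tempos))
```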
Let's investigate the helper functions we will use in the following sections. A file called helper_functions.py should be opened in the workspace for you. It contains three functions: choose_statistic(), population_distribution(), and sampling_distribution().
choose_statistic() allows us to choose the statistic we want to calculate for our sampling and population distributions. It takes two parameters:
- x: an array of numbers
- sample_stat_text: a string that tells the function which statistic to calculate on x. It takes one of three values: "Mean", "Minimum", or "Variance".
population_distribution() allows us to plot the population distribution of a dataframe with one function call. It takes a single parameter: the data whose population distribution is plotted (for example, song_tempos).
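The contents of helper_functions.py are not reproduced here. The following is therefore a hypothetical stand-in for choose_statistic(), not the project's actual code. It dispatches on the same three strings described above. Computing the variance with NumPy's default (ddof=0) is an assumption, and the real helper may differ.

```python
import numpy as np


def choose_statistic(x, sample_stat_text):
    """Hypothetical re-implementation: return the requested statistic of x.

    Accepts "Mean", "Minimum", or "Variance". Variance uses np.var's default
    ddof=0 (an assumption about the original helper).
    """
    x = np.asarray(x, dtype=float)
    if sample_stat_text == "Mean":
        return np.mean(x)
    if sample_stat_text == "Minimum":
        return np.min(x)
    if sample_stat_text == "Variance":
        return np.var(x)
    raise ValueError(f"Unknown statistic: {sample_stat_text!r}")


print(choose_statistic([100, 120, 140], "Mean"))
```

Raising on an unknown string is a design choice in this sketch. It makes a typo such as "mean" fail loudly instead of silently returning None.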
sampling_distribution() allows us to plot a simulated sampling distribution of a statistic. The simulated sampling distribution is created by taking random samples of some size, calculating a particular statistic for each sample, and plotting a histogram of those sample statistics. It takes three parameters:
- population_data: the dataframe being sampled from
- samp_size: the size of each sample
- stat: the statistic measured for each sample, either "Mean", "Minimum", or "Variance"
To start off, let's use the population_distribution() function to graph the distribution of song_tempos.
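The simulation described for sampling_distribution() can be sketched without plotting. The code draws many random samples, computes the statistic for each one, and collects the results. The function name simulate_sample_stats, the n_samples and seed parameters, and sampling without replacement are all hypothetical choices, and the real helper may work differently. Passing the returned array to a histogram (for example matplotlib's plt.hist) would give the plot.

```python
import numpy as np


def simulate_sample_stats(population_data, samp_size, stat, n_samples=1000, seed=0):
    """Hypothetical sketch of the simulation behind sampling_distribution().

    Draws n_samples random samples of size samp_size (without replacement,
    an assumption), computes the chosen statistic for each, and returns the
    array of sample statistics.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(population_data, dtype=float)
    stat_funcs = {"Mean": np.mean, "Minimum": np.min, "Variance": np.var}
    func = stat_funcs[stat]
    return np.array([
        func(rng.choice(data, size=samp_size, replace=False))
        for _ in range(n_samples)
    ])


# Made-up population of tempos, used only to exercise the function.
population = np.random.default_rng(1).normal(loc=120, scale=25, size=5000)
sample_means = simulate_sample_stats(population, samp_size=30, stat="Mean")
print(sample_means.mean(), sample_means.std())
```

The spread of sample_means should come out close to the population standard deviation divided by sqrt(30), which is the standard error used later in the project.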
The population distribution is approximately normal with a little bit of right-skewness.
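The sample skewness is one way to check "a little bit of right-skewness" numerically. A positive value indicates a longer right tail. The sketch below uses pandas Series.skew() on made-up tempos. In the project you could call song_tempos.skew() instead, though the exact value for the real data is not given here.

```python
import pandas as pd

# Made-up tempos with a longer right tail (a few fast songs).
tempos = pd.Series([100, 110, 115, 120, 120, 125, 130, 180, 200])

# Series.skew() returns the bias-adjusted sample skewness:
# > 0 suggests right skew, roughly 0 suggests symmetry.
skewness = tempos.skew()
print(round(skewness, 3))
```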
Use population_mean and population_std to calculate the standard error of the sampling distribution of the sample mean for a sample size of 30. Save this value in a variable called standard_error.
Using population_mean and standard_error in a normal CDF, calculate the probability that the mean tempo of a sample of 30 songs is less than 140 bpm. Print your result to the output terminal.
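This is a minimal sketch of the two steps above. The standard error of the sample mean is the population standard deviation divided by the square root of the sample size. By the central limit theorem, the sample mean is approximately normal with mean population_mean and standard deviation standard_error, so the probability is that normal's CDF evaluated at 140. A few parts of the sketch are assumptions, because the earlier steps that define population_mean and population_std are not shown:

- The tempo data is randomly generated stand-in data.
- population_mean comes from np.mean.
- population_std comes from np.std with ddof=0, treating the data as the whole population.

```python
import math
from statistics import NormalDist

import numpy as np

# Made-up stand-in for song_tempos; in the project use the real tempo column.
song_tempos = np.random.default_rng(42).normal(loc=120, scale=30, size=2000)

population_mean = np.mean(song_tempos)
# Population standard deviation (ddof=0), treating the data as the whole population.
population_std = np.std(song_tempos)

samp_size = 30
# Standard error of the sample mean: sigma / sqrt(n).
standard_error = population_std / math.sqrt(samp_size)

# Sample mean ~ approx. Normal(population_mean, standard_error),
# so P(sample mean < 140) is that normal's CDF at 140.
prob_below_140 = NormalDist(mu=population_mean, sigma=standard_error).cdf(140)
print(prob_below_140)
```

If SciPy is available, scipy.stats.norm.cdf(140, population_mean, standard_error) computes the same probability. The standard-library NormalDist is used here only to avoid the extra dependency.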
You are a DJ trying to make sure you are ready for a big party. You don't have time to go through all the songs you can work with. Instead, you want to make sure that any sample of 30 songs from your playlist will get the party started.
The dataset we are using for this project can be found here.
A helper_functions.py file is loaded along with the script file. It contains functions that you will use throughout this project.