oldoc63 / learningDS

Learning DS with Codecademy and Books

Visualizing Quantitative Variable #395

Open oldoc63 opened 1 year ago

oldoc63 commented 1 year ago

While summary statistics are certainly helpful for exploring and quantifying a feature, we might find it hard to wrap our minds around a bunch of numbers. This is why data visualization is such a powerful element of EDA.

For quantitative variables, boxplots and histograms are two common visualizations. These plots are useful because they simultaneously communicate information about minimum and maximum values, central location, and spread. Histograms can additionally illuminate patterns that can impact an analysis (e.g., skew or multimodality).

Python's seaborn library, built on top of matplotlib, offers the boxplot() and histplot() functions to easily plot data from a pandas DataFrame.

oldoc63 commented 1 year ago

Using the movies DataFrame, create a boxplot for production_budget using the boxplot() function from seaborn. Don't forget to display the plot using plt.show() and close the plot using plt.close().
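One possible solution, sketched under an assumption: the real movies dataset is not included in this thread, so the DataFrame below is a small stand-in with invented budget values. With the actual data you would skip the DataFrame construction and use the loaded `movies` directly.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Stand-in for the real movies DataFrame (budget values are invented).
movies = pd.DataFrame({
    "production_budget": [1.5e6, 5e6, 1.2e7, 2e7, 3.5e7, 6e7, 2.5e8]
})

# Horizontal boxplot of the production budgets.
ax = sns.boxplot(x="production_budget", data=movies)
plt.show()   # display the plot
plt.close()  # close the figure so the next plot starts fresh
```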

oldoc63 commented 1 year ago

Create a histogram for production_budget using the histplot() function from seaborn.

oldoc63 commented 1 year ago

From the plots, what do you notice about the distribution of movie budgets?

Both plots show that the distribution of movie budgets is skewed to the right, with some outlier movies having extremely high budgets. This is consistent with the high mean budget value we saw earlier, since right skew and high outliers pull the mean upward.
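To back up the visual impression with numbers, one option is to compare the mean with the median and look at the sample skewness. This is only a sketch: the budget values are invented stand-ins, not the real movies data, so the printed figures will differ from the actual dataset.

```python
import pandas as pd

# Stand-in for the real movies DataFrame (budget values are invented).
movies = pd.DataFrame({
    "production_budget": [1.5e6, 5e6, 1.2e7, 2e7, 3.5e7, 6e7, 2.5e8]
})

budget = movies["production_budget"]
mean_budget = budget.mean()
median_budget = budget.median()
skewness = budget.skew()  # positive values indicate a longer right tail

print(f"mean:   {mean_budget:,.0f}")
print(f"median: {median_budget:,.0f}")
print(f"skew:   {skewness:.2f}")
```

In a right-skewed distribution the mean typically sits above the median, because a few very large values raise the mean while leaving the median almost unchanged.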