Open oldoc63 opened 1 year ago
Using the movies DataFrame, create a boxplot for production_budget using the boxplot() function from seaborn. Don't forget to display the plot using plt.show() and close the plot using plt.close().
Create a histogram for production_budget using the histplot() function from seaborn.
Both plots show that the distribution of movie budgets is skewed to the right, with some outlier movies having extremely high budgets. This is consistent with the high mean budget value we saw earlier, since the mean is pulled upward by right skew and high outliers.
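To see why a right-skewed distribution inflates the mean, here is a minimal sketch using made-up budget values (in millions of dollars); these numbers are illustrative assumptions, not the real movies data:

```python
import pandas as pd

# Hypothetical budgets in millions of dollars (made up for illustration).
# Most values are modest, while two are extremely high.
budgets = pd.Series([1, 5, 12, 20, 25, 40, 60, 90, 250, 425])

mean_budget = budgets.mean()      # sensitive to the two very large values
median_budget = budgets.median()  # depends only on the middle of the data
skewness = budgets.skew()         # positive for a right-skewed sample

print(f"mean: {mean_budget}, median: {median_budget}, skew: {skewness:.2f}")
```

In this sample the mean sits well above the median, which is the typical signature of right skew.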
While summary statistics are certainly helpful for exploring and quantifying a feature, we might find it hard to wrap our minds around a bunch of numbers. This is why data visualization is such a powerful element of EDA.
For quantitative variables, boxplots and histograms are two common visualizations. These plots are useful because they simultaneously communicate information about minimum and maximum values, central location, and spread. Histograms can additionally illuminate patterns that can impact an analysis (e.g., skew or multimodality).
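The quantities a boxplot summarizes can also be computed directly. The sketch below uses made-up budget values (in millions of dollars) and the common 1.5 × IQR convention for flagging outliers; the data and variable names are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical budgets in millions of dollars (made up for illustration).
budgets = pd.Series([1, 5, 12, 20, 25, 40, 60, 90, 250, 425])

# Quartiles mark the edges of the box; the median is the line inside it.
q1 = budgets.quantile(0.25)
median = budgets.median()
q3 = budgets.quantile(0.75)
iqr = q3 - q1  # interquartile range, a measure of spread

# Points beyond 1.5 * IQR from the box are conventionally drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = budgets[(budgets < lower_fence) | (budgets > upper_fence)]

print(f"min: {budgets.min()}, Q1: {q1}, median: {median}, Q3: {q3}, max: {budgets.max()}")
print("outliers:", outliers.tolist())
```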
Python's seaborn library, built on top of matplotlib, offers the boxplot() and histplot() functions to easily plot data from a pandas DataFrame: