oldoc63 / learningDS

Learning DS with Codecademy and Books

Plotting Bar Charts Using Seaborn #506

oldoc63 opened 1 year ago

oldoc63 commented 1 year ago

Now that we know all about bar charts, let’s create our own in Python! To do this, we are going to use a library called seaborn. It produces polished visuals with concise syntax, and it is built on top of the Matplotlib library.

oldoc63 commented 1 year ago
  1. Let’s practice plotting our own bar chart. In plot.py, a CSV file called games.csv has already been loaded. This dataset contains information about various chess games, including:

    Number of turns the game took
    Ending result (checkmate/draw/resign)
    Who won (player with white pieces or black pieces)
    Other variables

Feel free to read more about the dataset here.

Click run to see the first five rows of the file. Which categorical columns would be useful to visualize with the .countplot() method?
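As a rough sketch of this step, the snippet below previews a small made-up DataFrame standing in for games.csv (in the exercise you would load the real file with pd.read_csv("games.csv")). Only victory_status is named in the exercise; the turns and winner column names and all row values are assumptions for illustration. It then lists the non-numeric columns, which are the natural candidates for a count plot.

```python
import pandas as pd

# Hypothetical stand-in for games.csv; the rows are made up and the
# column names "turns" and "winner" are assumptions for illustration.
games = pd.DataFrame({
    "turns": [13, 16, 61, 61, 95, 5, 33, 9],
    "victory_status": ["outoftime", "resign", "mate", "mate",
                       "mate", "draw", "resign", "resign"],
    "winner": ["white", "black", "white", "white",
               "white", "draw", "white", "black"],
})

# Preview the first five rows, as "Click run" does in the exercise
print(games.head())

# Non-numeric columns hold repeated category labels, so counting
# each label (what .countplot() does) is meaningful for them
categorical_cols = games.select_dtypes(exclude="number").columns.tolist()
print(categorical_cols)
```

A numeric column such as turns has many distinct values, so a histogram usually suits it better than one bar per value.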

oldoc63 commented 1 year ago
  1. Plot the counts of each value in victory_status using the .countplot() method. Be sure to show your plot after using the method.
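A minimal sketch of this step, assuming a small hypothetical sample in place of the real games.csv: sns.countplot() counts the rows in each victory_status category and draws one bar per category, and plt.show() displays the figure. Creating the axes explicitly with plt.subplots() is a stylistic choice that keeps each plot on its own figure.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample rows standing in for games.csv
# (in the exercise: games = pd.read_csv("games.csv"))
games = pd.DataFrame({
    "victory_status": ["resign", "mate", "resign", "draw",
                       "outoftime", "resign", "mate"],
})

fig, ax = plt.subplots()
# One bar per category; each bar's height is the number of rows with that label
sns.countplot(x="victory_status", data=games, ax=ax)
plt.show()
```

Without an order argument, the bars for text labels appear in the order the labels first occur in the data.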
oldoc63 commented 1 year ago

Bar Chart Ordering

You will often see bar graphs with bars set in a certain order. This can help communicate meaningful features about your data. Whether our category labels are ordinal or nominal will affect how we want to order our bars and what features we can emphasize.

oldoc63 commented 1 year ago

Nominal Data

Nominal data has labels with no specific order. Thus, we have a lot of creative freedom when choosing where each bar goes on our chart. One way to order our data is by ascending or descending count. Let’s apply this ordering to our games.csv data by passing the result of the pandas .value_counts() method to the order parameter.
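The ascending ordering can be sketched as follows, again with made-up rows standing in for games.csv. value_counts(ascending=True) sorts the categories from least to most common, and .index extracts just the labels in that order for the order parameter.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample rows standing in for games.csv
games = pd.DataFrame({
    "victory_status": ["resign", "mate", "resign", "draw", "resign",
                       "mate", "outoftime", "resign", "mate", "outoftime"],
})

# Category labels sorted from least to most frequent
ascending_order = games["victory_status"].value_counts(ascending=True).index

fig, ax = plt.subplots()
sns.countplot(x="victory_status", data=games, order=ascending_order, ax=ax)
plt.show()
```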

oldoc63 commented 1 year ago

Because of the way we ordered the bars, it is immediately clear that resign is the most common game outcome (the mode of our victory_status column), while draw is the least common.

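In the example above, we used value_counts(ascending=True). If we want the bars in reverse (descending) order, we can drop ascending=True from the .value_counts() call, since descending order is the default. The .index attribute then extracts the category labels from the Series that .value_counts() returns, already sorted by count; that sequence of labels is what the order parameter uses.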
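For the descending version, a sketch under the same assumption of hypothetical sample rows: calling .value_counts() with no arguments returns a Series whose index holds the labels from most to least common, so its .index can be passed straight to order.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample rows standing in for games.csv
games = pd.DataFrame({
    "victory_status": ["resign", "mate", "resign", "draw", "resign",
                       "mate", "outoftime", "resign", "mate", "outoftime"],
})

# A Series: category labels as the index, counts as the values (descending by default)
counts = games["victory_status"].value_counts()
print(counts)

descending_order = counts.index

fig, ax = plt.subplots()
sns.countplot(x="victory_status", data=games, order=descending_order, ax=ax)
plt.show()
```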

oldoc63 commented 1 year ago

Ordinal Data

If we are working with ordinal data, we should order the bars according to the natural order of the categories. For example, let’s say we want to plot the number of students per grade level at a college. The table below previews data from a students.csv file.

We can order the categorical values as First Year, Second Year, Third Year, and Fourth Year since they are ordinal. With .countplot(), we can pass these values as a list to the order parameter.
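A minimal sketch of ordinal ordering, assuming a hypothetical column named year (the actual column name in students.csv is not shown here) and made-up rows. The explicit list places the bars in grade order regardless of which label happens to appear first in the data.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows standing in for students.csv; the "year" column name is assumed
students = pd.DataFrame({
    "year": ["Second Year", "First Year", "Fourth Year", "First Year",
             "Third Year", "Second Year", "First Year"],
})

# The natural ranking of the ordinal categories
year_order = ["First Year", "Second Year", "Third Year", "Fourth Year"]

fig, ax = plt.subplots()
sns.countplot(x="year", data=students, order=year_order, ax=ax)
plt.show()
```

Without order, the bars would follow first appearance (Second Year first here), which hides the progression through grade levels.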

oldoc63 commented 1 year ago
  1. In the plot.py file, we have imported a dataset called school_data.csv and previewed it in the browser. This file contains data about schools in the Chicago, Illinois area.

We are going to look at a column called Supportive Environment, which records how supportive each school in the area is rated, using the following values:

NOT ENOUGH DATA
VERY WEAK
WEAK
NEUTRAL
STRONG
VERY STRONG

Before we plot, determine whether this column of data is ordinal or nominal.

Fill in your answer as "nominal" or "ordinal" in the variable type_of_data on line 10.
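One way to approach this question, sketched with made-up rows standing in for school_data.csv: inspect the distinct labels first, then record the answer. Apart from NOT ENOUGH DATA, the ratings run from VERY WEAK up to VERY STRONG, so they have a natural ranking.

```python
import pandas as pd

# Hypothetical rows standing in for school_data.csv
school_data = pd.DataFrame({
    "Supportive Environment": ["STRONG", "NEUTRAL", "VERY STRONG",
                               "NOT ENOUGH DATA", "WEAK", "STRONG"],
})

# Inspect the distinct labels before deciding how to treat the column
print(school_data["Supportive Environment"].unique())

# The ratings rank from VERY WEAK to VERY STRONG, so the column is ordinal
type_of_data = "ordinal"
```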

oldoc63 commented 1 year ago
  1. Graph the Supportive Environment column using the .countplot() method.
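A sketch of the unordered plot, assuming made-up rows in place of school_data.csv. Because no order is given, the bars follow the order in which each label first appears, which for ordinal ratings produces a jumbled axis.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows standing in for school_data.csv
ratings = (["STRONG"] * 4 + ["NEUTRAL"] * 3 + ["VERY STRONG"] * 2
           + ["WEAK"] * 2 + ["VERY WEAK"] + ["NOT ENOUGH DATA"])
school_data = pd.DataFrame({"Supportive Environment": ratings})

fig, ax = plt.subplots()
# Column names containing spaces work fine as string keys
sns.countplot(x="Supportive Environment", data=school_data, ax=ax)
plt.show()
```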
oldoc63 commented 1 year ago
  1. Let’s order the labels on the x-axis, putting NOT ENOUGH DATA on the far left and VERY STRONG on the far right. We have conveniently put the order of the labels in a variable called value_order for you.

Add the order parameter into your existing .countplot() method, and use value_order to order the bars accordingly.
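A sketch of the ordered plot, assuming value_order holds the six labels in the sequence listed above (the exercise defines this variable for you; its exact contents are assumed here) and using made-up rows for the data.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows standing in for school_data.csv
ratings = (["STRONG"] * 4 + ["NEUTRAL"] * 3 + ["VERY STRONG"] * 2
           + ["WEAK"] * 2 + ["VERY WEAK"] + ["NOT ENOUGH DATA"])
school_data = pd.DataFrame({"Supportive Environment": ratings})

# Assumed contents of value_order: NOT ENOUGH DATA far left, VERY STRONG far right
value_order = ["NOT ENOUGH DATA", "VERY WEAK", "WEAK",
               "NEUTRAL", "STRONG", "VERY STRONG"]

fig, ax = plt.subplots()
sns.countplot(x="Supportive Environment", data=school_data,
              order=value_order, ax=ax)
plt.show()
```

With the bars in rating order, the shape of the distribution across the scale becomes readable at a glance.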