oldoc63 opened 1 year ago
Let’s practice plotting our own bar chart. In plot.py, a CSV file called games.csv has been loaded. This dataset contains information about various chess games, including:
Number of turns the game took
Ending result (checkmate/draw/resign)
Who won (player with white pieces or black pieces)
Other variables
Feel free to read more about the dataset here.
Click run to see the first five rows of the file. Which categories would be useful to visualize with a .countplot() method?
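As a rough sketch of what this step does, the snippet below previews the first five rows of a DataFrame and draws a basic count plot of one categorical column. The rows are made-up stand-in data so the snippet runs on its own, and the column names turns and winner are assumptions; in the exercise, df would come from pd.read_csv("games.csv").

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in rows (invented for illustration); the exercise loads games.csv instead.
# "turns" and "winner" are assumed column names.
df = pd.DataFrame({
    "turns": [13, 16, 61, 61, 95, 5],
    "victory_status": ["resign", "resign", "checkmate", "checkmate", "resign", "resign"],
    "winner": ["white", "black", "white", "white", "black", "white"],
})

# head() returns the first five rows by default
preview = df.head()
print(preview)

# countplot draws one bar per category, with height equal to the row count
ax = sns.countplot(x="victory_status", data=df)
ax.set_title("Games by ending result")
```

Columns with a small set of repeated labels, such as victory_status or winner, suit a count plot; a numeric column like turns would produce one bar per distinct number.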
You will often see bar graphs with bars set in a certain order. This can help communicate meaningful features about your data. Whether our category labels are ordinal or nominal will affect how we want to order our bars and what features we can emphasize.
Nominal data has labels with no specific order. Thus, we have a lot of creative freedom when choosing where each bar goes on our chart. One way to order our data is in ascending or descending order of frequency. Let’s apply this ordering to our games.csv data by passing the index of the pandas .value_counts() result to the order parameter.
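A minimal sketch of this ordering, assuming a victory_status column as in games.csv. The counts here come from a small invented DataFrame, not from the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in data (invented); in the exercise this is pd.read_csv("games.csv")
df = pd.DataFrame({
    "victory_status": ["resign"] * 5 + ["checkmate"] * 3 + ["draw"] * 1
})

# value_counts() counts each label and sorts by frequency;
# .index holds the labels in that sorted order
ascending_order = df["victory_status"].value_counts(ascending=True).index
# Without ascending=True the default is most frequent first
descending_order = df["victory_status"].value_counts().index

# Bars are drawn left to right in the order given
ax = sns.countplot(x="victory_status", data=df, order=ascending_order)
```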
From the way we ordered the bars, it is immediately clear that resign is the most common game outcome and the mode of our victory_status column, while draw is the least common.
In the above example we have value_counts(ascending=True). If we want the bars in reverse order, we can take ascending=True out of the .value_counts() method (descending order is the default). The .index attribute returns the row labels of the Series that .value_counts() produces, which are the category names in sorted order.
If we are working with ordinal data, we should order the bars to follow the natural order of the categories. For example, let’s say we want to plot the number of students per grade level at a college. We have a table below, which is a preview of the data from a students.csv file.
We can order the categorical values as First Year, Second Year, Third Year, and Fourth Year since they are ordinal. Using .countplot(), we can input these as a list in the order parameter.
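Here is a hedged sketch of that idea. The rows are invented and the column name "Grade Level" is an assumption, since the contents of students.csv are not shown here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented rows; "Grade Level" is an assumed column name for students.csv
students = pd.DataFrame({
    "Grade Level": [
        "Second Year", "First Year", "Fourth Year", "First Year",
        "Third Year", "Second Year", "First Year", "Third Year",
    ]
})

# Ordinal categories: list them in their natural order, not by frequency
grade_order = ["First Year", "Second Year", "Third Year", "Fourth Year"]

ax = sns.countplot(x="Grade Level", data=students, order=grade_order)
```

Passing an explicit list keeps the grade levels in sequence even when a later year has more students than an earlier one.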
We are going to look at a column called Supportive Environment which documents how various schools in the area rate in terms of supportiveness with the following values:
NOT ENOUGH DATA
VERY WEAK
WEAK
NEUTRAL
STRONG
VERY STRONG
Before we plot, determine whether this column of data is ordinal or nominal.
Fill in your answer as "nominal" or "ordinal" in the variable type_of_data on line 10.
Add the order parameter into your existing .countplot() method, and use value_order to order the bars accordingly.
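One possible shape for these steps is sketched below. The DataFrame is an invented stand-in for the school data, and the contents of value_order are assumed to follow the order in which the values are listed above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented stand-in rows for the "Supportive Environment" column
df = pd.DataFrame({
    "Supportive Environment": [
        "STRONG", "NEUTRAL", "VERY STRONG", "WEAK", "STRONG",
        "NOT ENOUGH DATA", "VERY WEAK", "NEUTRAL", "STRONG",
    ]
})

# The ratings run from weakest to strongest, so the labels have a rank
type_of_data = "ordinal"

# Assumed contents of value_order: the labels in the order listed in the text
value_order = [
    "NOT ENOUGH DATA", "VERY WEAK", "WEAK",
    "NEUTRAL", "STRONG", "VERY STRONG",
]

ax = sns.countplot(x="Supportive Environment", data=df, order=value_order)
ax.tick_params(axis="x", rotation=30)  # long labels are easier to read at an angle
```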
Now that we know all about bar charts, let’s create our own in Python! To do this, we are going to use a library called seaborn. It creates powerful visuals without any syntax headaches, and it is built off of the Matplotlib library.