Expand Pivot Tables Explanation and Examples

Pivot Tables were really helpful to me when I was exploring the data from both our in-class examples and the dataset in Problem 2 of Problem Set 3. The tables served as a nice supplement to the data visualizations available in Python because:

1) they provided specific numbers (both averages and counts), which let me quantify the differences between conditions of a variable more easily than graphical visualizations alone. Charts give rough approximations but not exact differences, especially as the scale of the y-axis gets larger. For example, in a bar chart of average salary for each gender, the salary for one gender might be read as roughly 31,000, while a pivot table would tell you the exact average was 31,750. Although the difference isn't great, the table puts the data in quantifiable terms, which was helpful to me.

2) more importantly, it was easy to start with a simple relationship between two variables and then add further variables (by simply adding another column to the index) to explore the relationship in finer detail (for example, adding rank or years in rank). This was especially helpful for me in conceptualizing Simpson's paradox.
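
To illustrate the second point, here is a minimal sketch of that progression in pandas. The DataFrame, its column names (`gender`, `rank`, `salary`), and all of its values are made up for illustration and are not the class dataset.

```python
import pandas as pd

# Hypothetical salary data; column names and values are invented for illustration.
df = pd.DataFrame({
    "gender": ["F", "F", "F", "M", "M", "M"],
    "rank":   ["Assistant", "Assistant", "Full", "Assistant", "Full", "Full"],
    "salary": [30000, 32000, 50000, 29000, 48000, 49000],
})

# Step 1: a simple two-variable relationship (average salary by gender).
by_gender = pd.pivot_table(df, index=["gender"], values="salary", aggfunc="mean")

# Step 2: add another column to the index to split each gender by rank.
by_gender_rank = pd.pivot_table(
    df, index=["gender", "rank"], values="salary", aggfunc="mean"
)

print(by_gender)
print(by_gender_rank)
```

In this made-up data the overall average is higher for M, yet within each rank the average is higher for F, which is the kind of reversal that Simpson's paradox describes.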

Thus, in addition to initially exploring data through visualizations, I would suggest placing more emphasis on pivot tables as well. This could be done either at the Week 5 lesson, "Introduction to Exploring Data in Python," or in the class example, "Simpson's Paradox Homework Example."

Specifically, in addition to the web page link under the Simpson's Paradox Homework Example (which I found really helpful) and the one example at the bottom of that lesson, I would:

1) Add notes about how a) the order of the index columns affects how the variables are grouped, and b) in addition to the mean, a count can be obtained for each category of a variable.

2) Show a progression of three examples, with the first example showing a simple two-variable relationship and each subsequent example adding one more column to the index, to show how the tables break the data down by those columns.

3) Provide an example of combining a data visualization and a pivot table in one output (for instance, creating a bar chart of salary broken up by gender and rank, and adding a pivot table that provides the exact averages for each of those gender-rank groups). It would also be worth noting that the pivot table always shows up first for some reason, even if the code for the visualization comes first.

4) Potentially provide an example of how dividing two pivot tables that both use the count function produces a third pivot table containing the ratio of the counts in the two tables. For example, in the New Haven data class example, when we were trying to calculate the ratios between those who scored over and under 70 combined within each racial group and for each test, the ratios could be obtained by first filtering the combined scores into two tables (over 70 and under 70), then dividing the two resulting pivot tables.
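
As a rough sketch of what suggestions 1 and 4 might look like in pandas: the data below is invented, the column names (`test`, `race`, `combined`) and the labels `T1`, `T2`, `A`, `B` are placeholders rather than the actual New Haven columns, and treating a score of exactly 70 as "over" is my assumption.

```python
import pandas as pd

# Hypothetical exam data; every name and value here is made up for illustration.
scores = pd.DataFrame({
    "test": ["T1"] * 6 + ["T2"] * 6,
    "race": ["A", "A", "A", "B", "B", "B", "A", "A", "B", "B", "B", "B"],
    "combined": [82, 75, 65, 74, 60, 61, 90, 60, 71, 55, 68, 50],
})

# Suggestion 1a: the first index column becomes the outer grouping level,
# so swapping the order changes how the rows are nested.
by_race_then_test = pd.pivot_table(
    scores, index=["race", "test"], values="combined", aggfunc="mean"
)
by_test_then_race = pd.pivot_table(
    scores, index=["test", "race"], values="combined", aggfunc="mean"
)

# Suggestion 1b: aggfunc="count" returns group sizes instead of averages.
group_sizes = pd.pivot_table(
    scores, index=["race", "test"], values="combined", aggfunc="count"
)

# Suggestion 4: filter into two tables, count each, then divide.
# Assumption: a combined score of exactly 70 is treated as "over".
over = scores[scores["combined"] >= 70]
under = scores[scores["combined"] < 70]
over_counts = pd.pivot_table(
    over, index="race", columns="test", values="combined", aggfunc="count"
)
under_counts = pd.pivot_table(
    under, index="race", columns="test", values="combined", aggfunc="count"
)

# Division aligns on the row and column labels of the two tables.
ratio = over_counts / under_counts
print(ratio)
```

One caveat worth mentioning alongside such an example: if a group has no rows in one of the two filtered tables, the corresponding cell in the ratio table comes out as NaN.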
