Draft thematic overview of roughly which topics to cover and how they are connected

joelostblom / viz-oer

An interactive open educational resource for learning data visualization

Here are the main resources I am aware of off the top of my head. I made an initial grouping of how I think of these, but by all means feel free to create your own labels/groups that you think make more sense. I have elaborated quite a bit and tried to also identify what I like in particular in some of these resources, just to highlight for you where my mind is at. After writing these down, my sense is that we will follow some combination of strategies to allow for things like natural progression, engaging narratives, and structured material to co-exist; exciting =)

Coding tutorials. These rarely discuss best practices for visualization but teach students how to build charts using a specific library. This approach is of course common in the docs for the specific libraries (which is where I think it makes the most sense), but I also see it in some online textbooks:
- https://jjallaire.github.io/visualization-curriculum/
- https://ggplot2-book.org/
- https://socviz.co/lookatdata.html#lookatdata (it’s maybe a bit unfair to put this here; I find the narrative in the intro of this book exciting (1-look at data), but then it becomes a bit of a coding tutorial, though still with engaging sprinkles).
By chart type. I find these a bit lacking because this approach doesn’t fit how we think about the data. We don’t (hopefully) think “I want to make a bar chart, how can I do that with my data?” Some might, because they associate a type of chart strongly with a type of data, e.g. a bar chart with error bars for averages, but bar charts can be used for so many other purposes, and there are many other (better) ways of showing what a bar chart with error bars shows.
- Luckily, I don’t know that many textbooks that take this approach, but it is common on overview sites, e.g. this one https://datavizcatalogue.com/index.html
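
To make the point about bar charts with error bars concrete, here is a minimal sketch using made-up numbers. The two groups below share the same mean and standard deviation, so a bar with error bars would look identical for both, yet the values are distributed very differently. The data and the helper names (`summarize`, `text_histogram`) are invented for illustration and do not come from any of the resources listed here.

```python
from collections import Counter
from statistics import mean, pstdev

# Invented example data: both groups have mean 5 and standard deviation 2.
group_a = [1, 5, 5, 5, 5, 5, 5, 9]  # mostly at the mean, two extreme values
group_b = [3, 3, 3, 3, 7, 7, 7, 7]  # two clusters, no value at the mean


def summarize(values):
    """Return what a bar with error bars would encode: (mean, std dev)."""
    return mean(values), pstdev(values)


def text_histogram(values):
    """Return a crude text histogram, one line per distinct value."""
    counts = Counter(values)
    return "\n".join(f"{v:>2} | {'#' * counts[v]}" for v in sorted(counts))


for name, values in [("A", group_a), ("B", group_b)]:
    print(name, summarize(values))
    print(text_histogram(values))
```

The summaries match while the histograms do not, which is the kind of difference a strip plot, histogram, or violin plot would reveal and a bar with error bars would hide.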
By data type (e.g. amounts, proportions, time-series, ranges, etc). I find it effective when the data is central so I quite like this approach. This flips the question above to a more helpful order: “I want to visualize amounts, which charts/strategies can I use to most effectively do that?”
- https://clauswilke.com/dataviz/ (for “Part 1”. There are some topics in the other parts that don’t fit the Part 1 approach and that’s fine).
- Something I do miss in this textbook is an opinionated conceptual guide on how to do EDA (and maybe other topics). That’s an important topic I want to include, even if it doesn’t quite fit this “by data type” format (a potentially useful guide for the future on EDA: https://r4ds.hadley.nz/eda). I also think the narrative in this book could be stronger; it is not as engaging as the intro for the socviz book I mentioned above.
By variable type/xy-relationship. Maybe this is a sub-type of the point above. Here I am referring to when the content is structured more based on what type of data is in each variable, e.g. there might be a section on “categorical vs quantitative”, “ordinal vs categorical”, and so on. I don’t know if I have seen this in a full textbook, but the old seaborn documentation used to be structured like this and I learned a lot from it at the time.
- https://seaborn.pydata.org/tutorial.html (not as much anymore)
- This one as well https://www.data-to-viz.com/
I’m unsure where these fit:
- The very useful FT visual vocabulary posted previously https://ft.com/vocabulary. This seems to me to be a mix of data type (spatial, time-series, magnitude) and a higher level where the author has made a decision about what to show (ranking, correlation, distribution, flow, part-to-whole).
- 531 https://pages.github.ubc.ca/mds-2023-24/DSCI_531_viz-1_students. Maybe I would call this “chronological”, “sequential”, or something along those lines? I remember when I created the course ~4 years ago, I thought of it mostly bottom-up: what is the first thing we need to know about data visualization to get started - why it is important to visualize data. How can students get started - by learning how to visualize one point. And so on. That’s why we do visualization for ourselves (EDA) before learning how to do it for others (communication).
- I like the narrative and progression; my sense is that it is easy to follow the path here to learn both about concepts and how to implement them with code. However, I have wished for more structure; it is sometimes unclear why a certain chart type is introduced in a certain chapter, as I essentially went with what I thought was easiest to grasp and most familiar first. I am not sure to what extent the progression aspect can be combined with a structure such as “by data type”, but it would be great to have both.
- I also like that we talk about encoding channel efficiency in 531 (and many CS textbooks). I don’t know where this topic fits, but for me it is a seems
Graphics focused. I think there are a bunch of these from either a CS algorithmic angle or a graphical design angle. I’m not too interested in this for the entire book, but I see a compressed version of the graphical design aspect as part of “communication”, which is important for students to learn. In terms of algorithms, I don’t think force layouts etc. will make it into the book, but I have wished to make heatmap clustering/seriation part of the EDA chapter for a few years now, as I think it is powerful in that phase.
- Tamara Munzner at UBC has a CS book which was previously used in 531 https://www.taylorfrancis.com/books/mono/10.1201/b17511/visualization-analysis-design-tamara-munzner. I skimmed her slides and didn't reuse that much, as I think the material is more in depth and less about the data.

Storytelling with Data by Cole Nussbaumer Knaflic teaches by chart type, but also has a strong section on reducing visual clutter. Here are some features of this book:
- Use of bad examples. It starts with an obviously terrible visualization. This reminds me of the examples of bad visualizations you showed in 531; these are quite memorable.
- Sequential improvement of data visualizations. Each iteration illustrates the impact of a change.
- Instructive tone: it says things like “Here are 10 steps you can take to reduce visual clutter” rather than “A variety of types of visual clutter should be avoided.” This feels more actionable to me, but there is a clear trade-off between this approach and being too prescriptive.
Classification: Chart type, Sequential

Good Charts by Scott Berinato. This one surprised me. It’s written almost as a stream of consciousness and reads like an amusing self-help guide. The fascinating part is that it’s really focussed on how to think about data visualization. It’s very conceptual. It uses a series of case studies and employs that sequential improvement trick throughout. It has a cool section on deceptive charts. I find the storytelling narrative style quite captivating.
Classification: Sequential, Narrative-driven, Use of invented conceptual frameworks

Data Visualization in Society by Martin Engebretsen and Helen Kennedy. It’s essentially a collection of case studies written by academics and cobbled together as chapters. The approach is highly conceptual and does not aim to teach the practicalities of data visualization, but rather the scientific theories surrounding it. Chapter 6 was quite interesting as it discusses exactly how visualizations can be used deceptively. Overall, this book seems to be written in a style we do not want to emulate.
Classification: Academic, case studies, theoretical

Data Sketches: A Journey of Imagination, Exploration, and Beautiful Data Visualizations (Nadieh Bremer, Shirley Wu). This one is very design focussed but also contains a lot of code. The emphasis is on making elaborate custom visualizations in D3. Instead of teaching data visualization by chart type or structuring things by concept, the book is a collection of examples of work, where the authors deconstruct exactly how each piece was made and why they made those decisions. I think there is something cool about demystifying the process, but this book does not challenge the reader or encourage them to participate in some way. There are also no overarching themes or commentary that tie everything together.
Classification: Graphics focussed

Data-Driven Storytelling (AK Peters Visualization Series). This book is structured as a series of case studies. Again, it may be more useful for its content than for its structure. It has an excellent section on Exploration vs Explanation in data visualization. This includes many of the visualizations you showed in 531, such as Iraq’s Bloody Toll and Napoleon’s Map. Part of this makes me wonder if a useful structure for a book is to have different chapters based on the purpose (exploration vs explanation and further subcategories). The other book, Good Charts, defines this as two axes: Exploratory to Declarative and Conceptual to Data-Driven. This may be too simplistic, but I could imagine an entire book structured by subcategories of what you actually aim to achieve with the visualization.

How Charts Lie by Alberto Cairo. This book effectively employs the sequential improvement narrative. It starts with misleading charts and then improves them. Its chapters are structured by the different types of misleading charts (poor design, dubious data, insufficient data, confusing uncertainty, misleading patterns). I think that there is a bit of a lack of real structure in this book though. It seems to go through example after example like “here’s another one.” Again, it's hard to remember the lessons it is teaching, as they are structured as a continuous monologue rather than a clear framework.
Classification: Sequential improvement, personal narrative

Show Me the Numbers (Stephen Few). This one is quite technical. It generally follows the chart type paradigm but has sections divided by the specific features of a chart, e.g. axes. It seems a bit reductionist. One thing it does well is explicitly challenge the reader to solve problems. But it makes me wonder: do readers actually pause and try to work things out themselves? Are there ways to create slightly more resistance so the reader doesn’t just skip to the next paragraph to get the solution?
Classification: By chart, by chart feature, problem solving

Thoughtful Data Visualization

Imagine you've been tasked with creating a visualization from a complex dataset to inform major policy decisions. As you begin, you encounter questions that challenge your approach:

How do I balance providing full context and nuances with presenting a clear, focused message?

Should I use familiar chart types that are quickly understood, or innovative formats that might engage viewers more deeply?

How can I create a visualization that's both widely accessible and leverages the power of interactivity?

These questions highlight a fundamental truth about data visualization: it's a field defined by tensions. Every decision we make as we craft a visualization involves carefully balancing competing needs and priorities.

This course is designed to explore these tensions and equip students with the critical thinking skills needed to navigate them effectively. We'll focus on three core tensions that underlie virtually every visualization decision:

Context vs Focus

How do we provide comprehensive context and nuances while delivering a clear, focused message? When might we prioritize one over the other?

Innovation vs Convention

How do we balance using familiar visual elements with introducing new ones? What impacts might even small changes have on data interpretation? How can we ensure accuracy when deviating from standard practices?

Accessibility vs Interactivity

How can we create visualizations that are both widely accessible and leverage the power of interactivity?

Course Structure

Part 1: Context vs Focus

Finding Focus: Reducing Clutter and Creating Hierarchy

Introduction to cognitive load in data visualization
Techniques for reducing visual clutter
Overplotting
Visual hierarchy and selective highlighting
531 Overlap: Lecture 2, 5, 6

Crafting a Focused Narrative

Introduction to explanatory data visualization
How to send a clear message with your visualization
Using chart titles, labels, and axes to support your narrative
How to steer your audience responsibly
531 Overlap: Labs

Keeping Context in the Picture

The dangers of oversimplification
Visualizing distributions (box plots, histograms, violin plots)
Common pitfalls in simplifying data
531 Overlap: Lecture 3

Embracing Context: Exploratory Data Visualization

531 Overlap: Lecture 4

Part 2: Innovation vs Convention

Why Conventions Matter

The psychology of data and shape perception
Common perceptual biases
The key rules of data visualization
Avoiding unintentional distortions of data
531 Overlap: Lecture 2

Breaking the Right Rules

What can you do and what is off limits?
Principles of effective axis design – which changes are acceptable and which aren’t
How to choose trendlines responsibly
Confidence intervals
531 Overlap: Lecture 5, Lecture 7

Responsible Innovation

What are the levers you can pull to create engaging visualizations without sacrificing accuracy?
Examples of inspiring data visualizations

Part 3: Accessibility vs Interactivity

Accessibility by Design

What is accessibility in data visualization?
Color theory and how to choose accessible palettes
Resources to make accessibility easier
531 Overlap: Lecture 6
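
As one possible hands-on exercise for the palette topic, here is a minimal sketch that computes the WCAG 2.x contrast ratio between two colours, for example a mark colour against the chart background. The helper names and the example colours are assumptions made for illustration. Contrast ratio is only one narrow check: it says nothing about whether two palette colours can be told apart by readers with colour vision deficiency.

```python
def relative_luminance(hex_color):
    """Relative luminance of an sRGB colour given as '#rrggbb' (WCAG 2.x)."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma encoding for each channel.
    r, g, b = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio, from 1 (identical) up to 21 (black on white)."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical palette checked against a white background.
background = "#ffffff"
for color in ["#1f77b4", "#ffdd00", "#333333"]:
    print(color, round(contrast_ratio(color, background), 2))
```

WCAG 2.1 asks for a ratio of at least 3:1 for graphical objects that are needed to understand the content, which can serve as a rough threshold when reading the printed values.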

The Promise and Pitfalls of Interactivity

The purpose of interactive visualizations
How to design interactive visualizations without sacrificing accessibility
531 Overlap: Lecture 8
