swcarpentry / r-novice-gapminder

R for Reproducible Scientific Analysis
http://swcarpentry.github.io/r-novice-gapminder/

Suggestion to add a gentler introduction for ggplot2 #542

Closed wallacelab closed 4 years ago

wallacelab commented 5 years ago

The current ggplot2 lesson feels like it starts very abruptly, going from nothing to using ggplot(), aes(), and geom_point() all in a single line of code. That seems like a bit much for a first experience. (I learned ggplot2 ~6 months ago through this very lesson, and I felt very lost until I could go back through a second or third time.)

It seems like a little expansion of the introduction would help with this. The lesson currently has these lines:

ggplot2 is built on the grammar of graphics, the idea that any plot can be expressed from the same set of components: a data set, a coordinate system, and a set of geoms–the visual representation of data points.

The key to understanding ggplot2 is thinking about a figure in layers. This idea may be familiar to you if you have used image editing programs like Photoshop, Illustrator, or Inkscape.

I suggest expanding/altering it into something more like

ggplot2 is built on the grammar of graphics, the idea that any plot can be expressed from the same set of components. These components are a data set, mapping aesthetics, and layers of graphics.

  • Data sets are the data that you, the user, provide.
  • Mapping aesthetics are what connect the data to the graphics. They tell ggplot2 how to use different parts of your data to affect how the graph looks (such as by changing what is plotted on the X or Y axis, or if parts of the plot should have different sizes or colors based on your data).
  • Layers are the actual graphical output from ggplot2. Layers determine what kind(s) of plot are shown (scatterplot, histogram, etc.), the coordinate system used (rectangular, polar, etc.), and other important aspects of the plot. The idea of layers of graphics may be familiar to you if you have used image editing programs like Photoshop, Illustrator, or Inkscape.

When teaching I like diagramming these out, but I don't know if that belongs in the official lesson itself. Same for pointing out these three components in the first code example.
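As a rough sketch of what pointing out the three components in a first code example could look like (this is not taken from the lesson itself): the `toy` data frame below is a hypothetical stand-in for the lesson's gapminder data, with made-up values, and only the column names `gdpPercap` and `lifeExp` follow the lesson.

```r
library(ggplot2)

# Hypothetical toy data; values are invented purely for illustration.
toy <- data.frame(
  country   = rep(c("A", "B"), each = 3),
  year      = rep(c(1952, 1977, 2007), times = 2),
  gdpPercap = c(800, 1500, 3000, 9000, 18000, 36000),
  lifeExp   = c(40, 50, 60, 68, 74, 80)
)

p <- ggplot(data = toy,                                   # 1. data set
            mapping = aes(x = gdpPercap, y = lifeExp)) +  # 2. mapping aesthetics
  geom_point()                                            # 3. layer: a scatterplot geom

# Printing the object draws the plot.
print(p)
```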

The reasons I suggest replacing "coordinate system" with "mapping aesthetics" are because

jcoliver commented 5 years ago

Great points, @wallacelab . I like the explanations for the different parts of a stripped-down ggplot call. One challenge I faced when learning ggplot syntax (because, let's be honest, the struggle is real) is that reading the descriptions didn't mean much until I had typed in the code (usually incorrectly). Perhaps the explanatory language you are suggesting could come after the first call (or calls) to ggplot?

wallacelab commented 5 years ago

@jcoliver Thanks. I think it makes more sense to present the overview first, then show the learners how each part of the command fits into it. This lesson was my first training in ggplot, and though I can't speak for others, that big chunk of code at the beginning was a turn-off because it looked really complex and uninterpretable. When teaching, I've found it's actually useful to skip the first plot in the lesson and go to where it starts building the graphic up bit by bit, starting from an empty ggplot() command and then adding data, mapping, and finally a geom.
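A minimal sketch of the bit-by-bit build-up described above, assuming a hypothetical `toy` data frame with made-up values in place of the lesson's gapminder data (the exact steps in the lesson may differ):

```r
library(ggplot2)

# Hypothetical toy data; values are invented purely for illustration.
toy <- data.frame(
  gdpPercap = c(800, 1500, 3000, 9000, 18000, 36000),
  lifeExp   = c(40, 50, 60, 68, 74, 80)
)

# Step 0: an empty ggplot() call, with no data, mapping, or layers.
p0 <- ggplot()

# Step 1: add the data. Nothing is drawn yet because no columns are mapped.
p1 <- ggplot(data = toy)

# Step 2: add the mapping. The axes now reflect gdpPercap and lifeExp.
p2 <- ggplot(data = toy, mapping = aes(x = gdpPercap, y = lifeExp))

# Step 3: add a geom layer so the data points are actually drawn.
p3 <- p2 + geom_point()
```

Printing `p0` through `p3` in turn shows the plot gaining one component at a time.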

I went ahead and forked the lesson so I can add some edits for consideration.

ed-lau commented 5 years ago

I would suggest removing or replacing the geom_line() points in the current "Layers" section. While I understand the intention of showing that ggplots can be built by layers and that one set of data/aes can be mapped to different geoms, I think many will find the current line plots to be visually overwhelming.

In my opinion they are also not good data exploration plots: you probably wouldn't expect to find a plot like that in many reports, or expect readers to be able to interpret it easily. A second potential point of confusion is that the section immediately after switches from a lifeExp vs. year plot back to lifeExp vs. gdpPercap plots.
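For readers without the lesson open, the kind of layered line plot under discussion can be approximated as follows. This is a sketch, not the lesson's exact code: the `toy` data frame is hypothetical with made-up values, and `group = country` is used here as an assumption about how the lines are grouped.

```r
library(ggplot2)

# Hypothetical toy data; values are invented purely for illustration.
toy <- data.frame(
  country   = rep(c("A", "B", "C"), each = 3),
  continent = rep(c("X", "X", "Y"), each = 3),
  year      = rep(c(1952, 1977, 2007), times = 3),
  lifeExp   = c(40, 50, 60, 45, 52, 63, 68, 74, 80)
)

# One data set and one mapping feed two geoms: a line layer and a point layer.
p <- ggplot(data = toy,
            mapping = aes(x = year, y = lifeExp,
                          group = country, color = continent)) +
  geom_line() +
  geom_point()
```

With the full gapminder data there is one line per country, which is where the visual clutter comes from.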

I suggest that the transformations and statistics section follow the first section, up to Challenge 1. After that we can use the same lifeExp vs. gdpPercap plot to introduce panel faceting. It is probably also a more intuitive introduction to grouping than the color aes.
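A minimal sketch of the faceting idea on the same lifeExp vs. gdpPercap plot, assuming a hypothetical `toy` data frame with made-up values; faceting by `continent` and the log-scaled x axis are assumptions for illustration, not necessarily what the screenshots showed:

```r
library(ggplot2)

# Hypothetical toy data; values are invented purely for illustration.
toy <- data.frame(
  continent = rep(c("X", "Y"), each = 4),
  gdpPercap = c(800, 1500, 3000, 6000, 9000, 18000, 27000, 36000),
  lifeExp   = c(40, 50, 58, 63, 68, 74, 77, 80)
)

p <- ggplot(data = toy, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10() +          # transformation: log10 scale on the x axis
  facet_wrap(~ continent)    # one panel per continent
```

Each group gets its own panel, so grouping is visible from the layout rather than from a color legend.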

image

Changing the aes and geom can then follow, e.g.:

image

I find this flow to be more natural but would love to hear what you think.

jcoliver commented 4 years ago

Closing in favor of the discussion on #552