tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org

Consider adding "static" parameter to geom/stat functions #3062

Open thomasp85 opened 5 years ago

thomasp85 commented 5 years ago

The current approach to repeating a layer across panels is to leave the faceting variable out of that layer's data. gganimate uses the same approach to keep layers static across an animation. While this works intuitively, I feel it often requires some additional data prep before the plotting code. It also sometimes requires layers that would otherwise use the same data to target separate, almost identical datasets.

Would it make sense to add a static (or perhaps repeat) argument to geom and stat functions to explicitly mark them as repeated across panels (and across frames in the case of gganimate)?

Example API:

library(dplyr)
library(ggplot2)

# Current approach
diamonds_static <- mutate(diamonds, cut = NULL)
ggplot(diamonds, aes(x = color)) + 
  geom_bar(data = diamonds_static, fill = 'grey70') +
  geom_bar() + 
  facet_wrap(~cut)

# Proposed approach (static is not an existing argument)
ggplot(diamonds, aes(x = color)) + 
  geom_bar(fill = 'grey70', static = TRUE) +
  geom_bar() + 
  facet_wrap(~cut)


The "problem" with the current approach is that it requires changes to the data source whenever we decide to change the faceting variable. That is not a huge problem, but it is still a barrier to experimentation.
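To make "repeated across panels" concrete, here is a minimal base-R sketch of the broadcasting step. Layer data that lacks the faceting variable is cross-joined with a panel layout, so every row appears once per panel. `broadcast_to_panels()` and the toy `panel_layout` table are hypothetical illustrations of the idea, not ggplot2's actual facet internals.

```r
# Hypothetical sketch: repeat ("broadcast") a layer's rows across every panel.
# `panel_layout` is a toy stand-in for a facet layout: one row per panel.
broadcast_to_panels <- function(layer_data, panel_layout) {
  # merge() with by = NULL returns the Cartesian product of the two tables
  merge(layer_data, panel_layout, by = NULL)
}

layer_data <- data.frame(color = c("D", "E", "F"), n = c(10, 20, 30))
panel_layout <- data.frame(PANEL = factor(1:2), cut = c("Fair", "Good"))

# 3 layer rows x 2 panels = 6 rows, each tagged with its PANEL
repeated <- broadcast_to_panels(layer_data, panel_layout)
```

Under the proposal, a layer marked static = TRUE would get this treatment even when its data still contains the faceting column, so no separate "static" copy of the data would be needed.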

If the name static is too closely tied to the idea of animation, we can figure out another name for it...

clauswilke commented 5 years ago

I think this is a good idea. I very much adhere to the philosophy that we should optimize the API so we don't usually have to provide the dataset more than once. This is in line with @yutannihilation's gghighlight and my ideas for in-layer sampling in the ungeviz package.

I'm just not fully convinced about static. A good name would make sense in the context of facets first. I can't think of anything better, though.

yutannihilation commented 5 years ago

I couldn't agree more! Currently, gghighlight needs ugly column names (you cannot always remove the column, as it might be mapped to a necessary aes) to prevent the unhighlighted data from being faceted:

https://github.com/yutannihilation/gghighlight/blob/8dc648cedad2b0f6137a4440b9973e3f9d4c24a0/R/gghighlight.R#L275-L291

thomasp85 commented 5 years ago

Completely open to another name... I'm too deep in animation at the moment to think of anything better, though (except for repeat, which I don't particularly like).

smouksassi commented 5 years ago

How would we handle the repetition when the user wants to repeat along rows only or columns only? Below is an example to clarify why we might want to do this. I think of it as a margin, but instead of showing it in a separate panel we want it in the background, so that each panel becomes a "highlighted layer".

library(dplyr)
library(ggplot2)
mtcars_plot <- mutate(mtcars,
                      cyl = as.factor(cyl),
                      vs = as.factor(vs),
                      gear = as.factor(gear))
mtcars_static <- mutate(mtcars_plot, gear = NULL, vs = NULL)
mtcars_static2 <- mutate(mtcars_plot, gear = NULL)

ggplot(mtcars_plot, aes(x = cyl)) +
  geom_bar(data = mtcars_static, fill = 'grey70') +
  geom_bar() +
  facet_grid(cols = vars(gear), rows = vars(vs), margins = TRUE)

ggplot(mtcars_plot, aes(x = cyl)) +
  geom_bar(data = mtcars_static2, fill = 'red') +
  geom_bar() +
  facet_grid(cols = vars(gear), rows = vars(vs), margins = TRUE)

Created on 2019-01-09 by the reprex package (v0.2.1)
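The second plot above works because mtcars_static2 keeps vs but drops gear, so its rows are matched on vs and repeated across the gear columns. Here is a rough base-R sketch of that partial repetition. `repeat_across_missing()` and the toy `panel_layout` are hypothetical, margins are ignored, and this is not how facet_grid() is actually implemented.

```r
# Hypothetical sketch (not ggplot2 internals): repeat a layer only across the
# facet variables its data lacks. `panel_layout` has one row per panel.
repeat_across_missing <- function(layer_data, panel_layout) {
  # facet variables that the layer still carries (here: vs)
  shared <- intersect(names(layer_data), setdiff(names(panel_layout), "PANEL"))
  # join on those; with no shared variables merge() returns the Cartesian product
  merge(layer_data, panel_layout, by = shared)
}

mt <- mtcars
mt$vs <- factor(mt$vs)
mt$gear <- factor(mt$gear)
static2 <- mt[, setdiff(names(mt), "gear")]  # like mutate(mtcars_plot, gear = NULL)

# 2 vs levels x 3 gear levels = 6 panels (margins are ignored in this sketch)
panel_layout <- expand.grid(vs = levels(mt$vs), gear = levels(mt$gear))
panel_layout$PANEL <- factor(seq_len(nrow(panel_layout)))

# each car appears once per gear column, but only in panels matching its vs
repeated <- repeat_across_missing(static2, panel_layout)
```

Row-only or column-only repetition thus falls out of which facet variables the layer data keeps, which is exactly the data prep a fixed/static argument would try to avoid.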

thomasp85 commented 5 years ago

The old approach would still work, but this would solve 99% of the needs in a more elegant way

ptoche commented 5 years ago

common = TRUE?

thomasp85 commented 5 years ago

My brain woke up again... fixed is obviously the right argument name, and already part of the lingua ggplot2a

yutannihilation commented 5 years ago

I don't think the argument has to be as generic as a single word, since this is probably for expert use. What about something more specific, like ignore_facet or allow_facet, if we allow users to choose which variables the data is static on?

ggplot(diamonds, aes(x = color)) + 
  geom_bar(fill = 'grey70', ignore_facet = TRUE) +
  geom_bar() + 
  facet_wrap(~cut)

ggplot(mtcars_plot, aes(x = cyl)) +
  geom_bar(fill = 'red', ignore_facet = vars(gear)) +
  geom_bar() +
  facet_grid(cols = vars(gear), rows = vars(vs), margins = TRUE)

If this doesn't make sense, I'll vote for @thomasp85's fixed.

thomasp85 commented 5 years ago

I would really like something that doesn’t mention facet, as I want to use it in gganimate as well.

yutannihilation commented 5 years ago

Ah, sorry, I misunderstood the context and how this would be implemented. Is this about whether or not to respect PANEL around here? (I thought this was about the implementation of Facet.) If so, I agree with fixed.

https://github.com/tidyverse/ggplot2/blob/9eae13b3d17bde26cf9df649887b4a6bb2ac92ce/R/layer.r#L244-L248

thomasp85 commented 5 years ago

I haven't done any POC yet, but in general this should be a way for layers to broadcast that they are part of the "background", and not to be split up. It will certainly require some changes to the different facet implementations...

I just had the bright idea of allowing this argument to be either a logical or a character vector defining what kind of "fixed" the layer should be. It would be up to the different facets, gganimate, and other extensions to define how to interpret the strings, but e.g. something like fixed = c('rows', 'frames') would repeat the data across rows (as above) and during animation...
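A minimal sketch of how a consumer might interpret such an argument, assuming fixed accepts TRUE, FALSE/NULL, or a character vector. `fixed_applies()` is a hypothetical helper; the strings "rows", "cols" and "frames" are only the examples from this thread, and each facet or extension would decide which strings it recognises.

```r
# Hypothetical helper: does a layer's `fixed` argument apply in a given context?
#   TRUE              -> fixed everywhere
#   FALSE or NULL     -> never fixed
#   character vector  -> fixed only in the contexts it names
fixed_applies <- function(fixed, context) {
  if (is.null(fixed)) return(FALSE)
  if (is.logical(fixed)) {
    if (length(fixed) != 1 || is.na(fixed)) {
      stop("`fixed` must be TRUE, FALSE or a character vector")
    }
    return(fixed)
  }
  # `context` may list several synonyms the consumer accepts; any match counts
  if (is.character(fixed)) return(any(context %in% fixed))
  stop("`fixed` must be TRUE, FALSE or a character vector")
}

fixed_applies(c("rows", "frames"), "frames")  # an animation layer would check this
fixed_applies(c("rows", "frames"), "cols")    # a column-wise facet would check this
```

Keeping the interpretation on the consumer side means ggplot2 only stores the value and never needs to know what "frames" means.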

smouksassi commented 5 years ago

Aren't rows/cols only meaningful for facet_grid? In the facet_wrap example above, a "row" spans two rows. Users will still have to make sure they specify the layers in the right order. Also, with the new custom aesthetics, users might want separate scales for these "background"/"reference" visual elements, so that a legend can explain what they are. E.g. above we would have a fill legend with a grey key for the background data, named as such, and then another fill legend for the regular fill mapping.

yutannihilation commented 5 years ago

Thanks for clarification, it makes sense.

e.g. something like fixed = c('rows', 'frames') would repeat the data on the rows (as above) and during animation...

I feel row and frame are too specific then.

yutannihilation commented 5 years ago

The user might want to control whether the data is split over

This is pretty complicated if we discuss all of them at once...

thomasp85 commented 5 years ago

I don't think the name of the variable should be mixed into this... that is part of the facet spec.

I'm envisioning multiple valid strings to be used

The test in e.g. facet_grid for whether to fix the data across rows would be:

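# any() keeps the condition length-one when `fixed` is a character vector
# (e.g. c('facet_row', 'frames')) or NULL; `||` needs a single TRUE/FALSE
if (isTRUE(params$fixed) || any(params$fixed %in% c('facet', 'facet_row'))) {
  # do whatever is needed to fix the data
}
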
yutannihilation commented 5 years ago

that is part of the facet spec.

Ah, that's convincing. Thanks!

params$fixed %in% c('facet', 'facet_row')

I'm curious about this part. So, although frame is suggested in the docs, ggplot2 doesn't need to know how, or by whom, that keyword will be used, right? If so, I agree with your idea.

thomasp85 commented 5 years ago

I don’t think frame should be mentioned in the ggplot2 docs. I just included it to show a non-facet use

yutannihilation commented 5 years ago

OK, then it looks fine to me. I was just worried that ggplot2 would have to know about gganimate's implementation.

thomasp85 commented 5 years ago

Ah, no. There needs to be a strict separation, IMO.

teunbrand commented 5 years ago

I'm sorry for commenting on an issue last discussed in January, but wouldn't a simple data-subsetting wrapper produce the desired behaviour? Something along the lines of:

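ggsubset <- function(rowtest = NULL, omit = NULL) {
  # capture the unevaluated expressions, e.g. Species == "setosa" and cut
  rowtest <- substitute(rowtest)
  omit <- substitute(omit)
  env <- parent.frame()
  function(x) {
    # rows to keep: all of them unless a logical test was supplied
    rows <- if (is.null(rowtest)) rep(TRUE, nrow(x)) else eval(rowtest, x, env)
    rows <- rows & !is.na(rows)
    # columns to keep: drop every variable named in `omit` (none if NULL)
    cols <- setdiff(names(x), all.vars(omit))
    x[rows, cols, drop = FALSE]
  }
}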

Here, rowtest is a logical expression selecting which rows to keep (e.g. Species == "setosa" in the iris dataset) and omit is a column name you want to exclude for faceting purposes. It wouldn't store a complete data.frame in the plot$layers[[...]]$data slot, so it is more memory-efficient than making an extra copy with diamonds_static <- mutate(diamonds, cut = NULL).
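The memory argument rests on layers accepting a function as data, which is applied to the plot data when the plot is built. Below is a simplified, hypothetical sketch of that resolution step. `resolve_layer_data()` and `drop_species()` are illustrative names; ggplot2's real layer code uses waiver() rather than NULL as the "inherit" marker and does additional checks.

```r
# Simplified, hypothetical sketch of how a layer's data could be resolved:
#   NULL        -> inherit the plot data
#   a function  -> call it on the plot data at build time
#   otherwise   -> use the data frame as given
resolve_layer_data <- function(layer_data, plot_data) {
  if (is.null(layer_data)) return(plot_data)
  if (is.function(layer_data)) return(layer_data(plot_data))
  layer_data
}

# A data function in the spirit of ggsubset(omit = Species): drop the facet column
drop_species <- function(df) df[, setdiff(names(df), "Species"), drop = FALSE]

inherited <- resolve_layer_data(NULL, iris)
background <- resolve_layer_data(drop_species, iris)
```

Only the small function is kept with the layer; the reduced data frame exists only while the plot is being built.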

This would handle the diamonds case as follows:

ggplot(diamonds, aes(x = color)) + 
  geom_bar(data = ggsubset(omit = cut), fill = 'grey70') +
  geom_bar() +
  facet_wrap(~cut)


Or the iris dataset:

ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
  geom_point(data = ggsubset(omit = Species), colour = "grey70") +
  geom_point(aes(colour = Species)) +
  facet_wrap(~Species)
