tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org
Other
6.52k stars 2.02k forks source link

Register default scales via themes #2691

Open clauswilke opened 6 years ago

clauswilke commented 6 years ago

As previously mentioned in the discussion of #2239: It would be nice if default scales could be set via themes.

Currently, default scales are defined as functions that are named scale_aesthetic_data-type(), where aesthetic is the name of the aesthetic and data-type represents the data type, i.e., continuous, discrete, date, etc. For example, scale_colour_discrete() is the default scale function for discrete colors.

The definition of these defaults is distributed all over the scales code, because for some scales the default scale is the only scale that exists (example: https://github.com/tidyverse/ggplot2/blob/77fc5c705efd19c96f6b69f078a0aeacc87c933d/R/scale-alpha.r#L31) whereas for others there are multiple possible ones and hence one is set as the default (example: https://github.com/tidyverse/ggplot2/blob/77fc5c705efd19c96f6b69f078a0aeacc87c933d/R/zxx.r#L6).

It would be nice if there were a way to change the defaults other than to change the global function definitions, and in particular if the defaults could be linked to the theme. For example, the theme could have an entry default_scales = list("scale_color_discrete" = scale_color_hue, "scale_color_continuous" = scale_color_gradient, ...) that lists the default scales to be used with this theme.

In terms of implementation, I see two problems, and it's not clear to me how much of a hurdle they might be. First, the code that searches for the default scales currently doesn't have access to the theme: https://github.com/tidyverse/ggplot2/blob/bcc3ae59f7f5257fdfce5314a71ae255e4a3f12a/R/scale-type.R#L1-L20 It is called from this function: https://github.com/tidyverse/ggplot2/blob/bcc3ae59f7f5257fdfce5314a71ae255e4a3f12a/R/scales-.r#L91 which is called from here: https://github.com/tidyverse/ggplot2/blob/a7b31356ef1de985fc23e3fee14ef5d2afd20c85/R/layer.r#L214 and here: https://github.com/tidyverse/ggplot2/blob/a7b31356ef1de985fc23e3fee14ef5d2afd20c85/R/layer.r#L274

Second, it's not immediately clear to me whether we even have the complete theme by the time these functions are called. They are ultimately called in plot_build() here: https://github.com/tidyverse/ggplot2/blob/3f63d6235d96da47c2c801d3c72afa2f2d5a0e85/R/plot-build.r#L48 If by that time the entire theme is assembled, then I assume it could be handed down through all these functions to the point where the scale is looked up. Otherwise, this proposal would be infeasible.

clauswilke commented 6 years ago

This function would also have to be modified: https://github.com/tidyverse/ggplot2/blob/bcc3ae59f7f5257fdfce5314a71ae255e4a3f12a/R/scales-.r#L110-L121

It is similarly called from plot_build(), so this shouldn't create any new complications.

baptiste commented 6 years ago

Just for completeness I'll re-iterate my personal preference for a revised scale_colour|fill_xx interface. Currently we have this situation,

apropos("^scale_color_")[-c(5,7,14,22,23)]
 [1] "scale_color_brewer"             "scale_color_calc"              
 [3] "scale_color_canva"              "scale_color_colorblind"        
 [5] "scale_color_continuous_tableau" "scale_color_distiller"         
 [7] "scale_color_economist"          "scale_color_excel"             
 [9] "scale_color_few"                "scale_color_fivethirtyeight"   
[11] "scale_color_gdocs"              "scale_color_gradient_tableau"  
[13] "scale_color_gradient2"          "scale_color_gradient2_tableau" 
[15] "scale_color_gradientn"          "scale_color_grey"              
[17] "scale_color_hc"                 "scale_color_hue"               
[19] "scale_color_pander"             "scale_color_ptol"              
[21] "scale_color_solarized"          "scale_color_stata"             
[23] "scale_color_tableau"            "scale_color_viridis_c"         
[25] "scale_color_viridis_d"          "scale_color_wsj"

where besides the distinction discrete/continuous I would argue that the essential difference is the palette. An alternative description might be a generic scale_colour_discrete/continuous(palette = "grey"), and in this form I would rather agree that "grey" could nicely fit in the theme description.

The interface might require a new kind of object to cater for all cases – a string to refer to built-in palettes, a function (potentially a functor) e.g for user-defined gradient/gradient2/gradientn and their parameters (midpoint, NA, etc.).

To me this feels more natural, and more in line with other design choices such as scale_x_continuous(trans = "sqrt").

The current array of special-case functions could still coexist, similarly to scale_x_log10.

hadley commented 4 years ago

I would love to see an implementation of this to see how feasible it is.

clauswilke commented 4 years ago

@cpsievert Would you be interested in trying to take this on? I don't have the capacity right now to tackle this.

cpsievert commented 4 years ago

@clauswilke I have other priorities at the moment, too. I don't think this should necessary block #3833, though. If there's some reason why it should, let me know, and I'll try and find time to look into this.

clauswilke commented 4 years ago

Agreed. Just think about whether your proposed API is right if we assume at some point scales can be set from the theme. This may or may not be the case. I haven't thought it through carefully.

alanocallaghan commented 4 years ago

Second, it's not immediately clear to me whether we even have the complete theme by the time these functions are called.

The theme is by default an empty list when compute_aesthetics is called. If a theme has been added to the plot it's not empty but that's clearly not the case being targeted.

I've not checked in detail but it seems the complete theme is built by https://github.com/tidyverse/ggplot2/blob/660aad2db2b3495ae0d8040915a40d247133ffc0/R/theme.r#L444 when it's called in ggplot_gtable.ggplot_built https://github.com/tidyverse/ggplot2/blob/660aad2db2b3495ae0d8040915a40d247133ffc0/R/plot-build.r#L166

I would assume you could build the theme during ggplot_build instead. Moving it breaks 2 theme tests (these ones: https://github.com/tidyverse/ggplot2/blob/660aad2db2b3495ae0d8040915a40d247133ffc0/tests/testthat/test-theme.r#L219-L227 ), but these are just checking that the built plot doesn't have a complete theme. That doesn't seem like a major issue, since the theme is completed when being drawn in any case. Of course it's quite probable I'm overlooking something (the tests are presumably there for a reason...).

alanocallaghan commented 4 years ago

Quick implementation here https://github.com/Alanocallaghan/ggplot2/tree/2691-default-scales-via-themes

As I mentioned in previous comment, I moved plot_theme to be during ggplot_build rather than after; otherwise I just passed the theme through down to find_global.

For the purposes of a visual example I modified the theme_grey default scale_colour_discrete to be scale_color_brewer with Set1 palette.

reprex:

devtools::load_all("~/Documents/github/ggplot2")
ggplot(mtcars, aes(mpg, cyl, color=factor(cyl))) + geom_point() + theme_grey()

I note that there's an obviously silly line here which is not optimal but it works as a stopgap https://github.com/Alanocallaghan/ggplot2/blob/2691-default-scales-via-themes/R/scale-type.R#L28

clauswilke commented 4 years ago

Thanks for looking into this. I've got a few requests that will help us move forward.

  1. Please turn your code into a PR so we see what changes you made and what the results of the automatic tests are. You can start the title of the PR with "WIP:" to indicate work in progress.

  2. Please demonstrate the functionality with a self-contained reprex. I.e., no load_all() and no secret code modifications of the default theme. Show how the code would be used in practice, e.g. by defining a new theme and then using it.

  3. What happens when a theme defines some scales and now I want to use that them and also define some other scales with theme()? Is this possible? How does it work?

  4. Does the following reprex still work? (I.e., the currently set theme is used at plotting time, not the theme that was set when the plot was constructed.)

library(ggplot2)

p <- ggplot(mtcars, aes(disp, mpg)) + geom_point()
p


theme_set(theme_bw())
p

Created on 2020-04-29 by the reprex package (v0.3.0)

clauswilke commented 4 years ago

Oh, and for the line that you consider "silly" please add a comment to your code indicating what is happening and why you think it is silly. This will prevent us from accidentally forgetting that there may have been a problem that needed addressing.

alanocallaghan commented 4 years ago

Sure. Can you clarify point 3? I don't follow.

clauswilke commented 4 years ago

Can you clarify point 3?

Let's say theme_A() defines a color scale and a fill scale. I'd like to use theme_A() but use my own color scale. How do I do that?

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  theme_A() +
  theme(
    default.scales = ... # what goes here?
  )

Or, I like the color and fill scales, but I also want to define a shape scale. How do I do that?

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species, shape = Species)) +
  geom_point() +
  theme_A() +
  theme(
    default.scales = ... # what goes here?
  )
alanocallaghan commented 4 years ago

Partial specification of the default scales runs into an undefined merge_element method

library("ggplot2")
library("viridis")

theme_A <- function() {
  theme(
    default.scales = c(
      "scale_colour_continuous" = scale_colour_viridis
    ),
    complete = TRUE
  )
}
ggplot(mtcars, aes(disp, mpg, color=cyl, fill = factor(cyl))) +
  geom_boxplot() +
  theme_A() +
  theme(
    default.scales = c(
      scale_fill_discrete = scale_fill_brewer
    )
  )
#> Error: No method for merging list into list

Trace is:

<error/rlang_error>
No method for merging list into list
Backtrace:
    █
 1. └─ggplot2:::`+.gg`(...)
 2.   └─ggplot2:::add_ggplot(e1, e2, e2name)
 3.     ├─ggplot2::ggplot_add(object, p, objectname)
 4.     └─ggplot2:::ggplot_add.theme(object, p, objectname)
 5.       └─ggplot2:::add_theme(plot$theme, object)
 6.         ├─ggplot2::merge_element(t2[[item]], t1[[item]])
 7.         └─ggplot2:::merge_element.default(t2[[item]], t1[[item]])

Implementing a very basic merge_element.list fixes the behaviour for all of your points Claus, as far as I can tell at least. It might be better to define a class like element_list or similar to hold the scales, because I imagine lists turning up as theme elements would be a bug more often than intended behaviour.

library("ggplot2")
library("viridis")
#> Loading required package: viridisLite

## Without changing the default scale
ggplot(mtcars, aes(mpg, cyl, color=factor(cyl))) + geom_point()


theme_A <- function() {
  theme(
    default.scales = c(
      "scale_colour_continuous" = scale_colour_viridis
    ),
    complete = TRUE
  )
}

## With a theme that sets one scale (colour) and not another (fill)
ggplot(mtcars, aes(disp, mpg, fill = factor(cyl))) +
  geom_boxplot() +
  geom_jitter(aes(color=mpg)) +
  theme_A()


## use the same theme that sets colour and also set a default for fill
ggplot(mtcars, aes(disp, mpg, fill = factor(cyl))) +
  geom_boxplot() +
  geom_jitter(aes(color=mpg)) +
  theme_A() +
  theme(
    default.scales = c(
      scale_fill_discrete = scale_fill_brewer
    )
  )

p <- ggplot(mtcars, aes(disp, mpg, color=mpg)) + geom_point()
# printing it before setting a theme means it has the default scales
p

# but if we set theme it has the updated scales
theme_set(theme_A())
p

clauswilke commented 4 years ago

Not sure I like the interface where we put all the scales into one element, like so:

theme(
  default.scales = element_scales(
    scale_fill_discrete = scale_fill_viridis_d,
    scale_fill_continuous = scale_fill_viridis_c
  )
)

It seems cumbersome. What about adding elements for the different types of scales, like so:

theme(
  scales.discrete = element_scales(
    fill = scale_fill_viridis_d
  ),
  scales.continuous = element_scales(
    fill = scale_fill_viridis_c
  )
)

@hadley @yutannihilation @thomasp85 Any thoughts on a possible interface?

baptiste commented 4 years ago

I vote for

theme_2020(scale_y = element_scale(trans = 'log'))

More seriously though, what do people think of my remarks above regarding the explosion of names for fill and colour scales? The gist of it is,

  1. other scales have a fairly standard interface, e.g scale_x_sqrt() got superseded quite a while ago by scale_x_continuous(trans = sqrt_trans)
  2. The palette, be it from Brewer, viridis, studio Ghibli, etc. could be but a parameter in a standard family, i.e. scale_colour_continuous(palette = 'viridis_c')

The reason I bring this up is that in my view specifying a palette in a theme seems much more straightforward than a scale, especially if they are all a bit different.

I do realise that not all colour scales work with just a palette, there are also gradient, gradientn, etc. but I suspect a common interface (or rather, a handful, like viridis_c vs viridis_d) abstracting away the specific colours, would make for a more robust and tidy interface.

Specifying a palette is also something that would integrate more easily with other workflows, e.g external CSS values, json specs, etc.

clauswilke commented 4 years ago

@baptiste I think your point is orthogonal to this issue. Regardless of whether we introduce scale functions that allow us to pick scales by palette names, we still have to set appropriate scale functions. And keep in mind that in addition to colors, there are also scales to set shape, size, alpha, linetype, etc. I think an interface that allows us to pick an arbitrary function is ultimately needed, even if there are simplified approaches for special cases.

For the colorspace package, I attempted to implement generic scale functions that take a palette argument, and it's still quite complicated: http://colorspace.r-forge.r-project.org/articles/ggplot2_color_scales.html

And, if you want to define your own palettes, you'll quickly run into the situation where you have to specify at least 9 numeric parameters, and still you're limited in the type of color gradients you can generate: http://colorspace.r-forge.r-project.org/articles/hcl_palettes.html

There's also the paletteer package that attempts to provide a unified interface to many different color scales: https://cran.r-project.org/web/packages/paletteer/index.html

clauswilke commented 4 years ago

@baptiste Also, my proposed interface could be generalized easily to handle both scale functions and palette names, by simply accepting character strings that are interpreted as palettes:

theme(
  scales.discrete = element_scales(
    fill = "viridis" # creates discrete color scale using "viridis" palette
  ),
  scales.continuous = element_scales(
    fill = "viridis" # creates continuous color scale using "viridis" palette
  )
)
baptiste commented 4 years ago

I don't think it has to be orthogonal – my point, in relation to this specific thread, is that it'd be more convenient to define some information about the scales in the theme, but not necessarily the scale object itself. In its bare-bones implementation it would mean passing just a vector of colours, but obviously that's not enough for all types of colour scales. A list of parameters, however, e.g element_scale(palette = 'viridis_c') could be a flexible way to pass theming parameters, and let the scale object build itself from that information. In other words, I don't see why a theme would need to hold a complicated scale object where a list of parameters could do.

Does that make more sense?

baptiste commented 4 years ago

(to push the analogy, the geom-defaults-in-theme branch doesn't define geom_point in the theme; it defines its default appearance)

alanocallaghan commented 4 years ago

Not sure I like the interface where we put all the scales into one element, like so: [...] It seems cumbersome. What about adding elements for the different types of scales, like so: [...]

I agree fwiw. Arguably it's not a huge problem because presumably default scales will be set once. With the latter more explicit approach it's probably easier to validate what's being set. For example as baptiste pointed out you may not want people setting a default for scale_y...

In other words, I don't see why a theme would need to hold a complicated scale object where a list of parameters could do.

One problem I could see with this approach is that you may have to essentially duplicate the scale class to hold sufficient information to adequately configure default scales.