mtennekes / cols4all

Colors for all (R package)
https://mtennekes.github.io/cols4all/

Weird behaviour with ggplot2 scales #26

Open agila5 opened 7 months ago

agila5 commented 7 months ago

Dear all, I've decided to create this issue since I've noticed a weird interaction with the ggplot2 package.

According to the GUI created by c4a_gui(), the palette brewer.rd_bu is a diverging palette ranging from red to blue.


Nevertheless, the following code creates a palette that does not reflect the preview summarised by c4a_gui():

library(ggplot2)
library(cols4all)

toy <- data.frame(
  x = 1:5, 
  y = 1:5, 
  z = 1:5
)

ggplot(toy) + 
  geom_point(aes(x = x, y = y, col = z)) + 
  scale_color_continuous_c4a_div(palette = "brewer.rd_bu")

Everything works as expected if I use the "sequential" scale instead:

ggplot(toy) + 
  geom_point(aes(x = x, y = y, col = z)) + 
  scale_color_continuous_c4a_seq(palette = "brewer.rd_bu")

Created on 2023-11-30 with reprex v2.0.2

Am I missing something? What's going on here?

mtennekes commented 7 months ago

Thanks for bringing this up @agila5

It is by design, but open for discussion.

Currently, diverging palettes are assumed to represent negative-positive values, with 0 mapped to the neutral middle color. When scale_color_continuous_c4a_div is used, the palette is assumed to be a diverging palette, so 0 is mapped to the middle color, negative values to the left-hand side of the palette, and positive values to the right-hand side. The value mapped to the middle color can be changed with the argument mid.

scale_color_continuous_c4a_seq uses the palette as-is, so the minimum is mapped to the first color, and the maximum to the last color.
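To illustrate why the all-positive z from the original report ends up on only one side of the diverging palette, here is a minimal base-R sketch. It assumes a symmetric mapping around mid, which may differ from what cols4all does internally; div_position is a hypothetical helper, not a cols4all function.

```r
# Hypothetical helper (not part of cols4all): map values to a position in
# [0, 1] along a diverging palette, with `mid` pinned to the centre (0.5).
# Assumption: the mapping is symmetric around `mid`.
div_position <- function(x, mid = 0) {
  half_width <- max(abs(x - mid), na.rm = TRUE)
  # Degenerate case: every value equals `mid`, so everything sits at the centre
  if (half_width == 0) return(rep(0.5, length(x)))
  0.5 + (x - mid) / (2 * half_width)
}

z <- 1:5

# mid = 0 (the default described above): every value lies right of the centre
pos_default <- div_position(z, mid = 0)

# mid = 3: the values spread over the whole palette
pos_mid3 <- div_position(z, mid = 3)
```

Under this assumption, pos_default stays above 0.5 for every value, so only the right-hand half of the palette is used, while pos_mid3 covers the full range from 0 to 1.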

library(ggplot2)
library(cols4all)

toy <- data.frame(
    x = 1:5, 
    y = 1:5, 
    z = 1:5,
    z2 = c(-10, -3, 0, 1, 3)
)

# by default, the mid of a diverging palette is set to 0
ggplot(toy) + 
    geom_point(aes(x = x, y = y, col = z2)) + 
    scale_color_continuous_c4a_div(palette = "brewer.rd_bu")


# this can be set manually with 'mid'
ggplot(toy) + 
    geom_point(aes(x = x, y = y, col = z)) + 
    scale_color_continuous_c4a_div(palette = "brewer.rd_bu", mid = 3)

Created on 2023-12-02 with reprex v2.0.2

Any feedback on this is welcome. The reasoning behind this choice is that, in most use cases for a diverging scale, 0 is the natural middle value, at least in my experience. In my first example (with z2), this mapping therefore seems intuitive to me, certainly more intuitive than using the midpoint of the range, (-10 + 3)/2 = -3.5, as the middle value. However, I also see that the behaviour is odd in your first example. Perhaps it would be better to set mid to the middle of the value range when the values are either all negative or all positive, and to set it to 0 only when both positive and negative values are present. What do you think?
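The proposed rule could be sketched in base R as follows. This is only an illustration of the idea, not existing cols4all behaviour; choose_mid is a hypothetical name.

```r
# Hypothetical helper (not part of cols4all): pick the midpoint of a
# diverging scale according to the rule proposed above.
choose_mid <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] < 0 && rng[2] > 0) {
    # Both negative and positive values are present: keep 0 as the middle
    0
  } else {
    # All values on one side of 0: use the middle of the value range
    mean(rng)
  }
}

mid_z  <- choose_mid(1:5)                  # all positive
mid_z2 <- choose_mid(c(-10, -3, 0, 1, 3))  # mixed signs
```

Until something like this is built in, the result can be passed by hand through the existing mid argument, e.g. scale_color_continuous_c4a_div(palette = "brewer.rd_bu", mid = choose_mid(toy$z)).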