tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org

`scale_*_*` `labels` argument often doesn't work as expected with a function #5881

Closed davidhodge931 closed 2 months ago

davidhodge931 commented 2 months ago

The help says that the `labels` argument of the `scale_*_*()` functions can take a function that takes the breaks as input and returns the labels.

I am creating some functions that manipulate the labels based on the position of each element, and this does not always work as the help indicates it should.


library(tidyverse)
library(palmerpenguins)

# Keep only the label of the third break and blank out all the others.
hold_3rd <- function(x) {
  c("", "", as.character(x[3]), rep("", times = length(x) - 3))
}

# sometimes works as expected
penguins |> 
  ggplot() +
  geom_point(
    aes(x = flipper_length_mm,
        y = body_mass_g),
  ) +
  scale_x_continuous(labels = \(x) hold_3rd(scales::comma(x)))
#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_point()`).


# sometimes does not
penguins |> 
  ggplot() +
  geom_point(
    aes(x = bill_length_mm,
        y = body_mass_g),
  ) +
  scale_x_continuous(labels = \(x) hold_3rd(scales::comma(x)))
#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_point()`).

Created on 2024-05-03 with reprex v2.1.0

teunbrand commented 2 months ago

This is because the labelling function is applied to the breaks before the out-of-bounds breaks are censored. In your second example, you need to discard the out-of-bounds breaks for it to work as intended.

library(palmerpenguins)
library(ggplot2)
library(scales)

hold_3rd <- function(x) {
  x[-3] <- ""
  x
}

penguins |> 
  ggplot() +
  geom_point(
    aes(x = bill_length_mm,
        y = body_mass_g),
  ) +
  scale_x_continuous(
    labels = \(x) hold_3rd(comma(x)),
    breaks = \(x) oob_discard(extended_breaks()(x), x)
  )
#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_point()`).

Created on 2024-05-05 with reprex v2.1.0
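
To see why a position-based labeller can land on a different visible tick, the effect can be reproduced without a plot. This is a minimal sketch in base R; the break vectors are hypothetical values chosen for illustration, and it assumes the out-of-bounds break still occupies a slot (shown here as NA) in the vector the labeller receives.

```r
# Position-based labeller from the comment above, with an explicit
# conversion to character so it works on plain numeric input.
hold_3rd <- function(x) {
  x <- as.character(x)
  x[-3] <- ""
  x
}

# Hypothetical break vectors, for illustration only.
all_in_bounds <- c(170, 180, 190, 200, 210, 220, 230)
one_oob       <- c(NA, 40, 50, 60)  # first break lies outside the limits

hold_3rd(all_in_bounds)
#> [1] ""    ""    "190" ""    ""    ""    ""

# The third slot is 50, but only 40, 50 and 60 are drawn,
# so the label appears on the second visible tick.
hold_3rd(one_oob)
#> [1] ""   ""   "50" ""

# Dropping the out-of-bounds slot first makes "third" mean "third visible".
hold_3rd(one_oob[!is.na(one_oob)])
#> [1] ""   ""   "60"
```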

davidhodge931 commented 2 months ago

Thanks @teunbrand. It'd be great to be able to do this with oob functions other than oob_discard. Could the labelling function be applied to breaks after the out-of-bounds breaks are censored?

teunbrand commented 2 months ago

I imagine this would get hairy. If we discard oob breaks before labelling, labels given as atomic vectors will become out of sync. In addition, minor breaks might be miscalculated without oob breaks. I'll keep this issue open as a prompt to explore this more fully, but the answer for now is 'probably not'.
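
The "out of sync" concern can be illustrated with plain vectors. This is only a sketch in base R: the breaks, labels and limits are made-up values, and the pairing at the end is a deliberately naive stand-in, not how ggplot2 matches labels to breaks internally.

```r
# Hypothetical values, for illustration only.
breaks <- c(30, 40, 50, 60)
labels <- c("thirty", "forty", "fifty", "sixty")  # fixed-length atomic vector
limits <- c(35, 62)

in_bounds <- breaks >= limits[1] & breaks <= limits[2]

# Censoring keeps one slot per break, so labels still line up by position.
censored <- ifelse(in_bounds, breaks, NA)
length(censored) == length(labels)
#> [1] TRUE

# Discarding removes the slot, so the two vectors no longer match in length.
discarded <- breaks[in_bounds]
length(discarded) == length(labels)
#> [1] FALSE

# A naive positional pairing would now attach "thirty" to the break at 40.
mismatch <- data.frame(
  break_value = discarded,
  label       = labels[seq_along(discarded)]
)
mismatch
```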

davidhodge931 commented 2 months ago

It would be useful, but don't want to break everything! Feel free to close whenever

davidhodge931 commented 2 months ago

The main use case for this would be a labelling function that labels every second break and leaves the others as "". It would work much more intuitively if it always started from the first break within bounds.

teunbrand commented 2 months ago

I have thought about this some more, and while we could implement this in ggplot2 without problems for ggplot2, this might unnecessarily break other people's packages. Back during reverse dependency checks for 3.5.0, I came across a bunch of code in packages that made unorthodox* use of label functions that would break again if this were to be changed. For that reason, I don't want to change the way this works.

* = using lookup tables or returning fixed-length atomic vectors

Now, for your use case, I had forgotten that breaks arrive at the labelling function in a pre-censored state (i.e. oob breaks are NA). You can exploit this as follows. Similar to the reprex:

library(ggplot2)

only_show_nth <- function(n) {
  force(n)
  function(x) {
    i <- which(is.finite(x))
    x[-i[n]] <- ""
    x
  }
}

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_x_continuous(
    labels = only_show_nth(2)
  )

Similar to the use-case you describe:

show_every_nth <- function(n) {
  force(n)
  function(x) {
    i <- which(is.finite(x))
    i <- i[seq_along(i) %% n == 0]
    x[-i] <- ""
    x
  }
}

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_x_continuous(
    labels = show_every_nth(2)
  )

Created on 2024-05-08 with reprex v2.1.0
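
The two labellers above can also be checked outside a plot by calling them on a break vector directly. A small sketch in base R; the input vectors are hypothetical and simply assume that out-of-bounds breaks arrive as NA, as described above.

```r
only_show_nth <- function(n) {
  force(n)
  function(x) {
    i <- which(is.finite(x))  # positions of the in-bounds breaks
    x[-i[n]] <- ""            # blank everything except the nth of them
    x
  }
}

show_every_nth <- function(n) {
  force(n)
  function(x) {
    i <- which(is.finite(x))
    i <- i[seq_along(i) %% n == 0]  # every nth in-bounds break
    x[-i] <- ""
    x
  }
}

# Hypothetical break vectors with NA marking out-of-bounds breaks.
only_show_nth(2)(c(NA, 2, 3, 4))
#> [1] ""  ""  "3" ""

show_every_nth(2)(c(NA, 2, 3, 4, 5, 6, 7, NA))
#> [1] ""  ""  "3" ""  "5" ""  "7" ""
```

Counting is relative to the finite breaks only, so the NA slots never shift which visible tick gets a label.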

davidhodge931 commented 2 months ago

That's awesome, thanks @teunbrand.

Works as expected for positional scales, but not for colour scales?

Also, I assume you're not interested in putting an argument in the scales::label_* functions to support this?

library(tidyverse)

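show_every_nth <- function(n = 2, offset = 0) {
  force(n)
  force(offset)
  function(x) {
    # positions of the in-bounds (finite) breaks
    i <- which(is.finite(x))
    # keep every nth of them, starting at position offset + 1;
    # the zero-based remainder lets any offset from 0 to n - 1 match
    i <- i[(seq_along(i) - 1) %% n == offset]
    x[-i] <- ""
    x
  }
}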

ggplot(mpg, aes(displ, hwy, colour = displ)) +
  geom_point() +
  scale_x_continuous(labels = show_every_nth(2)) +
  scale_y_continuous(labels = show_every_nth(2)) +
  scale_colour_gradientn(colors = viridis::viridis(9), labels = show_every_nth())


ggplot(mpg, aes(displ, hwy, colour = hwy)) +
  geom_point() +
  scale_x_continuous(labels = show_every_nth(2)) +
  scale_y_continuous(labels = show_every_nth(2)) +
  scale_colour_gradientn(colors = viridis::viridis(9), labels = show_every_nth())

Created on 2024-05-09 with reprex v2.1.0

davidhodge931 commented 2 months ago

Instead of an argument in a scales::label_* function, it might work better as a function.

Let me know if you'd like to implement something like this in {scales}. Otherwise, I'll chuck it in {ggblanket}

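label_every_nth <- function(n = 2, offset = 0, ...) {
  force(n)
  force(offset)
  function(x) {
    # positions that can carry a label: finite numbers, or any element
    # of a character, factor or logical vector
    i <- which(is.finite(x) | is.character(x) | is.factor(x) | is.logical(x))
    # keep every nth of them, starting at position offset + 1;
    # the zero-based remainder lets any offset from 0 to n - 1 match
    i <- i[(seq_along(i) - 1) %% n == offset]

    if (is.numeric(x)) x <- scales::comma(x, ...)
    else x <- format(x, ...)

    x[-i] <- ""
    x
  }
}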
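
How `n` and `offset` pick positions can be checked in isolation. The helper below is hypothetical and only for illustration; it shows one way to write the selection, assuming `offset` is meant to run from 0 to n - 1 and that every such offset should select something.

```r
# Hypothetical helper: which of `len` positions are kept for a given
# `n` and zero-based `offset`.
kept_positions <- function(len, n = 2, offset = 0) {
  pos <- seq_len(len)
  pos[(pos - 1) %% n == offset]
}

kept_positions(6)
#> [1] 1 3 5
kept_positions(6, n = 2, offset = 1)
#> [1] 2 4 6
kept_positions(7, n = 3, offset = 2)
#> [1] 3 6
```

Subtracting 1 before taking the remainder makes the first position correspond to `offset = 0`.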