wilkelab / ggridges

Ridgeline plots in ggplot2
https://wilkelab.org/ggridges
GNU General Public License v2.0

Setting scale limits with quantile_lines can lead to misleading quantile lines #28

Closed bjreisman closed 5 years ago

bjreisman commented 5 years ago

When scales are limited with scale_x_continuous(limits = c(min, max)), only the data within the limits is retained for the quantile calculation. This leads to plotted quantile lines which do not match the true data quantiles.

I think this is the expected behavior when setting limits inside the scale function, as opposed to zooming in on a part of the plot. See: stackoverflow thread

I don't think this is a bug per se, but it caught me off guard at first, and it may be worthwhile to emit a warning when quantile lines are used with a limited axis. Normally I'd be happy to contribute, but I think it would require the quantile lines function to know whether the scales have been limited (or vice versa), and I'm not sure how to go about that.
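The effect can also be seen without any plotting. The following is a minimal base R sketch that assumes a scale limit acts roughly like filtering the data before the statistic is computed. It compares plain sample medians, not the values ggridges itself draws, and the variable names (`x_limits`, `in_range`, `medians`) are made up for illustration.

```r
# Assumption: scale limits behave roughly like dropping out-of-range rows
# before any statistic is computed.
x_limits <- c(5, 9)

in_range <- iris$Sepal.Length >= x_limits[1] &
  iris$Sepal.Length <= x_limits[2]

# Number of rows that fall outside the limits
n_dropped <- sum(!in_range)

# Per-species sample medians on the full data and on the retained data
median_all  <- tapply(iris$Sepal.Length, iris$Species, median)
median_kept <- tapply(iris$Sepal.Length[in_range], iris$Species[in_range], median)

medians <- data.frame(
  Species     = names(median_all),
  median_all  = as.vector(median_all),
  median_kept = as.vector(median_kept)
)

print(n_dropped)
print(medians)
```

Any species that loses observations below the lower limit can have its median pulled upward, which is the shift visible in the limited plots.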

library(tidyverse)
library(cowplot)
library(ggridges)

free_x_plot <- 
  ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  stat_density_ridges(quantile_lines = TRUE, quantiles = 2)

free_x_table <- 
  as_tibble(layer_data(free_x_plot)) %>%
  filter(datatype == "vline") %>%
  select(x, ymin)

limited_x_plot1 <-
  ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  stat_density_ridges(quantile_lines = TRUE, quantiles = 2) +
  scale_x_continuous(limits = c(5, 9))

limited_x_table1 <- 
  as_tibble(layer_data(limited_x_plot1)) %>%
  filter(datatype == "vline") %>%
  select(x, ymin)

zoom_x_plot1 <-
  ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  stat_density_ridges(quantile_lines = TRUE, quantiles = 2) +
  coord_cartesian(xlim = c(5, 9))

zoom_x_table1 <-
  as_tibble(layer_data(zoom_x_plot1)) %>%
  filter(datatype == "vline") %>%
  select(x, ymin)

limited_x_plot2 <-
  ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  stat_density_ridges(quantile_lines = TRUE, quantiles = 2) +
  scale_x_continuous(limits = c(5.2, 9))

limited_x_table2 <- as_tibble(layer_data(limited_x_plot2)) %>%
  filter(datatype == "vline") %>%
  select(x, ymin)

zoom_x_plot2 <-
  ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  stat_density_ridges(quantile_lines = TRUE, quantiles = 2) +
  coord_cartesian(xlim = c(5.2, 9))

zoom_x_table2 <-
  as_tibble(layer_data(zoom_x_plot2)) %>%
  filter(datatype == "vline") %>%
  select(x, ymin)

plot_grid(
  free_x_plot + labs(x = "Sepal.Length\nFree X"),
  limited_x_plot1 + labs(x = "Sepal.Length\nscale limits = c(5,9)"),
  limited_x_plot2 + labs(x = "Sepal.Length\nscale limits = c(5.2,9)"),
  NULL,
  zoom_x_plot1 + labs(x = "Sepal.Length\nzoom xlim = c(5,9)"),
  zoom_x_plot2 + labs(x = "Sepal.Length\nzoom xlim = c(5.2,9)"),
  nrow = 2
)

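[Image: 2 × 3 grid of ridgeline plots with median lines. Top row: free x, scale limits = c(5,9), scale limits = c(5.2,9). Bottom row: empty panel, zoom xlim = c(5,9), zoom xlim = c(5.2,9).]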

truth <- iris %>%
  group_by(Species) %>%
  summarise(Sepal.Length.median = median(Sepal.Length))

data.frame(
  "y" = free_x_table$ymin,
  "Species" = truth$Species,
  "true.median" = truth$Sepal.Length.median,
  "free_x" = free_x_table$x,
  "limited_y1" = limited_x_table1$x,
  "limited_y2" = limited_x_table2$x,
  "zoom_y1" = zoom_x_table1$x,
  "zoom_y2" = zoom_x_table2$x
)
  y    Species true.median free_x limited_y1 limited_y2 zoom_y1 zoom_y2
1 1     setosa         5.0    5.0        5.1        5.4     5.0     5.0
2 2 versicolor         5.9    5.9        5.9        6.0     5.9     5.9
3 3  virginica         6.5    6.5        6.5        6.5     6.5     6.5

clauswilke commented 5 years ago

This is just how ggplot2 works. Limits on scales remove the data outside the limits. This usually generates a warning about missing values (see below). You'll have the same issue with all other geoms (e.g., geom_boxplot()).

library(ggplot2)
library(ggridges)

ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  stat_density_ridges(quantile_lines = TRUE, quantiles = 2) +
  scale_x_continuous(limits = c(5, 9))
#> Picking joint bandwidth of 0.166
#> Warning: Removed 22 rows containing non-finite values
#> (stat_density_ridges).

Created on 2018-11-30 by the reprex package (v0.2.1)
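As a rough illustration of the geom_boxplot() point, the sketch below builds the same boxplot twice, once with scale limits and once zoomed with coord_cartesian(), and reads the computed medians back from layer_data(). The object names (`p_base`, `p_limited`, `p_zoomed`, `middle_limited`, `middle_zoomed`) are made up for this example, and it assumes that the `middle` column of the boxplot layer data holds the median.

```r
library(ggplot2)

p_base <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()

# Scale limits: out-of-range values are dropped before the stat runs
# (a warning about removed rows is expected when the plot is built)
p_limited <- p_base + scale_y_continuous(limits = c(5, 9))

# Coordinate zoom: the stat sees all of the data, only the view changes
p_zoomed <- p_base + coord_cartesian(ylim = c(5, 9))

# Medians as computed by the boxplot stat, one per species
middle_limited <- layer_data(p_limited)$middle
middle_zoomed  <- layer_data(p_zoomed)$middle

print(data.frame(
  Species        = levels(iris$Species),
  middle_limited = middle_limited,
  middle_zoomed  = middle_zoomed
))
```

If the limits are only meant to control what is visible, coord_cartesian() is usually the safer choice, since the summary statistics are then still computed from the full data.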

bjreisman commented 5 years ago

Ah, thank you!