wilkelab / ggridges

Ridgeline plots in ggplot2
https://wilkelab.org/ggridges
GNU General Public License v2.0

Feature Request: "Zoom in" on height axis for distribution tail visualizations #60

Open hwaight opened 4 years ago

hwaight commented 4 years ago

I'm trying to create a ridgeline plot that shows the full distribution and the tail of the distribution side by side. I'd like to request a feature that lets you "zoom in" on the tail of each distribution without altering the estimated density function. You can already do this along the x-axis with coord_cartesian(), but there is currently no way to do the same for the height of each distribution. This would be really helpful for people who work with large datasets whose tails follow power-law-like distributions and who want to visualize the extremes. This is a common situation when working with text data, for example. I've created two examples below:

Single Density Example

library("ggplot2")
library("dplyr")      # for %>% and filter()
library("gridExtra")  # for grid.arrange()

## iris data set, Sepal.Width of virginica only
virginica <- iris %>% filter(Species == "virginica")

## full distribution
plot1 <- ggplot(virginica, aes(x = Sepal.Width)) +
  geom_density()

## same density, zoomed in on the upper tail along both axes
plot2 <- ggplot(virginica, aes(x = Sepal.Width)) +
  geom_density() +
  coord_cartesian(xlim = c(3.5, 3.8),
                  ylim = c(0, .5))

grid.arrange(plot1, plot2, ncol = 2)

[image: density of virginica Sepal.Width, full distribution (left) and zoomed-in tail (right)]

The figure shows that the tail of the density has not been re-normalized. It keeps the shape and area it has in the original figure; we have only zoomed in on both the x and y axes.
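To make the "not re-normalized" point concrete, here is a minimal base R sketch. It assumes that stats::density() with its default settings is a close enough stand-in for what geom_density() draws; the helper trapezoid_area() is a hypothetical name introduced only for this illustration. It shows that restricting the curve to the zoom window simply selects part of the original curve, so the area under the visible piece is a fraction of the full area rather than 1.

```r
# Kernel density estimate for virginica sepal widths (base R only).
# Assumption: density() defaults approximate what geom_density() draws.
virginica_width <- iris$Sepal.Width[iris$Species == "virginica"]
dens <- density(virginica_width, n = 2048)

# Hypothetical helper: trapezoid-rule area under a curve given as x/y vectors.
trapezoid_area <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Area under the full estimated curve (should be close to 1).
full_area <- trapezoid_area(dens$x, dens$y)

# Restrict to the zoom window used above; the y values are left untouched.
in_tail <- dens$x >= 3.5 & dens$x <= 3.8
tail_area <- trapezoid_area(dens$x[in_tail], dens$y[in_tail])

print(c(full = full_area, tail = tail_area))
```

If the tail had been re-estimated on its own, its area would be close to 1 as well; here it stays a small share of the full curve.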

ggridges Example

In this example I've added an additional aesthetic mapping, because it helps show why this feature would be useful. I've created a binary variable to map to the fill aesthetic.

library("ggridges")

## binary variable (random draws, so a seed is set here for reproducibility)
set.seed(1)
iris$norm <- rnorm(nrow(iris))
iris$norm_bin <- ifelse(iris$norm < 0,
                        "Less than 0",
                        "Greater than 0")

plot1 <- ggplot(iris,  aes(y = Species)) +
  geom_density_ridges(aes(x = Sepal.Width,
                          fill = norm_bin),
                      alpha = .5) +
  theme_ridges(grid = FALSE,
               center_axis_labels = TRUE) +
  theme(legend.position = "left",
        axis.title.y = element_blank()) + 
  ggtitle("Center of Distribution")

plot2 <- ggplot(iris,  aes(y = Species)) +
  geom_density_ridges(aes(x = Sepal.Width,
                          fill = norm_bin),
                      alpha = .5) +
  theme_ridges(grid = FALSE,
               center_axis_labels = TRUE) +
  theme(legend.position = "none",
        axis.title.y = element_blank()) + 
  coord_cartesian(xlim = c(4, 5)) +
  ggtitle("Tail of Distribution")

grid.arrange(plot1, plot2, ncol = 2)

[image: ridgeline plots by Species, "Center of Distribution" (left) and "Tail of Distribution" (right)]

With ggridges you can't zoom in on the height of each ridgeline, because coord_cartesian() only accepts x and y limits, and here the y axis is Species rather than the density height. In this example it doesn't matter much because there are so few observations that the tails are still readable, but as the number of observations and the number of ridgelines increase, the tails become difficult to see.

If a feature could be built that lets you zoom in on the "height" of each ridgeline, it would be really helpful.
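For reference, the closest approximation I'm aware of uses the existing scale argument of geom_density_ridges(), which stretches the height of every ridgeline by a common factor, combined with coord_cartesian() to crop the view. This is only a rough sketch and not a substitute for the requested feature: the value scale = 10, the ylim range, the seed, and the copy named iris_demo are arbitrary choices made for this illustration, and the magnified bodies of the distributions overlap heavily outside the visible window.

```r
library("ggplot2")
library("ggridges")

# Work on a copy so the built-in iris data stays unchanged (illustrative name).
iris_demo <- iris
set.seed(1)  # arbitrary seed, only so the random grouping is reproducible
iris_demo$norm_bin <- ifelse(rnorm(nrow(iris_demo)) < 0,
                             "Less than 0",
                             "Greater than 0")

plot_tail <- ggplot(iris_demo, aes(y = Species)) +
  # scale > 1 magnifies ridgeline heights; 10 is an arbitrary factor
  geom_density_ridges(aes(x = Sepal.Width, fill = norm_bin),
                      alpha = .5,
                      scale = 10) +
  theme_ridges(grid = FALSE, center_axis_labels = TRUE) +
  theme(legend.position = "none",
        axis.title.y = element_blank()) +
  # crop to the tail on x, and keep the three baselines in view on y
  coord_cartesian(xlim = c(4, 5), ylim = c(1, 4)) +
  ggtitle("Tail of Distribution (heights magnified)")

plot_tail
```

The drawback is that scale is a relative stretch rather than a limit on the height axis, so there is no direct way to say "show densities between 0 and 0.5" as with ylim in the single-density example.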

Thanks! And thanks for building such a fantastic package.