slowkow / ggrepel

:round_pushpin: Repel overlapping text labels away from each other in your ggplot2 figures.
https://ggrepel.slowkow.com
GNU General Public License v3.0
1.21k stars 95 forks source link

flexible point.padding #83

Open Yuanchao-Xu opened 7 years ago

Yuanchao-Xu commented 7 years ago

I have three groups of points with different radiuses. With fixed set of point.padding, the problem will look like below. The big circle will cover part of the label. But if increase point.padding, the small circle's label will be too far away. Is there any work around for this? Thanks. image

slowkow commented 7 years ago

Thank you for opening this issue and sharing a figure that demonstrates the problem.

If someone is willing to volunteer their time, I'd appreciate a pull request for this feature.


Here is a minimal code example that can be used to reproduce the issue. When there's an update that addresses this issue, we can re-run this code to see if it works as expected:

Problem

Large points are overlapping their text labels, so it's not possible to read the labels.

library(ggrepel)

d <- data.frame(
  x = c(1, 2, 2, 3),
  y = c(1, 2, 3, 1),
  pointsize = factor(c(1, 2, 2, 1)),
  label = sprintf("label%s", 1:4)
)

ggplot(d, aes(x, y)) +
  geom_point(aes(size = pointsize)) +
  scale_size_manual(values = c(1, 15)) +
  geom_text_repel(aes(label = label), size = 5)

image

Solution

We should be able to give a vector of values for point.padding, but ggrepel 0.6.12 does not support this feature.

ggplot(d, aes(x, y)) +
  geom_point(aes(size = pointsize)) +
  scale_size_manual(values = c(1, 15)) +
  geom_text_repel(
    aes(label = label),
    size = 5,
    point.padding = unit(c(1, 2, 2, 1), "lines")
  )
Error in repel_boxes(data_points = cbind(x$data$x, x$data$y), point_padding_x = point_padding_x,  : 
  Expecting a single value: [extent=4].
In addition: Warning messages:
1: Using size for a discrete variable is not advised. 
2: In if (is.na(x$point.padding)) { :
  the condition has length > 1 and only the first element will be used
railsfanatic commented 6 years ago

Could the padding arguments be allowed within the aes() argument or be set dynamically based on geom_point size?

slowkow commented 6 years ago

@railsfanatic I would be happy to review a pull request! I think this is a great feature to add, but it is not implemented right now.

slowkow commented 4 years ago

I need help implementing this feature. I don't know how to convert the point size to the appropriate size that is scaled to the plotting area.

Does anyone have any hints? Pull requests are welcome.

Here is what I have tried, but none of these options for setting point_size are correct. My best guess is that I am inappropriately calling convertWidth() and convertHeight(). I don't know how to fix it ☹️

https://github.com/slowkow/ggrepel/blob/f7925f032f827da8d09ad8b9d883551abb89551a/R/geom-text-repel.R#L418-L428

slowkow commented 4 years ago

@hadley Could I please ask if you might have any hints on how to fix this issue?

If you clone the latest GitHub version of ggrepel, I think you should be able to see similar behavior to the animated gif below.

I tried to set point_size based on the width of the plot area, but this causes some predictable issues:

Here's my incorrect code, and the gif that shows the problem caused by it:

point_size <- convertWidth(to_unit(x$data$point.size), "native", valueOnly = TRUE) / 14

ggrepel-issue83

The panel on the left assumes all points have size 0. The panel on the right shows the problem that I've described above.

slowkow commented 4 years ago

Here is the code for the figure in my gif:

library(ggrepel)

set.seed(42)
d <- data.frame(
  x = c(1, 2, 2, 3),
  y = c(1, 2, 3, 1),
  pointsize = c(0, 2, 1, 0),
  label = sprintf("label%s", 1:4)
)

# point.size = 0
p1 <- ggplot(d, aes(x, y)) +
  geom_point(aes(size = factor(pointsize)), color = "grey50") +
  scale_size_manual(values = c(1, 8, 15), guide = FALSE) +
  geom_text_repel(
    aes(label = label),
    # point.size = d$pointsize,
    size = 5
  ) +
  labs(title = "point.size = 0")

size_range <- c(2, 50)
p2 <- ggplot(d, aes(x, y)) +
  continuous_scale(
    aesthetics = c("size", "point.size"), scale_name = "point.size",
    palette = scales::area_pal(size_range), guide = FALSE
  ) +
  # scale_size_continuous(range = size_range, guide = FALSE) +
  geom_point(aes(size = pointsize), color = "grey50") +
  geom_text_repel(
    min.segment.length = 0.1,
    box.padding = 0.2,
    aes(label = label, point.size = pointsize),
    size = 5, max.iter = 1e5, max.time = 1
  )

gridExtra::grid.arrange(p1, p2, ncol = 2)
hadley commented 4 years ago

@thomasp85 is more likely to know the answer than me

pmur002 commented 4 years ago

Simple (and possibly silly) question to start with: why are you converting point size to "native" ? why not to something that measures the same in both height and width, like "inches" ?

slowkow commented 4 years ago

@pmur002 Honestly, I don't have a good understanding of how to use the convertWidth() and convertHeight() functions. All I know is that I need to use them at some point, because that's how I figure out how big the text boxes need to be.

There is a good chance that I have some very silly mistakes, and I'd be delighted if you can help me to see them or if you can help to fix them.

slowkow commented 4 years ago

I tried a few things.

Attempt 1

point_size <- convertWidth(to_unit(x$data$point.size), "native", valueOnly = TRUE) / 14

This works well only when the plot width equals the plot height. The good news is that the output is correct regardless of how large the plot is. The bad news is that the aspect ratio of the plot determines the size of each point. This means the point size is either too big or too small when the plot width does not equal the plot height.

I don't know how geom_point() maintains the the same size of each point independent of the plot dimensions, but that is exactly the behavior I'm trying to duplicate.

Attempt 2

point_size <- convertWidth(to_unit(x$data$point.size), "inches", valueOnly = TRUE) / 42

This works only for plots of very specific size, so 42 is hand-tuned to match that plot size. The output is wrong if the plot size is increased, decreased, or if the aspect ratio is changed.

slowkow commented 4 years ago

To my great surprise, this hacky code seems to be the best attempt yet!

I wish it weren't full of magic, though. And it needs more magic to fine-tune the distance between the segment and the edge of each large point.

  p_width <- convertWidth(unit(1, "npc"), "inch", TRUE)
  p_height <- convertHeight(unit(1, "npc"), "inch", TRUE)
  p_ratio <- (p_width / p_height)
  if (p_ratio > 1) {
    p_ratio <- p_ratio ^ (1 / (1.5 * p_ratio))
  }

  point_size <- p_ratio * convertWidth(
    to_unit(x$data$point.size), "native", valueOnly = TRUE
  ) / 13

ezgif com-optimize

pmur002 commented 4 years ago

My understanding is that the grobs that you create within your geom's draw_panel() method will be drawn within a 'grid' viewport with a 0:1 "native" coordinate system (and you use something like coord$transform(x, panel_scales) to convert data values to this "native" scale). However, that viewport you are drawing in also has all of the normal 'grid' coordinate systems as well, such as "mm", "inches", "npc", etc. It seems to me that the sort of calculations you are doing might be easier in something "absolute" like inches or mm, so that you can forget about problems like the aspect ratio of the plot region altogether. However, having had a brief look at your code, the use of "native" coordinates appears to go pretty deep, including into your C++ code, so a change of coordinate systems might be a very large undertaking. If you have found something that works, maybe stick with that :)

slowkow commented 4 years ago

Paul, thanks for the comment! If I understand you correctly, you are suggesting:

Is that right?

pmur002 commented 4 years ago

Pretty much. Though my comments are all theoretical - I have not written any code to back them up. Also, I believe it is only necessary to convert to "native" (via coord$transform()) IF you want locations that are relative to the scales on the plot axes.