slowkow / ggrepel

Repel overlapping text labels away from each other in your ggplot2 figures.
https://ggrepel.slowkow.com
GNU General Public License v3.0

geom_text_repel(direction = "y") does not honor hjust #190

Open twest820 opened 3 years ago

twest820 commented 3 years ago

Summary

ggrepel does not implement the design behavior described in #188.

Minimal code example

library(dplyr)
library(ggplot2)
library(ggrepel)
library(tidyr)
data = crossing(x = c(0, 1), slope = 1/2^seq(0,5)) %>% mutate(y = slope * x)
set.seed(0)
ggplot() + geom_line(data = data, aes(x = x, y = y, color = as.factor(slope), group = slope)) +
  geom_text_repel(data = data %>% group_by(slope) %>% slice_max(x, n = 1), aes(x = x, y = y, label = slope), direction = "y", hjust = -0.25) +
  coord_cartesian(xlim = c(0, 1.1)) + labs(color = "slope") +
  theme(legend.justification = c(0, 1), legend.position = c(0.02, 0.98))

This should result in tidy left justification of the labels just to the right of the ends of the lines. Instead, the labels wander left and right from line to line.

Suggestions

Bug fix. It looks like the passed-in value of hjust is being overridden.

Version information

R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] tidyr_1.1.3 ggrepel_0.9.1 ggplot2_3.3.3 dplyr_1.0.5

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6 magrittr_2.0.1 tidyselect_1.1.0 munsell_0.5.0 colorspace_2.0-0 R6_2.5.0
 [7] rlang_0.4.10 fansi_0.4.2 tools_4.0.4 grid_4.0.4 gtable_0.3.0 utf8_1.1.4
[13] cli_2.3.1 DBI_1.1.1 withr_2.4.1 ellipsis_0.3.1 digest_0.6.27 assertthat_0.2.1
[19] tibble_3.1.0 lifecycle_1.0.0 crayon_1.4.1 farver_2.1.0 purrr_0.3.4 vctrs_0.3.6
[25] glue_1.4.2 labeling_0.4.2 compiler_4.0.4 pillar_1.5.1 generics_0.1.0 scales_1.1.1
[31] pkgconfig_2.0.3

slowkow commented 3 years ago

The reprex below shows that the behavior for ggplot2::geom_text() with negative values for hjust looks pretty similar to the behavior for ggrepel::geom_text_repel()...

I'm not sure what to think of this. I might need to read the source code to see what ggplot2::geom_text() does with the hjust option.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
library(ggrepel)
library(tidyr)
library(patchwork)

data <- crossing(x = c(0, 1), slope = 1 / 2^seq(0, 5)) %>% mutate(y = slope * x)
set.seed(0)

p <- ggplot() +
  aes(x = x, y = y, label = slope) +
  geom_line(data = data, aes(color = as.factor(slope), group = slope)) +
  coord_cartesian(xlim = c(0, 1.1)) +
  labs(color = "slope") +
  theme(legend.justification = c(0, 1), legend.position = c(0.02, 0.98))

p0 <- p + geom_text(
  data = data %>% group_by(slope) %>% slice_max(x, n = 1),
  hjust = 0
) + labs(title = "geom_text() hjust = 0")

p1 <- p + geom_text(
  data = data %>% group_by(slope) %>% slice_max(x, n = 1),
  hjust = 1
) + labs(title = "geom_text() hjust = 1")

p5 <- p + geom_text(
  data = data %>% group_by(slope) %>% slice_max(x, n = 1),
  hjust = 0.5
) + labs(title = "geom_text() hjust = 0.5")

pn5 <- p + geom_text(
  data = data %>% group_by(slope) %>% slice_max(x, n = 1),
  hjust = -0.5
) + labs(title = "geom_text() hjust = -0.5")

q0 <- p + geom_text_repel(
  data = data %>% group_by(slope) %>% slice_max(x, n = 1),
  direction = "y",
  hjust = 0,
  xlim = c(NA, Inf)
) + labs(title = "geom_text_repel() hjust = 0")

q1 <- p + geom_text_repel(
  data = data %>% group_by(slope) %>% slice_max(x, n = 1),
  direction = "y",
  hjust = 1,
  xlim = c(NA, Inf)
) + labs(title = "geom_text_repel() hjust = 1")

q5 <- p + geom_text_repel(
  data = data %>% group_by(slope) %>% slice_max(x, n = 1),
  direction = "y",
  hjust = 0.5,
  xlim = c(NA, Inf)
) + labs(title = "geom_text_repel() hjust = 0.5")

qn5 <- p + geom_text_repel(
  data = data %>% group_by(slope) %>% slice_max(x, n = 1),
  direction = "y",
  hjust = -0.5,
  xlim = c(NA, Inf)
) + labs(title = "geom_text_repel() hjust = -0.5")

p0 + q0

p1 + q1

p5 + q5

pn5 + qn5

Created on 2021-04-07 by the reprex package (v0.3.0)

Session info

``` r
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.0.3 (2020-10-10)
#>  os       macOS Catalina 10.15.7
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2021-04-07
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source
#>  assertthat    0.2.1      2019-03-21 [2] CRAN (R 4.0.2)
#>  blob          1.2.1      2020-01-20 [2] CRAN (R 4.0.2)
#>  callr         3.5.1      2020-10-13 [2] CRAN (R 4.0.2)
#>  cli           2.3.1      2021-02-23 [1] CRAN (R 4.0.3)
#>  colorspace    2.0-0      2020-11-11 [2] CRAN (R 4.0.2)
#>  crayon        1.4.1      2021-02-08 [1] CRAN (R 4.0.2)
#>  curl          4.3        2019-12-02 [2] CRAN (R 4.0.1)
#>  DBI           1.1.0      2019-12-15 [2] CRAN (R 4.0.2)
#>  debugme       1.1.0      2017-10-22 [1] CRAN (R 4.0.2)
#>  desc          1.2.0      2018-05-01 [2] CRAN (R 4.0.2)
#>  devtools      2.3.0      2020-04-10 [2] CRAN (R 4.0.2)
#>  digest        0.6.27     2020-10-24 [2] CRAN (R 4.0.2)
#>  dplyr       * 1.0.4      2021-02-02 [1] CRAN (R 4.0.2)
#>  ellipsis      0.3.1      2020-05-15 [2] CRAN (R 4.0.2)
#>  evaluate      0.14       2019-05-28 [2] CRAN (R 4.0.1)
#>  fansi         0.4.2      2021-01-15 [1] CRAN (R 4.0.2)
#>  farver        2.0.3      2020-01-16 [2] CRAN (R 4.0.2)
#>  fs            1.5.0      2020-07-31 [1] CRAN (R 4.0.2)
#>  generics      0.1.0      2020-10-31 [2] CRAN (R 4.0.2)
#>  ggplot2     * 3.3.2.9000 2020-12-08 [2] Github (tidyverse/ggplot2@b5cc4d6)
#>  ggrepel     * 0.9.1.9999 2021-01-22 [1] local
#>  glue          1.4.2      2020-08-27 [2] CRAN (R 4.0.2)
#>  gtable        0.3.0      2019-03-25 [2] CRAN (R 4.0.2)
#>  highr         0.8        2019-03-20 [2] CRAN (R 4.0.2)
#>  htmltools     0.5.1.1    2021-01-22 [1] CRAN (R 4.0.2)
#>  httr          1.4.2      2020-07-20 [2] CRAN (R 4.0.2)
#>  knitr         1.31       2021-01-27 [1] CRAN (R 4.0.2)
#>  labeling      0.4.2      2020-10-20 [2] CRAN (R 4.0.2)
#>  lifecycle     1.0.0      2021-02-15 [1] CRAN (R 4.0.2)
#>  magrittr      2.0.1.9000 2020-12-15 [1] Github (tidyverse/magrittr@bb1c86a)
#>  memoise       1.1.0.9000 2020-12-15 [1] Github (r-lib/memoise@0901e3f)
#>  mime          0.10       2021-02-13 [1] CRAN (R 4.0.2)
#>  munsell       0.5.0      2018-06-12 [2] CRAN (R 4.0.2)
#>  patchwork   * 1.1.0      2020-11-09 [2] CRAN (R 4.0.2)
#>  pillar        1.5.0      2021-02-22 [1] CRAN (R 4.0.3)
#>  pkgbuild      1.1.0      2020-07-13 [2] CRAN (R 4.0.2)
#>  pkgconfig     2.0.3      2019-09-22 [2] CRAN (R 4.0.2)
#>  pkgload       1.1.0      2020-05-29 [2] CRAN (R 4.0.2)
#>  prettyunits   1.1.1      2020-01-24 [2] CRAN (R 4.0.2)
#>  processx      3.4.5      2020-11-30 [2] CRAN (R 4.0.2)
#>  ps            1.5.0      2020-12-05 [2] CRAN (R 4.0.2)
#>  purrr         0.3.4      2020-04-17 [2] CRAN (R 4.0.2)
#>  R6            2.5.0      2020-10-28 [2] CRAN (R 4.0.2)
#>  Rcpp          1.0.6      2021-01-15 [1] CRAN (R 4.0.2)
#>  remotes       2.2.0      2020-07-21 [1] CRAN (R 4.0.2)
#>  rlang         0.4.10     2020-12-30 [1] CRAN (R 4.0.2)
#>  rmarkdown     2.6        2020-12-14 [1] CRAN (R 4.0.2)
#>  rprojroot     2.0.2      2020-11-15 [2] CRAN (R 4.0.2)
#>  scales        1.1.1      2020-05-11 [2] CRAN (R 4.0.2)
#>  sessioninfo   1.1.1      2018-11-05 [2] CRAN (R 4.0.2)
#>  stringi       1.5.3      2020-09-09 [2] CRAN (R 4.0.2)
#>  stringr       1.4.0      2019-02-10 [2] CRAN (R 4.0.2)
#>  testthat      3.0.0      2020-10-31 [2] CRAN (R 4.0.2)
#>  tibble        3.0.6      2021-01-29 [1] CRAN (R 4.0.2)
#>  tidyr       * 1.1.2      2020-08-27 [2] CRAN (R 4.0.2)
#>  tidyselect    1.1.0      2020-05-11 [2] CRAN (R 4.0.2)
#>  usethis       1.6.1      2020-04-29 [2] CRAN (R 4.0.2)
#>  utf8          1.1.4      2018-05-24 [2] CRAN (R 4.0.2)
#>  vctrs         0.3.6      2020-12-17 [1] CRAN (R 4.0.2)
#>  withr         2.4.1      2021-01-26 [1] CRAN (R 4.0.2)
#>  xfun          0.21       2021-02-10 [1] CRAN (R 4.0.2)
#>  xml2          1.3.2      2020-04-23 [2] CRAN (R 4.0.2)
#>  yaml          2.2.1      2020-02-01 [2] CRAN (R 4.0.2)
#>
#> [1] /Users/kamil/Library/R/4.0/library
#> [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
```

twest820 commented 3 years ago

Thanks for the quick response. Sorry, that was a careless copy/paste on my part; I had meant to put hjust = 0 in the reprex. The side-by-side hjust = 0 case you have is a better illustration anyway. It is as if geom_text_repel() ends up triggering a position_nudge(x = -something). What one would probably want instead is something like geom_text_repel(direction = "y", hjust = 0, position = position_nudge(x = 0.01)): left-justified text with clean spacing from the ends of the lines, but without vertical encroachment or overlap between labels. (A negative hjust is functionally equivalent to position_nudge(x) in the special case where the labels are all the same length.)
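
To make the parenthetical concrete, here is a minimal sketch of the arithmetic. It assumes the usual convention that hjust is the fraction of the label's width lying to the left of the anchor point. The helper name left_edge_offset and the label widths are made up for illustration and are not part of ggplot2 or ggrepel.

```r
# Offset of a label's left edge from its anchor point, in the same units as
# `width`. Assumes hjust is the fraction of the label width that lies to the
# left of the anchor (0 = left-aligned, 1 = right-aligned).
# `left_edge_offset` is a hypothetical helper, not a ggplot2 or ggrepel function.
left_edge_offset <- function(width, hjust) {
  -hjust * width
}

# Labels of equal width: a negative hjust moves every label by the same
# amount, which is what a constant nudge would do.
equal_widths <- c(a = 2, b = 2, c = 2)
same_shift <- left_edge_offset(equal_widths, hjust = -0.25)

# Labels of different widths: the shift now depends on the label, so the
# left edges no longer line up.
mixed_widths <- c(a = 1, b = 2, c = 4)
ragged_shift <- left_edge_offset(mixed_widths, hjust = -0.25)

print(same_shift)
print(ragged_shift)
```

With mixed widths the offsets differ per label, which is why hjust = 0 plus a fixed nudge is the more predictable way to get a clean left edge.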

There may well be some limitations on the ggplot side (e.g. https://github.com/tidyverse/ggplot2/issues/4401).

Something else which seems worth noting is the movement of labels whose vjust doesn't need altering from vjust = 0.5, such as for slopes 1 and 0.5. I'm also finding that, for example, the slope = 1, 0.5, and 0.25 labels will switch between something like vjust = -0.5 and vjust = 1.5 if plotting is repeated even though set.seed() has been called.
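
For the run-to-run variation, one option that may be worth trying is the seed argument of geom_text_repel(), which sets the random seed inside the layer when it is drawn rather than relying on the global RNG state at the time set.seed() was called. This is only a sketch, assuming the installed ggrepel provides seed. The data frame ends is rebuilt with base R to keep the example self-contained, and the seed value is arbitrary.

```r
library(ggplot2)
library(ggrepel)

# Line endpoints from the reprex (x = 1, y = slope), built with base R.
slopes <- 1 / 2^seq(0, 5)
ends <- data.frame(x = 1, y = slopes, label = slopes)

p_seeded <- ggplot(ends, aes(x = x, y = y, label = label)) +
  geom_text_repel(
    direction = "y",
    hjust = 0,
    seed = 0  # arbitrary fixed seed, applied each time the layer is drawn
  ) +
  coord_cartesian(xlim = c(0, 1.1))
```

Whether this removes the vjust flipping described above has not been checked here; it only addresses the randomness of the layout.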

slowkow commented 3 years ago

OK, I think I get it.

We can disable the physical repulsion simulation with max.iter=0, instructing the function to run 0 iterations of the repulsion simulation. This should show us where the text is placed before anything moves.

q0 <- p + geom_text_repel(
  data = data %>% group_by(slope) %>% slice_max(x, n = 1),
  direction = "y",
  hjust = 0,
  xlim = c(NA, Inf),
  max.iter = 0
) + labs(title = "geom_text_repel() hjust = 0")

The figure below shows what you mentioned in your comment: the initial starting position is shifted a bit to the left for geom_text_repel() relative to geom_text().

p0 + q0

This is caused by the default box.padding = 0.25 option, which is supposed to give a bit of padding to the left, top, bottom, and right sides of each text label's bounding box.

If we set box.padding=0 then we can recover the correct behavior:

p0 + p + geom_text_repel(
  data = data %>% group_by(slope) %>% slice_max(x, n = 1),
  direction = "y",
  hjust = 0,
  xlim = c(NA, Inf),
  max.iter = 0,
  box.padding = 0
) + labs(title = "geom_text_repel() hjust = 0")

So, I think you are correct to point out that the ggrepel code has a bug in the way that it is accounting for the box.padding.

Thank you for reporting this!

I hope you can use nudge_x to work around this issue.
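
A minimal sketch of that workaround, using the same data as the reprex but built with base R. The nudge_x value of 0.02 is an assumption chosen for illustration and would need tuning by eye for a given plot size.

```r
library(ggplot2)
library(ggrepel)

# Same lines as in the reprex: y = slope * x for x in {0, 1}.
slopes <- 1 / 2^seq(0, 5)
lines <- data.frame(
  x = rep(c(0, 1), each = length(slopes)),
  slope = rep(slopes, times = 2)
)
lines$y <- lines$slope * lines$x

# One row per line, at its right-hand end.
ends <- lines[lines$x == 1, ]

p_nudged <- ggplot(lines, aes(x = x, y = y)) +
  geom_line(aes(color = as.factor(slope), group = slope)) +
  geom_text_repel(
    data = ends,
    aes(label = slope),
    direction = "y",
    hjust = 0,
    nudge_x = 0.02,    # assumed value; compensates for the leftward shift
    xlim = c(NA, Inf)
  ) +
  coord_cartesian(xlim = c(0, 1.1)) +
  labs(color = "slope")
```

Printing p_nudged draws the plot. Because nudge_x is in data units, the right amount depends on the x-axis range.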

twest820 commented 3 years ago

Thanks for the hints! There are some interesting possibilities here accessible by controlling how far labels move with box.padding and max.iter and using nudge_x to compensate for the leftward drift of labels (example below).

I think the logical endpoint of this approach would be something like direction = "x+ y+-", so that labels in dense areas can move to the right and develop leader lines where needed, without forcing the use of leader lines in sparse areas. (The current direction semantics are x+- y+- for "both", x+- for "x", and y+- for "y".)

library(cowplot)
library(dplyr)
library(ggplot2)
library(ggrepel)
library(tidyr)

data = crossing(x = c(0, 1), slope = 1/2^seq(0, 6)) %>% mutate(y = slope * x)
plot_grid(ggplot() + geom_line(data = data, aes(x = x, y = y, color = as.factor(slope), group = slope)) +
            geom_text(data = data %>% group_by(slope) %>% slice_max(x, n = 1), aes(x = x, y = y, label = slope), hjust = 0, position = position_nudge(x = 0.01), vjust = 0.5) +
            coord_cartesian(xlim = c(0, 1.42)) + labs(color = "slope", title = "geom_text()") +
            theme(legend.justification = c(0, 1), legend.position = c(0.02, 0.99)),
          ggplot() + geom_line(data = data, aes(x = x, y = y, color = as.factor(slope), group = slope)) +
            geom_text_repel(data = data %>% group_by(slope) %>% slice_max(x, n = 1), aes(x = x, y = y, label = slope), box.padding = 0.1, direction = "y", hjust = 0, max.iter = 5, nudge_x = 0.1, xlim = c(0, 1.5)) +
            coord_cartesian(xlim = c(0, 1.42)) + labs(color = "slope", title = "geom_text_repel()") +
            theme(legend.position = "none"),
          nrow = 1, ncol = 2)

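
The direction semantics suggested earlier in this comment can be sketched as a small function that constrains a proposed label displacement. This is purely illustrative: constrain_move is a hypothetical helper, and "x+ y+-" is the proposed option, not a value that ggrepel's direction argument accepts.

```r
# Hypothetical helper: restrict a proposed label displacement (dx, dy)
# according to a direction specification. "x+ y+-" is the proposal from this
# thread and is NOT an existing ggrepel option.
constrain_move <- function(dx, dy,
                           direction = c("both", "x", "y", "x+ y+-")) {
  direction <- match.arg(direction)
  switch(direction,
    "both"   = c(dx = dx, dy = dy),          # x+- y+-: free movement
    "x"      = c(dx = dx, dy = 0),           # x+-: horizontal only
    "y"      = c(dx = 0, dy = dy),           # y+-: vertical only
    "x+ y+-" = c(dx = max(dx, 0), dy = dy)   # rightward only in x, free in y
  )
}

# A leftward-and-up move under each rule.
constrain_move(-0.3, 0.1, "both")
constrain_move(-0.3, 0.1, "y")
constrain_move(-0.3, 0.1, "x+ y+-")

# A rightward move is kept under the proposed rule.
constrain_move(0.2, 0.1, "x+ y+-")
```

Under the proposed rule a label can never end up to the left of its anchor, but it can still drift right when the repulsion pushes it that way.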