Closed mschubert closed 8 months ago
Do you mean to manually prioritize some labels over others when they cannot all be plotted without overlaps?
Yes, that's another way of putting it
Could I please ask if you might consider sharing a minimal code example to demonstrate the issue?
Also, you might like to explore the functions in the ggpp package. Maybe they are helpful in your situation?
Sure! So let's say I have a point cloud (grey), for which I want to label the outliers and specific points (black):
library(ggplot2)
syms = c(letters, LETTERS, 0:9)
labs = do.call(paste0, expand.grid(syms, syms))
dset = data.frame(x=rnorm(1e4), y=rnorm(1e4), label=sample(labs, 1e4, replace=TRUE))
ggplot(dset, aes(x=x, y=y)) +
  geom_point(aes(color=label %in% c("aA", "bB", "cC"))) +
  scale_color_manual(values=c("TRUE"="black", "FALSE"="grey")) +
  ggrepel::geom_text_repel(aes(label=label), max.overlaps=3)
I would like to label the outliers as they are now and, in addition, label the black points, and do this in one call to geom_text_repel, because otherwise some labels may be nudged into the others.
Thanks for pointing me to ggpp. I see that stat_dens2d_filter does something similar, and perhaps it would be better suited there (but I don't think it's currently able to achieve this).
OK, I think I might understand your request.
Would you consider a workaround with 2 calls to geom_text_repel() (one for the grey dots and one for the black dots)?
Could I ask if you have a suggestion for how to implement your desired feature?
geom_text_repel(aes(label = label), max.overlaps = ???)
For example, would you want the value passed to max.overlaps to be a vector of numbers (one for each label) instead of a single number?
I think implementing a way of doing this in stat_dens2d_filter() would be easy, but an interface that is consistent with the grammar of graphics needs some thought. A formal parameter protect taking a vector that can be used as indexes (or subscripts), either logical or integer, could work, but would be atypical for the grammar of graphics. So a function that does a test based on the label text, passed as an argument to protect, would be best, I think. Any suggestions or thoughts?
I think the best approach would be to accept either a function or a character vector of labels. So, if we want to protect only a few labels, we would pass a vector of label texts; in other cases, a user-defined function using grepl() or grep() could be passed. However, in your example you seem to be willing for labels to overlap dots, and this would certainly be the case for the protected labels. You would in any case need multiple layers in the figure: one for all the points, one for the labelled points if you want to highlight them, and one for the labels. (In your example some of the black dots are occluded by the grey ones and not visible.)
I will try to implement something like this in the next version of 'ggpp', as it seems generally useful.
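To make the proposal concrete, here is a minimal base-R sketch of how a selector argument of this kind could be turned into a logical vector. The helper name resolve_protect() is hypothetical and is not part of 'ggpp' or 'ggrepel'; it only illustrates accepting a character vector of labels, a function (grepl()- or grep()-style), or logical or positive integer indexes.

```r
# Hypothetical helper for illustration only; not the 'ggpp' implementation.
# Returns a logical vector marking which labels are protected.
resolve_protect <- function(labels, protect) {
  if (is.function(protect)) {
    keep <- protect(labels)
    # grep()-style functions return integer positions, grepl()-style return logicals
    if (is.numeric(keep)) keep <- seq_along(labels) %in% keep
    as.logical(keep)
  } else if (is.character(protect)) {
    # match on the label text itself
    labels %in% protect
  } else if (is.logical(protect)) {
    # recycle to the number of labels
    rep_len(protect, length(labels))
  } else if (is.numeric(protect)) {
    # positive positions only in this sketch
    seq_along(labels) %in% protect
  } else {
    stop("'protect' must be a function, character, logical or numeric vector")
  }
}

labels <- c("aA", "bB", "xx", "cC")
by_text <- resolve_protect(labels, c("aA", "cC"))
by_grepl <- resolve_protect(labels, function(x) grepl("^[ab]", x))
by_grep <- resolve_protect(labels, function(x) grep("^[ab]", x))
by_index <- resolve_protect(labels, c(1L, 4L))
```

A function-based selector is the most flexible of these because it depends only on the label text, not on the row order of the data.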
Thanks a lot for your answers!
Would you consider a workaround with 2 calls to geom_text_repel() (one for the grey dots and one for the black dots)?
That's what I did in the past, but sometimes the new labels will be pushed over the old labels. So this is unfortunately not a good solution.
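For reference, the two-layer workaround under discussion can be sketched roughly as follows. This is an illustration rather than a recommended solution: the column name special is introduced here only for the example, and because each geom_text_repel() layer repels its labels independently, labels from the second layer can still land on top of labels from the first.

```r
library(ggplot2)
library(ggrepel)

set.seed(1)  # only to make the sketch reproducible
syms <- c(letters, LETTERS, 0:9)
labs <- do.call(paste0, expand.grid(syms, syms))
dset <- data.frame(x = rnorm(1e4), y = rnorm(1e4),
                   label = sample(labs, 1e4, replace = TRUE))
# 'special' is a helper column added for this example
dset$special <- dset$label %in% c("aA", "bB", "cC")

p <- ggplot(dset, aes(x = x, y = y)) +
  geom_point(aes(color = special)) +
  scale_color_manual(values = c("TRUE" = "black", "FALSE" = "grey")) +
  # Layer 1: ordinary points; labels in dense regions are dropped
  geom_text_repel(data = subset(dset, !special), aes(label = label),
                  max.overlaps = 3) +
  # Layer 2: points of interest; max.overlaps = Inf keeps all of them
  geom_text_repel(data = subset(dset, special), aes(label = label),
                  max.overlaps = Inf)
```

Printing p draws the plot; the overlap between the two label layers is exactly the limitation described above.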
Could I ask if you have a suggestion for how to implement your desired feature?
I could see (1) the vector of max.overlaps that you are suggesting, or (2) an additional argument for which points to ignore the overlaps (e.g. ignore.overlaps, which may be a logical vector or a function).
I think implementing a way of doing this in stat_dens2d_filter() would be easy
I'm starting to lean towards addressing this in stat_dens2d_filter() because the function is already applying a geom to a subset of the data, which is the same class of problem that my use case is about.
@mschubert @slowkow With the upcoming 'ggpp' 0.5.1 or the current GitHub version of 'ggpp', the plot could be created as shown below. In this case the second example is the simplest, but in a function one can use grep() or grepl(). Not exemplified is the use of numeric or logical vectors as arguments to keep.these.
library(ggplot2)
library(ggpp)
#>
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#>
#> annotate
library(ggrepel)
syms = c(letters, LETTERS, 0:9)
labs = do.call(paste0, expand.grid(syms, syms))
dset = data.frame(x=rnorm(1e4), y=rnorm(1e4), label=sample(labs, 1e4, replace=TRUE))
ggplot(dset, aes(x=x, y=y, label = label)) +
  geom_point(colour = "grey") +
  stat_dens2d_filter(geom = "text_repel",
                     position = position_nudge_centre(x = 0.1, y = 0.1, direction = "radial"),
                     keep.number = 50,
                     keep.these = function(x) {x %in% c("aA", "bB", "cC")},
                     min.segment.length = 0) +
  theme_bw()
ggplot(dset, aes(x=x, y=y, label = label)) +
  geom_point(colour = "grey") +
  stat_dens2d_filter(geom = "text_repel",
                     position = position_nudge_centre(x = 0.1, y = 0.1, direction = "radial"),
                     keep.number = 50,
                     keep.these = c("aA", "bB", "cC"),
                     min.segment.length = 0) +
  theme_bw()
Created on 2023-01-20 with reprex v2.0.2
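The reply above mentions that logical vectors can also be passed to keep.these but does not show it. A possible sketch follows, assuming 'ggpp' 0.5.1 or later and a plot with a single panel and group, so that positions in the vector line up with rows of dset. The variable name protect is introduced only for this example, and the nudge position from the examples above is left out to keep it short.

```r
library(ggplot2)
library(ggpp)
library(ggrepel)

set.seed(1)  # only to make the sketch reproducible
syms <- c(letters, LETTERS, 0:9)
labs <- do.call(paste0, expand.grid(syms, syms))
dset <- data.frame(x = rnorm(1e4), y = rnorm(1e4),
                   label = sample(labs, 1e4, replace = TRUE))

# Logical selector with one element per row of dset (name is illustrative)
protect <- dset$label %in% c("aA", "bB", "cC")

p <- ggplot(dset, aes(x = x, y = y, label = label)) +
  geom_point(colour = "grey") +
  stat_dens2d_filter(geom = "text_repel",
                     keep.number = 50,
                     keep.these = protect,  # assumed aligned with rows of dset
                     min.segment.length = 0) +
  theme_bw()
```

With faceting or grouping the stat sees the data in pieces, so a function or a character vector of labels is likely the safer choice there.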
@aphalo Amazing!
@slowkow Thanks! The code in 'ggpp' is dead simple compared to the repulsion code in 'ggrepel' but it does seem to help quite a lot in some cases.
It happens to me fairly often that I have a cloud of points, for which I want to label
It would be nice if ggrepel could support drawing labels that lie within a region that surpasses max.overlaps but are of particular interest. A way to implement this could be to pass an additional argument that specifies which points to draw, irrespective of whether they reside in a dense region.