slowkow / ggrepel

:round_pushpin: Repel overlapping text labels away from each other in your ggplot2 figures.
https://ggrepel.slowkow.com
GNU General Public License v3.0
1.22k stars 96 forks source link

Variable pointsize #251

Open teunbrand opened 9 months ago

teunbrand commented 9 months ago

This PR aims o fix #83.

Briefly, it takes Paul's advice here to cast everything to an absolute unit as soon as possible, and then refuse to deal with other types of units ever again (unless absolutely needed). This should make the segments attach nicely to the normally sized circle point (shapes 1, 16, 19, 21, but not 20). This is also stable upon resizing the device window.

Reprex from https://github.com/slowkow/ggrepel/issues/83#issuecomment-536123752:

devtools::load_all("~/packages/ggrepel")
#> ℹ Loading ggrepel
#> Loading required package: ggplot2

set.seed(42)
d <- data.frame(
  x = c(1, 2, 2, 3),
  y = c(1, 2, 3, 1),
  pointsize = c(0, 2, 1, 0),
  label = sprintf("label%s", 1:4)
)

p <- ggplot(d, aes(x, y)) +
  geom_text_repel(
    min.segment.length = 0.1,
    box.padding = 0.2,
    aes(label = label, point.size = pointsize),
    size = 5, max.iter = 1e5, max.time = 1
  ) +
  continuous_scale(
    aesthetics = c("size", "point.size"), 
    palette = scales::area_pal(c(2, 50)), guide = "none"
  )

p + geom_point(aes(size = pointsize), color = "grey50")

Shape = 20 has a smaller diameter, so the segment doesn't attach for that shape.

p + geom_point(aes(size = pointsize), color = "grey50", shape = 20)

Created on 2024-01-17 with reprex v2.1.0

I had to update 1 test that didn't draw any segments by default. Visual tests needed updating due to tiny changes in location of some of the labels, usually something like 0.01 unit in a scale of 100s. I presume these are rounding errors and otherwise harmless.

Please have a go and play around a bit yourself as I hadn't considered every option under the sun for this.

slowkow commented 9 months ago

Some of the examples failed to build after these changes, so I had to do some fiddling with NA and converting to units.

After I fixed that, the forces seem to be a bit off now. Some examples still look good, but others have too little force so the labels are squished together or on top of each other.

Playing with units is harder than I thought. But I'll keep trying and update here when I feel like I have some good results, because I think your solution makes the segments the perfect length.

teunbrand commented 9 months ago

Thanks for the heads up and trying out various options! Could you point me to a specific example that would be broken with this PR? I could try and have a go at preventing any serious damage.

slowkow commented 9 months ago

The variable point size example is perfect! it's incredibly satisfying to see the segment lengths perfectly calculated. 💯

But there are a few examples where my arbitrary choice of force magnitude is now a bit off after switching the unit of measurement.

Example 1

Here is one example that used to look good, but now it is squished:

set.seed(42)
ggplot(mtcars, aes(x = wt, y = 1, label = rownames(mtcars))) +
  geom_point(color = "red") +
  geom_text_repel(
    force_pull   = 0, # do not pull toward data points
    nudge_y      = 0.05,
    direction    = "x",
    angle        = 90,
    hjust        = 0,
    segment.size = 0.2,
    max.iter = 1e4, max.time = 1
  ) +
  xlim(1, 6) +
  ylim(1, 0.8) +
  theme(
    axis.line.y  = element_blank(),
    axis.ticks.y = element_blank(),
    axis.text.y  = element_blank(),
    axis.title.y = element_blank()
  )

Example 2

Here is another one where the result looks worse than it did before, maybe the number of iterations might be too low with the new units:

set.seed(42)
mtcars$label <- rownames(mtcars)
mtcars$label[mtcars$mpg < 25] <- ""
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl), label = label)) +
  coord_polar(theta = "x") +
  geom_point(size = 2) +
  scale_color_discrete(name = "cyl") +
  geom_text_repel(show.legend = FALSE) + # Don't display "a" in the legend.
  theme_bw(base_size = 18)

Possible approaches

I can think of two approaches to try to address this. The first should probably be trivial to implement (but I failed, so maybe not). The second is a lot more work but probably more robust. There are probably more approaches that I have not yet considered.

Approach 1

The first approach is to figure out what scaling factors to change. For example, I tried fiddling with these two numbers:

https://github.com/slowkow/ggrepel/blob/114458562428c1eae7ee4b207e31157cb3bea5da/R/geom-text-repel.R#L438-L439

After some playing, I could not perfectly restore the old behavior in ggrepel 0.9.4. I tried changing 1e-6 to 1e-5 and 1e-4, and I think 1e-4 was pretty good, but still a bit off.

Maybe a better approach is to multiply all coordinates by some scaling factor, and leave these two numbers as-is? I don't know if that will work.

Maybe there are smarter ways of choosing values for force magnitudes? I don't know, but I bet someone has done this in the past.

Approach 2

The second approach is to modify the positioning algorithm so it is more robust. Some time after I figured out how to use a physical repulsion/attraction simulation to position labels, I found a totally different approach in d3 voronoi-labels. I like the Voronoi algorithm, because it is simpler and does not depend on arbitrary numbers for force calculations. I wish I could go back in time and do it that way from the start, but this has been a learning journey for me, so ggrepel has a physical simulation instead.

Some day, I might consider changing to a Voronoi algorithm in a new version of ggrepel. It may be possible to keep the old simulation algorithm around for backwards-compatibility if people like the way it looks. But a newer algorithm using Voronoi calculations will probably be more efficient and robust. Perhaps some physical simulation fine-touches can improve label placement after the initial Voronoi placement, and I bet the arbitrary choices of force magnitudes will be less important when we have nice initial positions.

Also, I realize this PR is supposed to be about segment length, so I understand that the Voronoi idea is a bit much — I just wanted to share the ideas bouncing around in my head. Feel free to ignore! Just a thought.

teunbrand commented 9 months ago

Thanks for the examples Kamil. I agree with you that these look worse than before.

Knowing nothing about the internals of the attraction/repulsion simulation underneath, I think the correct multiplier should be the width or height, as everything is now scales from [0, 1] to [0, width] and [0, height]?

Maybe a better approach is to multiply all coordinates by some scaling factor, and leave these two numbers as-is? I don't know if that will work.

In theory you should be able to divide all x's by width and y's by height and you should have the calculation similar to before when everything was in native units. You'd also have to re-multiply after the repelling algorithm I suppose.

The voronoi algorithm sounds nice, but I think rather than it being the algorithm for label positioning, I think it is a good initialiser (instead of random jitter, I suppose). For example in those big voronoi cells it doesn't make sense to place the label so far away when there is plenty of space near the point itself.

EDIT: Totally read over the remark suggesting something similar 😅

Perhaps some physical simulation fine-touches can improve label placement after the initial Voronoi placement,

slowkow commented 9 months ago

If you want to give the scaling a shot, I'd appreciate it! 😃

Otherwise, I will try to come back to this sometime in the future and see if I can get the scaling close enough so that people don't complain about any changes in performance. I feel like I really messed up by choosing "native" without understanding what it meant at the time. My current understanding is that "native" forces width and height to both be in the range 0-1 instead of allowing width range be different than height range.

Anyway, thank you so much for the discussion and the help! There's always so much to learn.