grantmcdermott / tinyplot

Lightweight extension of the base R graphics system
https://grantmcdermott.com/tinyplot
Apache License 2.0

Continuous / gradient legend #122

Closed grantmcdermott closed 4 months ago

grantmcdermott commented 4 months ago

Closes #84. Closes #124. Closes #130.

Some notes:

Quick examples. [UPDATED based on bug catches and feedback in thread below.]

pkgload::load_all("~/Documents/Projects/tinyplot")
#> ℹ Loading tinyplot

par(pch = 19, las = 1)

# default
plt(lat ~ long | depth, quakes, grid = TRUE)

# legend switch
plt(lat ~ long | depth, quakes, grid = TRUE, legend = "bottom!")

# color interpolation
plt(lat ~ long | depth, data = quakes, col = hcl.colors(20, palette = "rocket"))

# transparency
plt(
  lat ~ long | depth, quakes, grid = TRUE,
  palette = hcl.colors(palette = "rocket", alpha = 0.5)
)

# separate col and bg control
plt(
  lat ~ long | depth, quakes,
  grid = TRUE,
  palette = hcl.colors(palette = "rocket", alpha = 0.7),
  pch = 21, col = "white", bg = "by", cex = 2
)

Created on 2024-02-22 with reprex v2.1.0

vincentarelbundock commented 4 months ago

Looks amazing!

I was only able to try the examples above (crazy week), but it looks excellent on my setup.

Unrelated, but could tpar() pass extra arguments to par() via ... so we can always call the same function and don't have to mix and match?

grantmcdermott commented 4 months ago

Unrelated, but could tpar() pass extra arguments to par() via ... so we can always call the same function and don't have to mix and match?

Ah thanks for the reminder. That's definitely a goal. Can you please file an issue so I remember to implement?

zeileis commented 4 months ago

This looks very cool. I'll play around some more with it on Friday.

grantmcdermott commented 4 months ago

Okay, this is ready to go for full review from my side. I updated the tests and documentation, and also fixed a few corner cases. One more example (gradient for point interiors, but white borders):

pkgload::load_all("~/Documents/Projects/tinyplot")
#> ℹ Loading tinyplot
plt(
  lat ~ long | depth, quakes,
  grid = TRUE,
  palette = hcl.colors(palette = "rocket", alpha = 0.7),
  pch = 21, col = "white", bg = "by"
)

Created on 2024-02-15 with reprex v2.1.0

zeileis commented 4 months ago

Grant @grantmcdermott, thanks again for implementing this really nice feature. The examples you posted all look great. I started playing around with these examples (without looking at the internals, yet). I noticed a bug in handling the bg argument for the fill color of pch in 21:25, and I think the handling of by variables with few unique values is too ad hoc. I'll post these below and continue to play around some more...

zeileis commented 4 months ago

The bug: The bg argument for setting the fill color is only handled correctly if bg = "by", e.g.,

tinyplot(lat ~ long | depth, data = quakes, pch = 21, bg = "by") ## ok

However, when set to a scalar color, it appears to be only used for the first level(s):

tinyplot(lat ~ long | depth, data = quakes, pch = 21, bg = "red") ## only the lowest level(s)

For discrete variables this works:

tinyplot(mpg ~ wt | gear, data = mtcars, pch = 21, bg = "red") ## ok

However, even for discrete variables it is not possible to set a vector of background colors (e.g., semi-transparent versions of the border color).

tinyplot(mpg ~ wt | gear, data = mtcars, pch = 21, col = p, bg = adjustcolor(p, 0.3)) ## Error in !is.null(bg) && bg == "by"

Additional idea: Should we support some sort of shortcut for the latter application? Rather than just having the same color via bg = "by" we could allow bg = 0.3 as a shortcut for adjustcolor(..., 0.3) applied to the by color.
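
For reference, here is a minimal base R sketch of the effect the proposed shortcut would produce, i.e. fills that are semi-transparent versions of the border colors. It uses plot() rather than tinyplot(), and the choice of palette.colors(3) for `p` is an assumption made only for illustration (the thread does not define `p`).

```r
# Assumption: `p` holds one border color per level of gear
# (the thread does not say which colors were used).
p  <- palette.colors(3)               # first three colors of the default palette
bg <- adjustcolor(p, alpha.f = 0.3)   # same colors at 30% opacity

g <- factor(mtcars$gear)              # three levels: 3, 4, 5

# Indexing by the factor picks the border and fill color for each point.
plot(mpg ~ wt, data = mtcars, pch = 21, col = p[g], bg = bg[g])
```

A bg = 0.3 shortcut, as suggested above, would amount to applying the adjustcolor() step internally to whatever the by colors are.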

zeileis commented 4 months ago

Palette/legend handling for numeric by variables:

I think that the current implementation is ad hoc and confusing. For example when you happen to draw different subsets of the same type of data:

set.seed(403)
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, subset = sample(1:32, 20)) ## categorical
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, subset = sample(1:32, 20)) ## continuous

Even more seriously, the palette may yield an error in one but not the other case:

set.seed(403)
v <- hcl.colors(100)
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, subset = sample(1:32, 20), col = v) ## error
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, subset = sample(1:32, 20), col = v) ## ok

And finally, the killer argument in my opinion is that we get a completely different handling when the discrete values are not equidistant. The categorical version just ignores the distances completely:

mtcars$score <- rep(c(0, 1, 99, 100), each = 8) ## discrete score with values 0, 1, 99, and 100
mtcars$score2 <- mtcars$score + sin(1:32)/100 ## same but with some "fuzz"
tinyplot(mpg ~ wt | score, data = mtcars, pch = 19, col = hcl.colors(4)) ## four equidistant categories
tinyplot(mpg ~ wt | score2, data = mtcars, pch = 19) ## continuous with two extreme ends of the scale

Conceptual considerations:

So I thought a bit more about what style of palette and corresponding legend I would expect for y ~ x | z with different types of z when both x and y are numeric.

| Type of z | Palette | Legend | Status |
| --- | --- | --- | --- |
| factor (unordered) | Qualitative | Discrete | :heavy_check_mark: |
| ordered (inheriting from factor) | Sequential | Discrete | :exclamation: |
| numeric (with many levels) | Sequential | Continuous | :heavy_check_mark: |
| numeric (with few levels) | Sequential | :question: | :question: |

So at the very least we should change the default palette for ordered factors.

And then the question remains whether we should have a special handling of numeric variables with few distinct levels. I would argue: No for the reasons listed above. It is simple enough to say y ~ x | ordered(z).

If you disagree with me here, then at least this case should be handled like an ordered factor (i.e., with a sequential palette) and not like an unordered factor. Also I would prefer to employ a palette that preserves distances between the values. This would also imply that for numeric z it is always ok to supply a col of length greater than the number of unique levels.
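
To make "a palette that preserves distances between the values" concrete, here is a small sketch. The helper name col_by_value() is hypothetical and is not part of tinyplot; it only illustrates mapping each value to a color by its position on the numeric scale, so that 0 and 1 get nearly identical colors while 1 and 99 end up far apart.

```r
# Hypothetical helper (not tinyplot API): pick a color for each value
# according to its relative position between min(z) and max(z).
col_by_value <- function(z, pal = hcl.colors(100, "Viridis")) {
  rng <- range(z, na.rm = TRUE)
  if (diff(rng) == 0) return(rep(pal[1], length(z)))   # constant z: one color
  pos <- (z - rng[1]) / diff(rng)                      # rescale to [0, 1]
  pal[1 + round(pos * (length(pal) - 1))]              # index into the palette
}

score <- c(0, 1, 99, 100)
cols  <- col_by_value(score)

# Draw the four swatches at their actual positions on the scale.
plot(score, rep(1, 4), pch = 19, cex = 3, col = cols, yaxt = "n", ylab = "")
```

With this mapping the palette may have any length, which is why a col vector longer than the number of unique values is unproblematic for numeric z.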

If all of this is incorporated then I think I could live with the discrete palette but would still think it's confusing. Also, there is no simple/intuitive argument to change the behavior because an extra call to tpar() is needed.

grantmcdermott commented 4 months ago

Wow, this is great feedback. Thanks @zeileis! I think that I agree with all of your major points. I won't be able to action anything immediately, since I'm heading out for a weekend at the coast. But I'll mull over some of your high-level design ideas while I'm looking at the waves rolling in :-)

zeileis commented 4 months ago

🌊 🌊 🌊 Enjoy the waves and the weekend! There is still enough time to mull over design decisions afterwards...

P.S.: I have also thought about a default palette for this but haven't been able to write it up, yet.

zeileis commented 4 months ago

Default color palette for a continuous numeric by variable in a scatter plot:

As I had mentioned elsewhere before, this is not the same as choosing a sequential palette for a display based on shading areas (like a heatmap). In area-based displays you often want the "low" end of the palette to fade into the background. But here it is important that all points are clearly visible and that none of them fade into the background. Thus, we want:

Four suggestions:

I screened all of the sequential color palettes we have easily available in hcl.colors() and picked out the following four which satisfy the criteria above very well: Viridis, Mako (both from the Viridis family), ag_GrnYl, ag_Sunset (both from the CARTO family). There are also other palettes with similar properties but I personally liked these four best. All of them also perform reasonably well under color vision deficiencies.

For some of these it is necessary to restrict the range though, e.g., exclude some of the light (low-chroma high-luminance) and/or dark (low-chroma low-luminance) colors. Hence I have written the following convenience function that optionally clips a certain percentage of colors and then extends them to 100 again. This is currently necessary because tinyplot insists on the 100. (As already commented above, I think that we should change this.) I reverse all the palettes so that the lighter colors correspond to lower numeric values.

hcl100 <- function(palette = "Viridis", from = 0, to = 1) {
  rev(colorRampPalette(hcl.colors(100, palette)[(100 * from + 1):(100 * to)])(100))
}

And then I defined the following four palettes:

pv <- hcl100("Viridis", 0.1, 0.9)
pm <- hcl100("Mako", 0.2, 0.9)
pg <- hcl100("ag_GrnYl", 0, 0.9)
ps <- hcl100("ag_Sunset", 0, 0.9)

Illustrations:

To see how well the palettes work in practice I used your example based on the quakes data and another example with fewer points using the iris data. For quakes I used the following code with different p's.

tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = p)

tinyplot of quakes data with four different sequential palettes

tinyplot(Sepal.Length ~ Sepal.Width | Petal.Width, data = iris, pch = 19, col = p)

tinyplot of iris data with four different sequential palettes

Perceptual properties:

To better understand how the four palettes differ regarding their perceptual properties, I provide the corresponding ranges in hue, chroma, and luminance in the table below.

| Palette | Hue range | Chroma range | Luminance range |
| --- | --- | --- | --- |
| Viridis | 98-277 | 44-89 | 23-84 |
| Mako | 150-286 | 25-55 | 21-86 |
| ag_GrnYl | 101-225 | 27-79 | 34-85 |
| ag_Sunset | 275-414 | 69-106 | 25-79 |
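
For anyone who wants to inspect such ranges themselves, the following is one possible sketch using only base R. It is not necessarily how the table above was computed: the helper hcl_ranges() is hypothetical, it converts via grDevices::convertColor() to CIE Luv and then to polar coordinates, and it reports hue modulo 360, so values will not match the table exactly (the table lists hues above 360 for ag_Sunset and refers to the clipped palettes).

```r
# Hypothetical helper: approximate hue/chroma/luminance ranges of a palette
# by converting sRGB colors to CIE Luv and then to polar coordinates.
hcl_ranges <- function(cols) {
  rgb01 <- t(col2rgb(cols)) / 255                       # one row per color, in [0, 1]
  luv   <- convertColor(rgb01, from = "sRGB", to = "Luv")
  L <- luv[, 1]; u <- luv[, 2]; v <- luv[, 3]
  list(
    hue       = range((atan2(v, u) * 180 / pi) %% 360), # angle in degrees, [0, 360)
    chroma    = range(sqrt(u^2 + v^2)),                 # distance from the gray axis
    luminance = range(L)
  )
}

r <- hcl_ranges(hcl.colors(100, "Viridis"))
str(r)
```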

My recommendation:

I think all four palettes would be feasible as default palettes for this purpose.

I would probably go for either Viridis, because many readers are already familiar with it, or possibly ag_Sunset, because its colors are dark and colorful (low-luminance, high-chroma), which makes them particularly suitable for points and lines.

vincentarelbundock commented 4 months ago

Thanks for taking the time to write this @zeileis . I know nothing about colors, so I'm learning a lot.

My opinion should not count for much here. I like all four options. Based on pure vibes and personal preferences, Sunset seems prettiest.

But again, all those options look great.

zeileis commented 4 months ago

Fair enough, thanks for the feedback, Vincent! 🌇

grantmcdermott commented 4 months ago

Okay, some updates:

Still to do / decide:

grantmcdermott commented 4 months ago

P.S. A minor clarification: We don't require that gradient palettes take 100 levels. It's just that we use this scale internally to make color matching easier with the plotted points, etc. Among other things, this avoids having to allocate large vectors and, more importantly, makes it much easier to ensure that the "pretty" labels of the gradient legend line up with the actual colors that they represent on the plot. This internal adjustment seems like a requirement, however, because of another constraint, which is that all user-supplied palette generating functions must take "n" as a leading argument. (See back here.) So your hcl adjustment function would work fine if you added this leading argument:

hcl_adj = function(n, palette = "Viridis", from = 0, to = 1) {
    colorRampPalette(hcl.colors(n, palette)[(n * from + 1):(n * to)])(n)
}
tinyplot(lat ~ long | depth, data = quakes, pch = 19, palette = hcl_adj(palette = "Mako", from = 0.1, to = 0.9) )

vincentarelbundock commented 4 months ago

Mako is also beautiful.

@ASKurz just ran a twitter poll with 200+ votes and 75% of people thought that dark colors should represent the high end of the scale. FWIW, my intuition agrees with the majority. (Think of all the US population density maps with dark spots representing cities.)

https://twitter.com/SolomonKurz/status/1749559759215145471

zeileis commented 4 months ago

If you use the palette = argument, you can use a function, yes. But if you use the col = argument which (a) is the standard argument everyone knows from plot() and (b) avoids the non-standard non-standard evaluation, then you need 100 colors:

tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(20))
Error: `col` must be of length 1 or 100.

I think this is confusing (unnecessarily so). For categorical palettes, the requirement to exactly match the length makes sense. But for continuous palettes, I think we should allow any length and then use colorRampPalette() to suitably interpolate it.
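
A minimal sketch of that interpolation step, assuming (as discussed earlier in the thread) that the internal gradient scale has 100 steps. This only illustrates the idea in base R and is not tinyplot's actual code.

```r
# A user supplies 20 colors; stretch them to the internal length of 100.
user_cols <- hcl.colors(20)
cols100   <- colorRampPalette(user_cols)(100)

length(cols100)

# The endpoints of the user's palette are preserved by the interpolation.
c(first = cols100[1], last = cols100[100])
```

Because colorRampPalette() interpolates between neighboring colors, the same call works for a col vector of any length of at least two.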

zeileis commented 4 months ago

Feedback regarding the open to-do points:

ASKurz commented 4 months ago

Although I personally was very disappointed by the results of that poll, the people have spoken. Dark colors should represent the high end of the scale. sigh

grantmcdermott commented 4 months ago

@zeileis @vincentarelbundock Okay... I believe that all of the outstanding issues should now be addressed. I ended up going with the adjusted viridis palette as the default after all, and any other changes should be in line with your suggestions. (See the updated examples right at the top of the thread for some illustrations.)

Please kick the tires once more to check that you're happy. Assuming that everything looks good to you, please feel free to squash and merge.

zeileis commented 4 months ago

Thanks, Grant, for the thorough update! I played around with it and noticed that my recommendation of reversing the scale has led to some inconsistencies. Sorry about that!

First, I noticed that the case of 100 colors is handled differently from other settings:

tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(99, "ag_Sunset")) ## low = light
tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(100, "ag_Sunset")) ## low = dark
tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(101, "ag_Sunset")) ## low = light

I guess that this might be due to reversing the order in two different places in the code?
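
One way to rule out this kind of inconsistency is to funnel every user-supplied vector through a single normalization step that interpolates when needed and reverses exactly once. The helper below, normalize_gradient(), is hypothetical and only sketches the idea; it is not taken from the tinyplot source.

```r
# Hypothetical helper: bring any color vector to n colors, then reverse once.
normalize_gradient <- function(cols, n = 100, reverse = TRUE) {
  out <- if (length(cols) == n) cols else colorRampPalette(cols)(n)
  if (reverse) rev(out) else out
}

# Lengths 99, 100 and 101 now all behave the same way: the first color of
# the result is the last color of the input.
for (k in c(99, 100, 101)) {
  input  <- hcl.colors(k, "ag_Sunset")
  output <- normalize_gradient(input)
  print(toupper(output[1]) == toupper(input[k]))
}
```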

Also, the default ordering is different for ordered factors vs. numeric variables, e.g.,

tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, cex = 1.5) ## low = light, bottom to top
tinyplot(mpg ~ wt | ordered(carb), data = mtcars, pch = 19, cex = 1.5) ## low = dark, top to bottom

Maybe we also want to use the same restricted viridis palette for the ordered factors?

P.S.: Given that you were already kind enough to list me with an "aut" role for the package, you never need to thank me in the NEWS. :-)

grantmcdermott commented 4 months ago

Also, the default ordering is different for ordered factors vs. numeric variables, e.g.,

tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, cex = 1.5) ## low = light, bottom to top
tinyplot(mpg ~ wt | ordered(carb), data = mtcars, pch = 19, cex = 1.5) ## low = dark, top to bottom

Maybe we also want to use the same restricted viridis palette for the ordered factors?

Quick clarification/confirmation on this: We can certainly match the restricted colors for ordered factors and ensure that low values = dark. But do we want the legend to be reversed and run from bottom to top too?

I understand that it will be better for internal consistency, but we are deviating from established norms in other packages. Both ggplot2 and lattice run ordered factors from top to bottom, e.g. lattice::xyplot(mpg ~ wt, group = ordered(carb), data = mtcars, auto.key = TRUE)

zeileis commented 4 months ago

Good point. So we can either be consistent within tinyplot for numeric and ordered - or we can be consistent across packages with ggplot2 and lattice. Then let's go with the consistency with ggplot2 and lattice - and let's see how users like it. Given that ordered factors are typically under-used anyway, there are probably not many users affected by this.

grantmcdermott commented 4 months ago

@zeileis Thanks for confirming (and for catching these cases). Both should be fixed now:

pkgload::load_all("~/Documents/Projects/tinyplot")
#> ℹ Loading tinyplot

tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(100, "ag_Sunset"))

tinyplot(mpg ~ wt | ordered(carb), data = mtcars, pch = 19, cex = 1.5)

Created on 2024-02-27 with reprex v2.1.0

grantmcdermott commented 4 months ago

Is there anything we still need to do/check before merging?

zeileis commented 4 months ago

Thanks for the reminder, Grant. I'm just traveling home from a conference and didn't have time, yet, to play with the code. I just noticed one last inconsistency that I wanted to mention. But as I explain below, I think that this is the best solution we can do. So I wouldn't change anything.

Compare:

tinyplot(mpg ~ wt | carb,          data = mtcars, pch = 19, cex = 1.5, col = hcl.colors(6, "ag_Sunset"))
tinyplot(mpg ~ wt | ordered(carb), data = mtcars, pch = 19, cex = 1.5, col = hcl.colors(6, "ag_Sunset"))
tinyplot(mpg ~ wt | factor(carb),  data = mtcars, pch = 19, cex = 1.5, col = hcl.colors(6, "ag_Sunset"))

In the numeric case we reverse the order to obtain "dark = high". But in the ordered and factor case we don't reverse the order so that dark = low. While this is somewhat inconsistent, I think this is the best we can do. I just wanted to point out why - so that you can check whether you agree with these considerations or whether you would prefer a different solution.

grantmcdermott commented 4 months ago

So then we have to decide whether the ordered case should reorder (like numeric) or not reorder (like factor). I think the latter is probably less confusing.

Hmmm. Yes, I think you're right that this is the "least bad" tradeoff that we can make here. And users can always use rev = TRUE if they want to switch the ordering. Let's leave it as-is for now and we can potentially adjust if we get strong feedback about it.
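
As a small illustration of switching the ordering, and assuming the rev = TRUE mentioned here is the rev argument of hcl.colors() (the comment does not say which function is meant), reversing at palette creation is equivalent to reversing the vector afterwards:

```r
# Assumption: `rev = TRUE` refers to hcl.colors(); the thread is not explicit.
pal     <- hcl.colors(6, "ag_Sunset")
pal_rev <- hcl.colors(6, "ag_Sunset", rev = TRUE)

identical(pal_rev, rev(pal))
```

Either form can then be passed via col = to flip which end of the scale is dark.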

Thanks for the reminder, Grant. I'm just traveling home from a conference and didn't have time, yet, to play with the code.

Sorry, I don't mean to be a rash, I was mostly checking in, since I realised that my last message was probably a bit ambiguous. I just pushed another small commit now, but that should be it from me unless you pick up any more issues in testing. Catching these edge cases is important, so take your time... although it would be great if we could merge this PR fairly soon, since that will clear the way for the last few things before CRAN submission ;-) I'm hoping to submit before I head out for an extended vacation around spring break.

Let me know!

zeileis commented 4 months ago

1. I agree, good plan.
2. No worries!
3. I think we can squash and merge now. Should I press the button?

grantmcdermott commented 4 months ago

1. If you're happy then I'll go ahead and do it. Thanks again for all the super helpful comments on this one!

zeileis commented 4 months ago

Thank you for doing all the actual hard work!!