Continuous / gradient legend

grantmcdermott commented 4 months ago

Closes #84. Closes #124. Closes #130.

Some notes:

On the actual implementation side, I ended up going with a bespoke raster-based legend rather than trying to do some y-intersp based trickery. The latter ended up being more trouble than it was worth and this way we also get things like alpha transparency and "top!" and "bottom!" legend placements.
~I added a threshold for unique number of groups (default = 5) before the continuous legend kicks in. This can be over-ridden by the user (tpar("legend.ugc")). But it's my way to avoid something I personally find annoying about ggplot2's default behaviour, which automatically converts any numeric grouping variable into a gradient swatch, even if there are only (say) two categories.~
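For readers unfamiliar with the raster approach, here is a minimal base-graphics sketch of the idea. It is not tinyplot's internal code: `gradient_raster()` is a hypothetical helper, and the swatch coordinates and labels are arbitrary choices for illustration.

```r
# Hypothetical helper (not part of tinyplot): turn a colour vector into a
# one-column raster, reversed so that the last colour ends up at the top.
gradient_raster <- function(cols) {
  as.raster(matrix(rev(cols), ncol = 1))
}

# Alpha transparency carries through because the raster stores colour strings
cols <- hcl.colors(100, "Viridis", alpha = 0.5)
r <- gradient_raster(cols)

# Draw the swatch on an empty canvas with arbitrary coordinates
plot.new()
plot.window(xlim = c(0, 1), ylim = c(0, 1))
rasterImage(r, xleft = 0.45, ybottom = 0.1, xright = 0.55, ytop = 0.9)
text(0.6, c(0.1, 0.9), labels = c("low", "high"), adj = 0)
```

Because the swatch is just an image placed at chosen coordinates, it can in principle be positioned anywhere around the plot region, which is presumably what makes placements such as "top!" and "bottom!" straightforward.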

Quick examples. [UPDATED based on bug catches and feedback in thread below.]

pkgload::load_all("~/Documents/Projects/tinyplot")
#> ℹ Loading tinyplot

par(pch = 19, las = 1)

# default
plt(lat ~ long | depth, quakes, grid = TRUE)

# legend switch
plt(lat ~ long | depth, quakes, grid = TRUE, legend = "bottom!")

# color interpolation
plt(lat ~ long | depth, data = quakes, col = hcl.colors(20, palette = "rocket"))

# transparency
plt(
  lat ~ long | depth, quakes, grid = TRUE,
  palette = hcl.colors(palette = "rocket", alpha = 0.5)
)

# separate col and bg control
plt(
  lat ~ long | depth, quakes,
  grid = TRUE,
  palette = hcl.colors(palette = "rocket", alpha = 0.7),
  pch = 21, col = "white", bg = "by", cex = 2
)

Created on 2024-02-22 with reprex v2.1.0

vincentarelbundock commented 4 months ago

Looks amazing!

I was only able to try the examples above (crazy week), but it looks excellent on my setup.

Unrelated, but could tpar() pass extra arguments to par() via ... so we can always call the same function regardless and don't have to mix and match?

grantmcdermott commented 4 months ago

> Unrelated, but could tpar() pass extra arguments to par() via ... so we can always call the same function regardless and don't have to mix and match?

Ah thanks for the reminder. That's definitely a goal. Can you please file an issue so I remember to implement?

zeileis commented 4 months ago

This looks very cool. I'll play around some more with it on Friday.

grantmcdermott commented 4 months ago

Okay, this is ready to go for full review from my side. I updated the tests and documentation, and also fixed a few corner cases. One more example (gradient for point interiors, but white borders):

pkgload::load_all("~/Documents/Projects/tinyplot")
#> ℹ Loading tinyplot
plt(
  lat ~ long | depth, quakes,
  grid = TRUE,
  palette = hcl.colors(palette = "rocket", alpha = 0.7),
  pch = 21, col = "white", bg = "by"
)

Created on 2024-02-15 with reprex v2.1.0

zeileis commented 4 months ago

Grant @grantmcdermott, thanks again for implementing this really nice feature. The examples you posted all look great. I started playing around with these examples (without looking at the internals, yet). I noticed a bug in handling the bg argument for the fill color of pch in 21:25 and I think the handling of by variables with few unique values is too ad hoc. I'll post these below and continue to play around some more...

zeileis commented 4 months ago

The bug: The bg argument for setting the fill color is only handled correctly if bg = "by", e.g.,

tinyplot(lat ~ long | depth, data = quakes, pch = 21, bg = "by") ## ok

However, when set to a scalar color, it appears to be only used for the first level(s):

tinyplot(lat ~ long | depth, data = quakes, pch = 21, bg = "red") ## only the lowest level(s)

For discrete variables this works:

tinyplot(mpg ~ wt | gear, data = mtcars, pch = 21, bg = "red") ## ok

However, even for discrete variables it is not possible to set a vector of background colors (e.g., semi-transparent versions of the border color):

tinyplot(mpg ~ wt | gear, data = mtcars, pch = 21, col = p, bg = adjustcolor(p, 0.3)) ## Error in !is.null(bg) && bg == "by"

Additional idea: Should we support some sort of shortcut for the latter application? Rather than just having the same color via bg = "by" we could allow bg = 0.3 as a shortcut for adjustcolor(..., 0.3) applied to the by color.
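To make the proposed shortcut concrete, the sketch below shows in base graphics what a hypothetical `bg = 0.3` would expand to. The border colors `p` are an assumption (the snippet above does not define them), and plain `plot()` stands in for tinyplot so that the example runs without the development branch.

```r
# Assumed border colours, one per level of `gear` (3, 4, 5)
p <- palette.colors(3, palette = "R4")

# What the hypothetical `bg = 0.3` shortcut would stand for:
# the same colours with their alpha channel scaled to 30%
bg <- adjustcolor(p, alpha.f = 0.3)

# Base-graphics equivalent: index both vectors by group
g <- factor(mtcars$gear)
plot(mpg ~ wt, data = mtcars, pch = 21, cex = 1.5, col = p[g], bg = bg[g])
```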

zeileis commented 4 months ago

Palette/legend handling for numeric by variables:

I think that the current implementation is ad hoc and confusing. For example when you happen to draw different subsets of the same type of data:

set.seed(403)
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, subset = sample(1:32, 20)) ## categorical
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, subset = sample(1:32, 20)) ## continuous

Even more seriously, the palette may yield an error in one but not the other case:

set.seed(403)
v <- hcl.colors(100)
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, subset = sample(1:32, 20), col = v) ## error
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, subset = sample(1:32, 20), col = v) ## ok

And finally, the killer argument in my opinion is that we get a completely different handling when the discrete values are not equidistant. The categorical version just ignores the distances completely:

mtcars$score <- rep(c(0, 1, 99, 100), each = 8) ## discrete score with values 0, 1, 99, and 100
mtcars$score2 <- mtcars$score + sin(1:32)/100 ## same but with some "fuzz"
tinyplot(mpg ~ wt | score, data = mtcars, pch = 19, col = hcl.colors(4)) ## four equidistant categories
tinyplot(mpg ~ wt | score2, data = mtcars, pch = 19) ## continuous with two extreme ends of the scale
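The difference between the two treatments can be seen without plotting anything. The sketch below is only an illustration of the two mapping strategies, not tinyplot's actual code: it computes which position of a 100-color palette each of the four score values would receive under each strategy.

```r
score <- c(0, 1, 99, 100)

# Categorical treatment: levels are spread evenly, distances are ignored
idx_cat <- round(seq(1, 100, length.out = length(score)))

# Continuous treatment: position is proportional to the value itself
idx_num <- 1 + round((score - min(score)) / diff(range(score)) * 99)

rbind(categorical = idx_cat, continuous = idx_num)
```

Under the categorical mapping the four values land at positions 1, 34, 67 and 100, whereas the distance-preserving mapping puts them at 1, 2, 99 and 100, i.e. two pairs at opposite ends of the scale.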

Conceptual considerations:

So I thought a bit more about what style of palette and corresponding legend I would expect for y ~ x | z with different types of z when both x and y are numeric.

| Type of `z` | Palette | Legend | Status |
|---|---|---|---|
| `factor` (unordered) | Qualitative | Discrete | :heavy_check_mark: |
| `ordered` (inheriting from `factor`) | Sequential | Discrete | :exclamation: |
| `numeric` (with many levels) | Sequential | Continuous | :heavy_check_mark: |
| `numeric` (with few levels) | Sequential | :question: | :question: |

So at the very least we should change the default palette for ordered factors.

And then the question remains whether we should have a special handling of numeric variables with few distinct levels. I would argue: No for the reasons listed above. It is simple enough to say y ~ x | ordered(z).

If you disagree with me here, then at least this case should be handled like an ordered factor (i.e., with a sequential palette) and not like an unordered factor. Also I would prefer to employ a palette that preserves distances between the values. This would also imply that for numeric z it is always ok to supply a col of length greater than the number of unique levels.

If all of this is incorporated then I think I could live with the discrete palette but would still think it's confusing. Also, there is no simple/intuitive argument to change the behavior because an extra call to tpar() is needed.

grantmcdermott commented 4 months ago

Wow, this is great feedback. Thanks @zeileis! I think that I agree with all of your major points. I won't be able to action anything immediately, since I'm heading out for a weekend at the coast. But I'll mull on some of your high level design decision ideas while I'm looking at the waves rolling in :-)

zeileis commented 4 months ago

🌊 🌊 🌊 Enjoy the waves and the weekend! There is still enough time to mull over design decisions afterwards...

P.S.: I have also thought about a default palette for this but haven't been able to write it up, yet.

zeileis commented 4 months ago

Default color palette for a continuous numeric by variable in a scatter plot:

As I had mentioned elsewhere before, this is not the same as choosing a sequential palette for a display based on shading areas (like a heatmap). In area-based displays you often want the "low" end of the palette to fade into the background. But here it is important that all points are clearly visible and that none of them fade into the background. Thus, we want:

- The palette must have a certain luminance gradient (from dark to light) so that it is clearly perceived as sequential.
- But colors must not become too light (i.e., luminance must not be too high).
- And even the light colors must retain a certain colorfulness (i.e., a certain chroma).
- To make up for the reduced ranges in luminance and chroma, we want to have a sufficient hue range.

Four suggestions:

I screened all of the sequential color palettes we have easily available in hcl.colors() and picked out the following four which satisfy the criteria above very well: Viridis, Mako (both from the Viridis family), ag_GrnYl, ag_Sunset (both from the CARTO family). There are also other palettes with similar properties but I personally liked these four best. All of them also perform reasonably well under color vision deficiencies.

For some of these it is necessary to restrict the range though, e.g., exclude some of the light (low-chroma high-luminance) and/or dark (low-chroma low-luminance) colors. Hence I have written the following convenience function that optionally clips a certain percentage of colors and then extends them to 100 again. This is currently necessary because tinyplot insists on the 100. (As already commented above, I think that we should change this.) I reverse all the palettes so that the lighter colors correspond to lower numeric values.

hcl100 <- function(palette = "Viridis", from = 0, to = 1) {
  rev(colorRampPalette(hcl.colors(100, palette)[(100 * from + 1):(100 * to)])(100))
}

And then I defined the following four palettes:

pv <- hcl100("Viridis", 0.1, 0.9)
pm <- hcl100("Mako", 0.2, 0.9)
pg <- hcl100("ag_GrnYl", 0, 0.9)
ps <- hcl100("ag_Sunset", 0, 0.9)

Illustrations:

To see how well the palettes work in practice I used your example based on the quakes data and another example with fewer points using the iris data. For quakes I used the following code with different p's.

tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = p)

tinyplot of quakes data with four different sequential palettes

tinyplot(Sepal.Length ~ Sepal.Width | Petal.Width, data = iris, pch = 19, col = p)

tinyplot of iris data with four different sequential palettes

Perceptual properties:

To better understand how the four palettes differ regarding their perceptual properties, I provide the corresponding ranges in hue, chroma, and luminance in the table below.

| Palette | Hue range | Chroma range | Luminance range |
|---|---|---|---|
| `Viridis` | 98-277 | 44-89 | 23-84 |
| `Mako` | 150-286 | 25-55 | 21-86 |
| `ag_GrnYl` | 101-225 | 27-79 | 34-85 |
| `ag_Sunset` | 275-414 | 69-106 | 25-79 |

- Overall, even after restricting the Viridis palette, it still has the largest hue range, a large chroma range, and a large luminance range. Thus, the colors in this palette can be distinguished easily but they are also rather heterogeneous.
- The other three palettes are all somewhat more homogeneous.
- Mako has the biggest luminance contrast (light-dark) which makes the colors easy to distinguish but also gives them somewhat unequal weight.
- The two CARTO palettes are most homogeneous.
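Ranges like those in the table can be approximated in base R by converting the colors to polar LUV coordinates. The sketch below is an assumption about how one might do this, not necessarily how the numbers above were obtained: `hcl_ranges()` is a hypothetical helper, results may differ slightly from the table depending on the reference white and rounding, and hues are reported modulo 360, so a palette that wraps around the hue circle (such as `ag_Sunset`, listed as 275-414) would need extra unwrapping.

```r
# Hypothetical helper: approximate hue/chroma/luminance ranges of a palette
hcl_ranges <- function(cols) {
  rgb01 <- t(col2rgb(cols)) / 255                      # one row per colour, in [0, 1]
  luv <- convertColor(rgb01, from = "sRGB", to = "Luv")
  L <- luv[, 1]; u <- luv[, 2]; v <- luv[, 3]
  rbind(
    hue       = range((atan2(v, u) * 180 / pi) %% 360), # polar angle in degrees
    chroma    = range(sqrt(u^2 + v^2)),                 # polar radius
    luminance = range(L)
  )
}

full       <- hcl_ranges(hcl.colors(100, "Viridis"))
restricted <- hcl_ranges(hcl.colors(100, "Viridis")[11:90])
round(restricted)
```

Clipping the palette can only narrow these ranges, which is the point of the `from`/`to` arguments of `hcl100()`.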

My recommendation:

I think all four palettes would be feasible as default palettes for this purpose.

I would probably go for either Viridis, because many readers are already familiar with it, or possibly ag_Sunset, because its colors are dark and colorful (low-luminance, high-chroma), which makes them particularly suitable for points and lines.

vincentarelbundock commented 4 months ago

Thanks for taking the time to write this @zeileis . I know nothing about colors, so I'm learning a lot.

My opinion should not count for much here. I like all four options. Based on pure vibes and personal preferences, Sunset seems prettiest.

But again, all those options look great.

zeileis commented 4 months ago

Fair enough, thanks for the feedback, Vincent! 🌇

grantmcdermott commented 4 months ago

Okay, some updates:

- [x] I fixed the bg scalar bug and added a test. Should be working fine now.
  - [x] I didn't address the bg = 0.3 alpha FR, but I think we should consider it separately. @zeileis do you mind opening a new issue/FR for it?
- [x] Similarly, I fixed the ordered factor cases to inherit from the same default sequential palette as the numeric case (except, the legend will obvs be discrete instead of gradient).
- [x] Incidentally, I realized from your examples that the label alignment wasn't looking great if the individual labels didn't have the same character width ("2" vs "2.5" etc.) This should be fixed now.

Still to do / decide:

- [x] Decide on the default sequential palette. I must say, my own preference leans towards the adapted "Mako" palette. (So much for consensus!) Apart from idiosyncratic aesthetic preferences, my reasons include a concern that the adapted "Viridis" palette might fall into uncanny valley for users that are expecting to see the extreme parts of the swatch (e.g., yellow). Plus the pink+purples of the ag_Sunset palette are a little too neon for my own tastes... [Decision: Restricted viridis has the votes in the end.]
  - [x] Open question 1: Should we automagically impose the same trimming function on all HCL color palettes that are passed to sequential cases? (At least, the keyword cases.) E.g., if a user passes palette = "rocket", should we automatically trim the bottom and top percentage? [Decision: No]
  - [x] Open question 2: Should these palettes run in the other direction? @zeileis, I see that your bespoke hcl100 function wraps rev() internally. But this leads to the light colors at the bottom of the gradient. I thought they generally ran the other way with light colors representing high values, no? [Decision: Use reversed colors.]
- [x] Should we still support some kind of automatic exception logic for small n numeric cases, so that these instead take on discrete legends? I must admit that I'm a bit sad to see this drop, but @zeileis has convinced me that it shouldn't be the default case. (Too many potential inconsistencies, even though, say, generating a gradient legend from a two-level numeric continues to annoy me slightly.) In other words, is it worth keeping tpar(legend.ugc) or should we just drop it altogether? [Decision: No]
- [x] Interpolate gradient colors if necessary.

grantmcdermott commented 4 months ago

P.S. A minor clarification: We don't require that gradient palettes take 100 levels. It's just that we use this scale internally to make color matching easier with the plotted points, etc. Among other things, this avoids having to allocate large vectors and, more importantly, makes it much easier to ensure that the "pretty" labels of the gradient legend line up with the actual colors that they represent on the plot. This internal adjustment seems like a requirement, however, because of another constraint, which is that all user-supplied palette generating functions must take "n" as a leading argument. (See back here.) So your hcl adjustment function would work fine if you added this leading argument:

hcl_adj = function(n, palette = "Viridis", from = 0, to = 1) {
    colorRampPalette(hcl.colors(n, palette)[(n * from + 1):(n * to)])(n)
}
tinyplot(lat ~ long | depth, data = quakes, pch = 19, palette = hcl_adj(palette = "Mako", from = 0.1, to = 0.9) )
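As a rough illustration of why a fixed internal scale simplifies color matching, the sketch below bins a numeric variable into 100 equal-width intervals and looks up one color per observation. This is an assumed, simplified version of the idea in base R, not the package's actual implementation.

```r
z   <- quakes$depth
pal <- hcl.colors(100, "Viridis")

# 101 equally spaced break points define 100 bins over the range of z;
# all.inside = TRUE keeps the maximum in bin 100 rather than 101
breaks <- seq(min(z), max(z), length.out = length(pal) + 1)
idx    <- findInterval(z, breaks, all.inside = TRUE)

# One colour per observation, regardless of how many points there are
point_cols <- pal[idx]
plot(lat ~ long, data = quakes, pch = 19, col = point_cols)
```

With this kind of binning, the palette never needs more than 100 entries, and a legend label at value `x` can be placed at the same relative position along the swatch as `x` has within `range(z)`.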

vincentarelbundock commented 4 months ago

Mako is also beautiful.

@ASKurz just ran a twitter poll with 200+ votes and 75% of people thought that dark colors should represent the high end of the scale. FWIW, my intuition agrees with the majority. (Think of all the US population density maps with dark spots representing cities.)

https://twitter.com/SolomonKurz/status/1749559759215145471

zeileis commented 4 months ago

If you use the palette = argument, you can use a function, yes. But if you use the col = argument which (a) is the standard argument everyone knows from plot() and (b) avoids the non-standard evaluation, then you need 100 colors:

tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(20))
Error: `col` must be of length 1 or 100.

I think this is confusing (unnecessarily so). For categorical palettes, the requirement to exactly match the length makes sense. But for continuous palettes, I think we should allow any length and then use colorRampPalette() to suitably interpolate it.
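A minimal sketch of the suggested interpolation, assuming an internal target of 100 colors. `interp_cols()` is a hypothetical helper name used only for illustration; `colorRampPalette()` itself is the base R function mentioned above.

```r
# Hypothetical helper: stretch or shrink a colour vector to n colours
interp_cols <- function(col, n = 100) {
  if (length(col) == n) return(col)      # nothing to do
  # alpha = TRUE keeps any transparency in the input colours
  colorRampPalette(col, alpha = TRUE)(n)
}

short <- interp_cols(hcl.colors(20))     # 20 colours stretched to 100
exact <- interp_cols(hcl.colors(100))    # already 100, returned unchanged
length(short)
```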

zeileis commented 4 months ago

Feedback regarding the open to-do points:

- Those who want the full Viridis palette can always get it easily. So I wouldn't be concerned about using a subset. As I summarized above, even the reduced palette still covers a broader range in the HCL dimensions compared to many other palettes.
- Automagically imposing trimming: No, I wouldn't do that. I would just use it in case of our default.
- What is high and what is low? This depends on the context. The folklore is that on a white background the dark colors should stand out as extreme - while on dark/black background the light colors should represent the extreme. As the factory-fresh default is a white background, dark colors should be extreme. And usually extreme means large. So our default should be a reversed hcl.colors palette.
- Exception for small n: Nothing good will come from this IMHO.

ASKurz commented 4 months ago

Although I personally was very disappointed by the results of that poll, the people have spoken. Dark colors should represent the high end of the scale. sigh

grantmcdermott commented 4 months ago

@zeileis @vincentarelbundock Okay... I believe that all of the outstanding issues should now be addressed. I ended up going with the adjusted viridis palette as the default after all, and any other changes should be in line with your suggestions. (See the updated examples right at the top of the thread for some illustrations.)

Please kick the tires once more to check that you're happy. Assuming that everything looks good to you, please feel free to squash and merge.

zeileis commented 4 months ago

Thanks, Grant, for the thorough update! I played around with it and noticed that my recommendation of reversing the scale has led to some inconsistencies. Sorry about that!

First, I noticed that the case of 100 colors is handled differently from other settings:

tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(99, "ag_Sunset")) ## low = light
tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(100, "ag_Sunset")) ## low = dark
tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(101, "ag_Sunset")) ## low = light

I guess that this might be due to reversing the order in two different places in the code?

Also, the default ordering is different for ordered factors vs. numeric variables, e.g.,

tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, cex = 1.5) ## low = light, bottom to top
tinyplot(mpg ~ wt | ordered(carb), data = mtcars, pch = 19, cex = 1.5) ## low = dark, top to bottom

Maybe we also want to use the same restricted viridis palette for the ordered factors?

P.S.: Given that you were already kind enough to list me with an "aut" role for the package, you never need to thank me in the NEWS. :-)

grantmcdermott commented 4 months ago

> Also, the default ordering is different for ordered factors vs. numeric variables, e.g.,
>
> tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, cex = 1.5) ## low = light, bottom to top
> tinyplot(mpg ~ wt | ordered(carb), data = mtcars, pch = 19, cex = 1.5) ## low = dark, top to bottom
>
> Maybe we also want to use the same restricted viridis palette for the ordered factors?

Quick clarification/confirmation on this: We can certainly match the restricted colors for ordered factors and ensure that low values = dark. But do we want the legend to be reversed and run from bottom to top too?

I understand that it will be better for internal consistency, but we are deviating from established norms in other packages. Both ggplot2 and lattice run ordered factors from top to bottom, e.g. lattice::xyplot(mpg ~ wt, group = ordered(carb), data = mtcars, auto.key = TRUE)

zeileis commented 4 months ago

Good point. So we can either be consistent within tinyplot for numeric and ordered - or we can be consistent across packages with ggplot2 and lattice. Then let's go with the consistency with ggplot2 and lattice - and let's see how users like it. Given that ordered factors are typically under-used anyway, there are probably not many users affected by this.

grantmcdermott commented 4 months ago

@zeileis Thanks for confirming (and for catching these cases). Both should be fixed now:

pkgload::load_all("~/Documents/Projects/tinyplot")
#> ℹ Loading tinyplot

tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(100, "ag_Sunset"))

tinyplot(mpg ~ wt | ordered(carb), data = mtcars, pch = 19, cex = 1.5)

Created on 2024-02-27 with reprex v2.1.0

grantmcdermott commented 4 months ago

Is there anything we still need to do/check before merging?

zeileis commented 4 months ago

Thanks for the reminder, Grant. I'm just traveling home from a conference and didn't have time, yet, to play with the code. I just noticed one last inconsistency that I wanted to mention. But as I explain below, I think that this is the best solution we can do. So I wouldn't change anything.

Compare:

tinyplot(mpg ~ wt | carb,          data = mtcars, pch = 19, cex = 1.5, col = hcl.colors(6, "ag_Sunset"))
tinyplot(mpg ~ wt | ordered(carb), data = mtcars, pch = 19, cex = 1.5, col = hcl.colors(6, "ag_Sunset"))
tinyplot(mpg ~ wt | factor(carb),  data = mtcars, pch = 19, cex = 1.5, col = hcl.colors(6, "ag_Sunset"))

In the numeric case we reverse the order to obtain "dark = high". But in the ordered and factor case we don't reverse the order so that dark = low. While this is somewhat inconsistent, I think this is the best we can do. I just wanted to point out why - so that you can check whether you agree with these considerations or whether you would prefer a different solution.

- In the unordered factor case, we clearly don't want to re-order. The order of the colors should simply match the order of the categories.
- In the numeric case, we might disable the reversing and just do it for the default palette. However, that would mean that users would very often have to say something like hcl.colors(..., rev = TRUE), which would be rather inconvenient. So I would also leave this as it is.
- So then we have to decide whether the ordered case should reorder (like numeric) or not reorder (like factor). I think the latter is probably less confusing.

grantmcdermott commented 4 months ago

> So then we have to decide whether the ordered case should reorder (like numeric) or not reorder (like factor). I think the latter is probably less confusing.

Hmmm. Yes, I think you're right that this is the "least bad" tradeoff that we can make here. And users can always use rev = TRUE if they want to switch the ordering. Let's leave it as-is for now and we can potentially adjust if we get strong feedback about it.
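For reference, a small base R sketch of the `rev = TRUE` escape hatch mentioned here. It only shows what the argument does to the color vector itself; how tinyplot then assigns those colors to numeric versus ordered groups is as described in the discussion above, and plain `plot()` is used so that the example runs without the package.

```r
cols     <- hcl.colors(6, "ag_Sunset")
cols_rev <- hcl.colors(6, "ag_Sunset", rev = TRUE)

# rev = TRUE simply returns the same colours in the opposite order
identical(cols_rev, rev(cols))

# Base-graphics illustration: colour the six carb levels in reversed order
g <- ordered(mtcars$carb)
plot(mpg ~ wt, data = mtcars, pch = 19, cex = 1.5, col = cols_rev[g])
```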

> Thanks for the reminder, Grant. I'm just traveling home from a conference and didn't have time, yet, to play with the code.

Sorry, I don't mean to be a rash, I was mostly checking in, since I realised that my last message was probably a bit ambiguous. I just pushed another small commit now, but that should be it from me unless you pick up any more issues in testing. Catching these edge cases is important, so take your time... although it would be great if we could merge this PR fairly soon, since that will clear the way for the last few things before CRAN submission ;-) I'm hoping to submit before I head out for an extended vacation around spring break.

Let me know!

zeileis commented 4 months ago

1. I agree, good plan.
2. No worries!
3. I think we can squash and merge now. Should I press the button?

grantmcdermott commented 4 months ago

If you're happy then I'll go ahead and do it. Thanks again for all the super helpful comments on this one!

zeileis commented 4 months ago

Thank you for doing all the actual hard work!!

grantmcdermott / tinyplot

Continuous / gradient legend #122