Closed grantmcdermott closed 4 months ago
Looks amazing!
I was only able to try the examples above (crazy week), but it looks excellent on my setup.
Unrelated, but could `tpar()` pass extra arguments to `par()` via `...` so we can always call the same function regardless and don't have to mix and match?
> Unrelated, but could `tpar()` pass extra arguments to `par()` via `...` so we can always call the same function regardless and don't have to mix and match?
Ah thanks for the reminder. That's definitely a goal. Can you please file an issue so I remember to implement?
This looks very cool. I'll play around some more with it on Friday.
Okay, this is ready to go for full review from my side. I updated the tests and documentation, and also fixed a few corner cases. One more example (gradient for point interiors, but white borders):
pkgload::load_all("~/Documents/Projects/tinyplot")
#> ℹ Loading tinyplot
plt(
lat ~ long | depth, quakes,
grid = TRUE,
palette = hcl.colors(palette = "rocket", alpha = 0.7),
pch = 21, col = "white", bg = "by"
)
Created on 2024-02-15 with reprex v2.1.0
Grant @grantmcdermott, thanks again for implementing this really nice feature. The examples you posted all look great. I started playing around with these examples (without looking at the internals, yet). I noticed a bug in the handling of the `bg` argument for the fill color of `pch` in `21:25`, and I think the handling of `by` variables with few unique values is too ad hoc. I'll post these below and continue to play around some more...
The bug: The `bg` argument for setting the fill color is only handled correctly if `bg = "by"`, e.g.,
tinyplot(lat ~ long | depth, data = quakes, pch = 21, bg = "by") ## ok
However, when set to a scalar color, it appears to be only used for the first level(s):
tinyplot(lat ~ long | depth, data = quakes, pch = 21, bg = "red") ## only the lowest level(s)
For discrete variables this works:
tinyplot(mpg ~ wt | gear, data = mtcars, pch = 21, bg = "red") ## ok
However, even for discrete variables it is not possible to set a vector of background colors (e.g., semi-transparent versions of the border color).
tinyplot(mpg ~ wt | gear, data = mtcars, pch = 21, col = p, bg = adjustcolor(p, 0.3)) ## Error in !is.null(bg) && bg == "by"
Additional idea: Should we support some sort of shortcut for the latter application? Rather than just having the same color via `bg = "by"` we could allow `bg = 0.3` as a shortcut for `adjustcolor(..., 0.3)` applied to the `by` color.
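Until such a shortcut exists, the same effect can be built by hand. A minimal sketch, assuming three groups and using base R's `palette.colors()` for the border colors; the names `p` and `bg_cols` are illustrative only, and `bg = 0.3` itself is just a proposal in this thread, not an existing tinyplot feature:

```r
# Border colors for three groups (illustrative; any color vector works).
p <- palette.colors(3)

# Semi-transparent fills: same colors, with the alpha channel scaled by 0.3.
bg_cols <- adjustcolor(p, alpha.f = 0.3)

# Each fill is now an 8-digit hex color (#RRGGBBAA).
print(bg_cols)
```

These vectors are what one would want to pass as `col = p, bg = bg_cols` once vector-valued `bg` is supported.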
Palette/legend handling for numeric by variables:
I think that the current implementation is ad hoc and confusing. For example when you happen to draw different subsets of the same type of data:
set.seed(403)
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, subset = sample(1:32, 20)) ## categorical
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, subset = sample(1:32, 20)) ## continuous
Even more seriously, the palette may yield an error in one but not the other case:
set.seed(403)
v <- hcl.colors(100)
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, subset = sample(1:32, 20), col = v) ## error
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, subset = sample(1:32, 20), col = v) ## ok
And finally, the killer argument in my opinion is that we get a completely different handling when the discrete values are not equidistant. The categorical version just ignores the distances completely:
mtcars$score <- rep(c(0, 1, 99, 100), each = 8) ## discrete score with values 0, 1, 99, and 100
mtcars$score2 <- mtcars$score + sin(1:32)/100 ## same but with some "fuzz"
tinyplot(mpg ~ wt | score, data = mtcars, pch = 19, col = hcl.colors(4)) ## four equidistant categories
tinyplot(mpg ~ wt | score2, data = mtcars, pch = 19) ## continuous with two extreme ends of the scale
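The difference between the two treatments can be reproduced without any plotting. A minimal sketch, assuming a 100-color gradient and a simple linear rescaling; this is an illustration of the idea, not tinyplot's internal code:

```r
score <- c(0, 1, 99, 100)
pal <- hcl.colors(100)

# Categorical treatment: only the rank of each value matters.
idx_cat <- as.integer(factor(score))
print(idx_cat)  # 1 2 3 4

# Distance-preserving treatment: rescale to [0, 1], then map onto the palette.
scaled <- (score - min(score)) / diff(range(score))
idx_num <- 1 + round(scaled * (length(pal) - 1))
print(idx_num)  # 1 2 99 100

# Colors that would be assigned to each value under the numeric treatment.
pal[idx_num]
```

Under the categorical treatment the values 1 and 99 get neighboring colors; under the numeric treatment they sit at opposite ends of the gradient.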
Conceptual considerations:
So I thought a bit more about what style of palette and corresponding legend I would expect for `y ~ x | z` with different types of `z` when both `x` and `y` are numeric.
Type of `z` | Palette | Legend | Status |
---|---|---|---|
`factor` (unordered) | Qualitative | Discrete | :heavy_check_mark: |
`ordered` (inheriting from `factor`) | Sequential | Discrete | :exclamation: |
`numeric` (with many levels) | Sequential | Continuous | :heavy_check_mark: |
`numeric` (with few levels) | Sequential | :question: | :question: |
So at the very least we should change the default palette for ordered factors.
And then the question remains whether we should have a special handling of `numeric` variables with few distinct levels. I would argue: No, for the reasons listed above. It is simple enough to say `y ~ x | ordered(z)`.
If you disagree with me here, then at least this case should be handled like an ordered factor (i.e., with a sequential palette) and not like an unordered factor. Also I would prefer to employ a palette that preserves distances between the values. This would also imply that for numeric `z` it is always ok to supply a `col` of length greater than the number of unique levels.
If all of this is incorporated then I think I could live with the discrete palette but would still think it's confusing. Also, there is no simple/intuitive argument to change the behavior because an extra call to `tpar()` is needed.
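The proposed rule depends only on the class of `z`, never on how many distinct values it has. A minimal sketch of that dispatch; `legend_style()` is a hypothetical helper written for illustration, not a tinyplot function:

```r
# Hypothetical helper: choose palette and legend style from the class of z.
legend_style <- function(z) {
  if (is.ordered(z)) {
    # Check ordered first, because ordered factors are also factors.
    c(palette = "sequential", legend = "discrete")
  } else if (is.factor(z)) {
    c(palette = "qualitative", legend = "discrete")
  } else if (is.numeric(z)) {
    # No special case for "few levels": numeric is always continuous.
    c(palette = "sequential", legend = "continuous")
  } else {
    stop("unsupported type for z")
  }
}

legend_style(mtcars$carb)           # numeric
legend_style(ordered(mtcars$carb))  # ordered factor
legend_style(factor(mtcars$carb))   # unordered factor
```

With this rule, a user who wants a discrete legend for a numeric variable opts in explicitly via `ordered(z)` or `factor(z)`.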
Wow, this is great feedback. Thanks @zeileis! I think that I agree with all of your major points. I won't be able to action anything immediately, since I'm heading out for a weekend at the coast. But I'll mull on some of your high level design decision ideas while I'm looking at the waves rolling in :-)
🌊 🌊 🌊 Enjoy the waves and the weekend! There is still enough time to mull over design decisions afterwards...
P.S.: I have also thought about a default palette for this but haven't been able to write it up, yet.
Default color palette for a continuous numeric `by` variable in a scatter plot:
As I had mentioned elsewhere before, this is not the same as choosing a sequential palette for a display based on shading areas (like a heatmap). In area-based displays you often want the "low" end of the palette to fade into the background. But here it is important that all points are clearly visible and that none of them fade into the background. Thus, we want:
Four suggestions:
I screened all of the sequential color palettes we have easily available in `hcl.colors()` and picked out the following four which satisfy the criteria above very well: `Viridis`, `Mako` (both from the Viridis family), `ag_GrnYl`, `ag_Sunset` (both from the CARTO family). There are also other palettes with similar properties but I personally liked these four best. All of them also perform reasonably well under color vision deficiencies.
For some of these it is necessary to restrict the range though, e.g., exclude some of the light (low-chroma high-luminance) and/or dark (low-chroma low-luminance) colors. Hence I have written the following convenience function that optionally clips a certain percentage of colors and then extends them to 100 again. This is currently necessary because `tinyplot` insists on the 100. (As already commented above, I think that we should change this.) I reverse all the palettes so that the lighter colors correspond to lower numeric values.
hcl100 <- function(palette = "Viridis", from = 0, to = 1) {
  rev(colorRampPalette(hcl.colors(100, palette)[(100 * from + 1):(100 * to)])(100))
}
And then I defined the following four palettes:
pv <- hcl100("Viridis", 0.1, 0.9)
pm <- hcl100("Mako", 0.2, 0.9)
pg <- hcl100("ag_GrnYl", 0, 0.9)
ps <- hcl100("ag_Sunset", 0, 0.9)
Illustrations:
To see how well the palettes work in practice I used your example based on the `quakes` data and another example with fewer points using the `iris` data. For `quakes` and `iris` I used the following code with different `p`'s.
tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = p)
tinyplot(Sepal.Length ~ Sepal.Width | Petal.Width, data = iris, pch = 19, col = p)
Perceptual properties:
To better understand how the four palettes differ regarding their perceptual properties, I provide the corresponding ranges in hue, chroma, and luminance in the table below.
Palette | Hue range | Chroma range | Luminance range |
---|---|---|---|
`Viridis` | 98-277 | 44-89 | 23-84 |
`Mako` | 150-286 | 25-55 | 21-86 |
`ag_GrnYl` | 101-225 | 27-79 | 34-85 |
`ag_Sunset` | 275-414 | 69-106 | 25-79 |
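Ranges of this kind can be approximated with base R alone. A rough sketch, assuming that converting to CIE Luv with `grDevices::convertColor()` and taking polar coordinates is close enough to the polar LUV space behind `hcl.colors()`; `hcl_ranges()` is a hypothetical helper, and its numbers need not match the table exactly:

```r
# Hypothetical helper: approximate hue/chroma/luminance ranges of a palette.
hcl_ranges <- function(cols) {
  rgb01 <- t(col2rgb(cols)) / 255                         # one row per color, in [0, 1]
  luv <- convertColor(rgb01, from = "sRGB", to = "Luv")   # columns: L, u, v
  L <- luv[, 1]
  u <- luv[, 2]
  v <- luv[, 3]
  chroma <- sqrt(u^2 + v^2)
  hue <- (atan2(v, u) * 180 / pi) %% 360                  # wrapped into [0, 360)
  rbind(hue = range(hue), chroma = range(chroma), luminance = range(L))
}

# Viridis restricted to roughly the 10%-90% slice used above.
r <- hcl_ranges(hcl.colors(100, "Viridis")[11:90])
round(r)
```

Note that the helper wraps hues into [0, 360), so a palette whose hue trajectory crosses 360 (such as the 275-414 range listed for `ag_Sunset`) would be reported differently.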
My recommendation:
I think all four palettes would be feasible as default palettes for this purpose.
I would probably go for either Viridis, because many readers are familiar with it already, or possibly ag_Sunset, because the colors are dark and colorful (low-luminance high-chroma), which makes them particularly suitable for points and lines.
Thanks for taking the time to write this @zeileis . I know nothing about colors, so I'm learning a lot.
My opinion should not count for much here. I like all four options. Based on pure vibes and personal preferences, Sunset seems prettiest.
But again, all those options look great.
Fair enough, thanks for the feedback, Vincent! 🌇
Okay, some updates:
- `bg` scalar bug: fixed, and added a test. Should be working fine now.
- Re the `bg = 0.3` alpha FR: I think we should consider it separately. @zeileis do you mind opening a new issue/FR for it?
- `ordered` factor cases now inherit from the same default sequential palette as the numeric case (except, the legend will obvs be discrete instead of gradient).

Still to do / decide:
[x] Decide on the default sequential palette. I must say, my own preference leans towards the adapted "Mako" palette. (So much for consensus!) Apart from idiosyncratic aesthetic preferences, my reasons include a concern that the adapted "Viridis" palette might fall into uncanny valley for users that are expecting to see the extreme parts of the swatch (e.g., yellow). Plus the pink+purples of the `ag_Sunset` palette are a little too neon for my own tastes... [Decision: Restricted viridis has the votes in the end.]
[x] `palette = "rocket"`: should this automatically trim the bottom and top percentage? [Decision: No]
[x] The `hcl100` function wraps `rev()` internally. But this leads to the light colors at the bottom of the gradient. I thought they generally ran the other way with light colors representing high values, no? [Decision: Use reversed colors.]
[x] Should we still support some kind of automatic exception logic for small n numeric cases, so that these instead take on discrete legends? I must admit that I'm a bit sad to see this drop, but @zeileis has convinced me that it shouldn't be the default case. (Too many potential inconsistencies, even though, say, generating a gradient legend from a two-level numeric continues to annoy me slightly.) In other words, is it worth keeping `tpar(legend.ugc)` or should we just drop it altogether? [Decision: No]
[x] Interpolate gradient colors if necessary.
P.S. A minor clarification: We don't require that gradient palettes take 100 levels. It's just that we use this scale internally to make color matching easier with the plotted points, etc. Among other things, this avoids having to allocate large vectors and, more importantly, makes it much easier to ensure that the "pretty" labels of the gradient legend line up with the actual colors that they represent on the plot. This internal adjustment seems like a requirement, however, because of another constraint, which is that all user-supplied palette generating functions must take "n" as a leading argument. (See back here.) So your hcl adjustment function would work fine if you added this leading argument:
hcl_adj = function(n, palette = "Viridis", from = 0, to = 1) {
  colorRampPalette(hcl.colors(n, palette)[(n * from + 1):(n * to)])(n)
}
tinyplot(lat ~ long | depth, data = quakes, pch = 19, palette = hcl_adj(palette = "Mako", from = 0.1, to = 0.9))
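A palette function with a leading `n` can also be checked on its own, outside of tinyplot, by calling it with an explicit `n`. A minimal sketch; `hcl_clip()` is a hypothetical variant of the function above that adds `round()` to guard the index arithmetic against floating-point fuzz:

```r
# Hypothetical variant of hcl_adj(): n comes first, as required for
# user-supplied palette-generating functions.
hcl_clip <- function(n, palette = "Viridis", from = 0, to = 1) {
  full <- hcl.colors(n, palette)
  # Keep only the [from, to] slice of the palette ...
  keep <- full[(round(n * from) + 1):round(n * to)]
  # ... and stretch that slice back out to n colors.
  colorRampPalette(keep)(n)
}

cols <- hcl_clip(20, from = 0.1, to = 0.9)
length(cols)
```

The result always has length `n`, whatever fraction of the original palette was clipped away.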
Mako is also beautiful.
@ASKurz just ran a twitter poll with 200+ votes and 75% of people thought that dark colors should represent the high end of the scale. FWIW, my intuition agrees with the majority. (Think of all the US population density maps with dark spots representing cities.)
If you use the `palette =` argument, you can use a function, yes. But if you use the `col =` argument which (a) is the standard argument everyone knows from `plot()` and (b) avoids the non-standard non-standard evaluation, then you need 100 colors:
tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(20))
Error: `col` must be of length 1 or 100.
I think this is confusing (unnecessarily so). For categorical palettes, the requirement to exactly match the length makes sense. But for continuous palettes, I think we should allow any length and then use `colorRampPalette()` to suitably interpolate it.
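The interpolation step itself is short. A minimal sketch of the idea, assuming an internal gradient of 100 colors; `to_gradient()` is a hypothetical helper, not the code that ended up in tinyplot:

```r
# Hypothetical helper: stretch or shrink any color vector to n gradient colors.
to_gradient <- function(col, n = 100) {
  if (length(col) == n) return(col)  # already the right length
  colorRampPalette(col)(n)           # linear interpolation in RGB space
}

v <- hcl.colors(20)
g <- to_gradient(v)
length(g)

# The two ends of the gradient are the two ends of the supplied vector.
c(first = g[1], last = g[100])
```

With a step like this, `col = hcl.colors(20)` would no longer need to error for a continuous `by` variable.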
Feedback regarding the open to-do points:
`hcl.colors` palette.
Although I personally was very disappointed by the results of that poll, the people have spoken. Dark colors should represent the high end of the scale. sigh
@zeileis @vincentarelbundock Okay... I believe that all of the outstanding issues should now be addressed. I ended up going with the adjusted viridis palette as the default after all, and any other changes should be in line with your suggestions. (See the updated examples right at the top of the thread for some illustrations.)
Please kick the tires once more to check that you're happy. Assuming that everything looks good to you, please feel free to squash and merge.
Thanks, Grant, for the thorough update! I played around with it and noticed that my recommendation of reversing the scale has led to some inconsistencies. Sorry about that!
First, I noticed that the case of 100 colors is handled differently from other settings:
tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(99, "ag_Sunset")) ## low = light
tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(100, "ag_Sunset")) ## low = dark
tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(101, "ag_Sunset")) ## low = light
I guess that this might be due to reversing the order in two different places in the code?
Also, the default ordering is different for ordered factors vs. numeric variables, e.g.,
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, cex = 1.5) ## low = light, bottom to top
tinyplot(mpg ~ wt | ordered(carb), data = mtcars, pch = 19, cex = 1.5) ## low = dark, top to bottom
Maybe we also want to use the same restricted viridis palette for the ordered factors?
P.S.: Given that you were already kind enough to list me with an "aut" role for the package, you never need to thank me in the NEWS. :-)
> Also, the default ordering is different for ordered factors vs. numeric variables, e.g.,
> tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, cex = 1.5) ## low = light, bottom to top
> tinyplot(mpg ~ wt | ordered(carb), data = mtcars, pch = 19, cex = 1.5) ## low = dark, top to bottom
> Maybe we also want to use the same restricted viridis palette for the ordered factors?
Quick clarification/confirmation on this: We can certainly match the restricted colors for ordered factors and ensure that low values = dark. But do we want the legend to be reversed and run from bottom to top too?
I understand that it will be better for internal consistency, but we are deviating from established norms in other packages. Both ggplot2 and lattice run ordered factors from top to bottom, e.g. lattice::xyplot(mpg ~ wt, group = ordered(carb), data = mtcars, auto.key = TRUE)
Good point. So we can either be consistent within `tinyplot` for numeric and ordered - or we can be consistent across packages with `ggplot2` and `lattice`. Then let's go with the consistency with `ggplot2` and `lattice` - and let's see how users like it. Given that ordered factors are typically under-used anyway, there are probably not many users affected by this.
@zeileis Thanks for confirming (and for catching these cases). Both should be fixed now:
pkgload::load_all("~/Documents/Projects/tinyplot")
#> ℹ Loading tinyplot
tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(100, "ag_Sunset"))
tinyplot(mpg ~ wt | ordered(carb), data = mtcars, pch = 19, cex = 1.5)
Created on 2024-02-27 with reprex v2.1.0
Is there anything we still need to do/check before merging?
Thanks for the reminder, Grant. I'm just traveling home from a conference and didn't have time, yet, to play with the code. I just noticed one last inconsistency that I wanted to mention. But as I explain below, I think that this is the best solution we can do. So I wouldn't change anything.
Compare:
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, cex = 1.5, col = hcl.colors(6, "ag_Sunset"))
tinyplot(mpg ~ wt | ordered(carb), data = mtcars, pch = 19, cex = 1.5, col = hcl.colors(6, "ag_Sunset"))
tinyplot(mpg ~ wt | factor(carb), data = mtcars, pch = 19, cex = 1.5, col = hcl.colors(6, "ag_Sunset"))
In the numeric case we reverse the order to obtain "dark = high". But in the ordered and factor case we don't reverse the order so that dark = low. While this is somewhat inconsistent, I think this is the best we can do. I just wanted to point out why - so that you can check whether you agree with these considerations or whether you would prefer a different solution.
`hcl.colors(..., rev = TRUE)` which would be rather inconvenient. So I would also leave this as it is.
So then we have to decide whether the ordered case should reorder (like numeric) or not reorder (like factor). I think the latter is probably less confusing.
Hmmm. Yes, I think you're right that this is the "least bad" tradeoff that we can make here. And users can always use `rev = TRUE` if they want to switch the ordering. Let's leave it as-is for now and we can potentially adjust if we get strong feedback about it.
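For reference, flipping the order of a user-supplied palette is a one-argument change in base R. A small sketch using the same palette as in the comparison above; the variable names are illustrative:

```r
# Default order of the palette as returned by hcl.colors().
cols <- hcl.colors(6, "ag_Sunset")

# Same colors in the opposite order, via the rev argument.
cols_rev <- hcl.colors(6, "ag_Sunset", rev = TRUE)

# Passing rev = TRUE is equivalent to reversing the vector afterwards.
identical(cols_rev, rev(cols))
```

Either vector can then be passed as `col =` for an ordered or unordered factor, depending on which end of the scale should be dark.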
> Thanks for the reminder, Grant. I'm just traveling home from a conference and didn't have time, yet, to play with the code.
Sorry, I don't mean to be a rash, I was mostly checking in, since I realised that my last message was probably a bit ambiguous. I just pushed another small commit now, but that should be it from me unless you pick up any more issues in testing. Catching these edge cases is important, so take your time... although it would be great if we could merge this PR fairly soon, since that will clear the way for the last few things before CRAN submission ;-) I'm hoping to submit before I head out for an extended vacation around spring break.
Let me know!
Thank you for doing all the actual hard work!!
Closes #84. Closes #124. Closes #130.
Some notes:
`tpar("legend.ugc")`). But it's my way to avoid something I personally find annoying about ggplot2's default behaviour, which automatically converts any numeric grouping variable into a gradient swatch, even if there are only (say) two categories.
Quick examples. [UPDATED based on bug catches and feedback in thread below.]
Created on 2024-02-22 with reprex v2.1.0