Closed vincentarelbundock closed 4 days ago
This is really cool and I'm just starting to work through the examples. Quick first comment: For grid = TRUE I would have hoped to get horizontal lines matching the tick marks on the y-axis, e.g.:
tinyplot(Species ~ Sepal.Length, data = iris, type = "ridge", grid = TRUE)
A similar quick comment. I'd like to be able to do
tinyplot(Species ~ Sepal.Length | Species, data = iris, type = "ridge")
so that colors vary by the y-axis entries.
Accounting for this kind of x==by or y==by logic normally requires some internal accounting, since we want to avoid splitting y (or x) by itself. But we've managed to do it in a few places. For example, R/type_spineplot.R:
(In the specific case of type_spineplot we have to do some more work after this to handle custom color sequencing. But for adapting the logic to type_ridge I think that copying across the above two code chunks should suffice.)
Great minds think alike, I was playing with the same thing. :nerd_face: More generally, faceting does not seem to work, yet.
Also, browsing the ggridges vignette, it would be really nice to have color gradients that help to compare the x-axis values across density curves. Are you planning to add this?
This would be quite a lot of work, no? Off the top of my head, I guess it would require either looping over the sequence of x values and drawing mini polygons (similar to this), or converting the polygon to an appropriate matrix and then rasterising it.
Perhaps there's a simpler solution. But I think that gradient fill support is probably out of scope for this PR. We can revisit the idea once we manage to fix #243, since the logic would probably carry over to regular density plots too.
Edit: To clarify, I think that this would be very cool. But I worry that supporting x gradient fill will require quite a lot of additional work.
I added support for facets and fixed the grid problem.
I think that any fancier col or |by support would require a complete refactor of the by_aesthetics() functions. This is probably a good idea anyway (will open a different issue).
Unfortunately, I don't have the bandwidth for this right now. I can do minor fixes on PR review, but any major change will have to wait. We can merge this close to "as-is" (perhaps with an "experimental" tag), or we can wait a few weeks (months?) until I have more time.
library(tinyplot)
dat = transform(airquality, Late = ifelse(Day > 15, "Late", "Early"))
tinyplot(Month ~ Ozone,
  facet = ~Late,
  data = dat,
  type = "ridge",
  grid = TRUE,
  col = "white",
  bg = "light blue")
Great, thanks @vincentarelbundock. I want to take a stab at tweaking a few things so have cloned your fork locally and will test things. I'll push any changes that look good and then we can merge. Will probably be a few days.
In the meantime, I'll have a look at how hard it would be to add a type_ridge(gradient = ...) specification. I hope that this shouldn't be excessive. If you merge before, it's probably straightforward to address it in a separate PR.
OK, quick proof of concept:
To implement this I used a fixed grid of 1000 rectangles across the full range of the x variable. In the for() loop of the draw_ridge function:
for (i in rev(seq_along(dsplit))) {
  if (gradient) {
    gn = 1000                      # number of rectangles
    gc = hcl.colors(gn)            # one color per rectangle
    # rectangle edges across the full x range
    gx = seq(from = min(d$x), to = max(d$x), length.out = gn + 1)
    # ridge height at each edge (NA outside this ridge's x range)
    gy = with(dsplit[[i]], approx(x = x, y = ymax, xout = gx)$y)
    gm = dsplit[[i]]$ymin[1]       # baseline of this ridge
    gy[is.na(gy)] = gm
    rect(gx[-(gn + 1)], gm, gx[-1], (gy[-1] + gy[-(gn + 1)])/2, col = gc, border = "transparent")
  }
  with(dsplit[[i]], polygon(x, ymax, col = if (gradient) "transparent" else ibg, border = icol))
}
For the rect() approach to work it is crucial that gn is large enough that the individual rectangles are no longer visible.
Instead one could also use polygon() to draw multiple polygons simultaneously. This would be more flexible and could also incorporate customized breaks and fewer colors. But the preprocessing of the data would require a bit more work...
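The preprocessing mentioned above might look roughly like the following sketch. It is only an illustration, not the code from the PR: segment_density() is a hypothetical helper name, and linear interpolation at the breaks is an assumption. It cuts the area under a density curve at user-chosen breaks and returns NA-separated coordinates, so that a single polygon() call can fill each interval with its own color.

```r
# Hypothetical helper (not part of tinyplot): split the area under a density
# curve into one closed polygon per break interval, separated by NA.
segment_density = function(x, y, breaks, base = 0) {
  # keep only breaks strictly inside the x range and add them to the x grid
  inner = sort(unique(breaks[breaks > min(x) & breaks < max(x)]))
  xs = sort(unique(c(x, inner)))
  ys = approx(x, y, xout = xs)$y          # linear interpolation (assumption)
  edges = c(min(x), inner, max(x))
  px = py = numeric(0)
  for (k in seq_len(length(edges) - 1)) {
    keep = xs >= edges[k] & xs <= edges[k + 1]
    # top edge left to right, back along the baseline, then an NA separator
    px = c(px, xs[keep], edges[k + 1], edges[k], NA)
    py = c(py, ys[keep], base, base, NA)
  }
  # drop the trailing NA
  list(x = px[-length(px)], y = py[-length(py)], n = length(edges) - 1)
}

d = density(Nile)
seg = segment_density(d$x, d$y, breaks = seq(400, 1400, by = 200))
# To draw: plot(d, type = "n")
#          polygon(seg$x, seg$y, col = hcl.colors(seg$n), border = NA)
```

With only a handful of breaks this yields a handful of polygons, which is where the polygon route should be cheap and exact at the break points.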
This looks amazing!
OK, I have now a version which uses polygon() to draw multiple shaded polygons instead of drawing 1000 rect().
Personally, I would still go for the more general code. What do you think?
Should I modify type_ridge correspondingly? The changes are still clearly manageable but I added an internal helper function for drawing shaded segmented polygons.
Cool. I don't have a view so I'll let Grant trace the path forward.
Grant, what do you think about this? First complete the PR without color gradients and then make a new separate PR afterwards - or integrate my proposed changes into the existing PR?
If the latter, I would also export some of the density() arguments so that one can tweak kernel/bandwidth, in particular also supporting a common bandwidth for all groups.
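As a rough illustration of what a common bandwidth could mean, the sketch below computes one bandwidth from the pooled data and passes it to density() for every group. Using bw.nrd0() on the pooled x values is an assumption made for this example; the thread does not say which rule the PR ends up using.

```r
# Sketch: one bandwidth for all groups, taken from the pooled data.
x = iris$Sepal.Width
g = iris$Species

bw_common = bw.nrd0(x)   # same rule density() uses by default, on pooled x
dens = lapply(split(x, g), density, bw = bw_common, kernel = "gaussian")

# every group now reports the same bandwidth
sapply(dens, function(d) d$bw)
```

Without the shared bw, each call to density() would pick its own bandwidth from the group's data, so the smoothness of the ridges would not be directly comparable.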
Grant, what do you think about this? First complete the PR without color gradients and then make a new separate PR afterwards - or integrate my proposed changes into the existing PR?
Would the latter be easier? I don't mind and still have to integrate my own changes for this PR. (I also noticed some weird behaviour when y is a factor, which we'll have to fix.) So am happy to go with the path of least resistance.
P.S. Sorry for being slow on this. I've been solo parenting the last few days and also juggling an important deadline at work..
Personally, I would still go for the more general code. What do you think?
Go for it. For posterity, I also played with some as.raster-based code last week, which I include as a proof of concept below. We obviously don't have to use this, but it does have the virtues of (a) being fast and (b) having built-in interpolation.
dens = density(Nile)
x = dens$x
y = dens$y
# How many y "bins"?
# (higher numbers mean a smoother looking density function)
ny = 1000L
# create an ny * length(x) matrix along the color gradient
m = matrix(
  rep(hcl.colors(length(x), "Viridis"), ny),
  ncol = length(x),
  byrow = TRUE
)
# Use an internal tinyplot function for rescaling/normalizing
y = tinyplot:::rescale_num(y, to = c(1, ny))
y2 = round(y)
# idea: "blank" out the matrix cells above the top edge of the distribution
# note that raster plots rowwise, so we have to do this a bit back-to-front
for (i in seq_along(y2)) m[1:(nrow(m)-y2[i]+1), i] = NA
plot(y, type = "n")
plot(as.raster(m), add = TRUE)
# lines(y2)
lines(y)
Created on 2024-11-17 with reprex v2.1.1
GM: Slight edits to make this example look and read better.
Grant, I've now pushed my relatively slow version using polygon(). If you have the time to take a look that would be great. I have added various examples to the documentation that highlight the main new arguments gradient = FALSE, breaks = NULL.
Meanwhile I'm not convinced anymore that polygon() is the best option - at least not in general. Its main advantage is that I can exactly specify certain breaks on the x-axis. This will be fast and have no "fuzz" for a small number of breaks.
However, for a large number of breaks, your raster-based idea seems to be much faster. By definition this will break things down into a regular raster grid which might be somewhat less precise than the polygon(). However, for continuous gradients drawing is much faster. Do you have any thoughts on how to separate the case with "few" and "many" breaks?
I also adapted your code so that we rescale the raster rather than rescaling the density:
## compute density
d <- density(Nile)
## set up raster matrix on x-grid and 1000 y-pixels
n <- length(d$x) - 1
r <- matrix(1:n, ncol = n, nrow = 1000, byrow = TRUE)
## fill colors by column
r[] <- hcl.colors(n)[r]
## clip raster pixels above density line
ymax <- round(1000 * (d$y - min(d$y))/(max(d$y) - min(d$y)))
ix <- lapply(1:n, function(i) if(ymax[i] < 1000) cbind(setdiff(1:1000, 1001 - 0:ymax[i]), i) else NULL)
r[do.call("rbind", ix)] <- NA
## plot density and add raster gradient
plot(d)
rasterImage(as.raster(r), min(d$x), min(d$y), max(d$x), max(d$y))
lines(d)
OK, I couldn't go to sleep before finishing the rasterImage()-based solution. This is now the new default but you can select via type_ridge(gradient = TRUE, raster = FALSE) vs. the default raster = TRUE. More later, need to get some sleep now...
Amazing @zeileis. Get some sleep and I'll dig into this as soon as I can.
OK, some more updates. I tweaked the color gradient. By default, it uses rasterImage() now unless there are 20 intervals or fewer. In the latter case the segmented polygon() is used because it is more precise regarding the breaks and a little bit faster.
Example: On the left via raster, on the right via polygon.
tinyplot(Species ~ Sepal.Width, data = iris, type = type_ridge(gradient = TRUE))
tinyplot(Species ~ Sepal.Width, data = iris, type = type_ridge(gradient = TRUE, breaks = seq(2, 4.5, by = 0.5)))
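The switching rule described above can be written down in a couple of lines. This is only a sketch of the stated rule (20 intervals or fewer go to polygon, everything else to raster); choose_backend() and its max_polygons argument are hypothetical names, not part of tinyplot.

```r
# Hypothetical helper, not a tinyplot function: pick a drawing backend from
# the number of color intervals implied by the breaks.
choose_backend = function(breaks, max_polygons = 20L) {
  n_intervals = length(breaks) - 1L
  if (n_intervals <= max_polygons) "polygon" else "raster"
}

choose_backend(seq(2, 4.5, by = 0.5))           # 5 intervals
choose_backend(seq(2, 4.5, length.out = 513))   # 512 intervals
```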
If you want to play around with the two implementations, you can explicitly set raster = TRUE or FALSE. My idea would be to get rid of that argument, though, when we are happy with the implementation. See also the FIXME remarks in the source code.
Additionally, I have implemented the option to use group-specific quantiles (at probs) rather than the same breaks across all groups. The two examples below highlight the center 50% of each density (between 25% and 75% quantile) and the entire distribution using a smooth gradient. The former uses the polygon code, the latter the raster code.
tinyplot(Species ~ Sepal.Width, data = iris, col = "white", type = type_ridge(
gradient = hcl.colors(3, "Dark Mint")[c(2, 1, 2)], probs = c(0.25, 0.75)))
tinyplot(Species ~ Sepal.Width, data = iris, type = type_ridge(
gradient = hcl.colors(250, "Dark Mint")[c(250:1, 1:250)], probs = 0:500/500))
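To make the difference between shared breaks and group-specific quantiles concrete, here is a small sketch in base R. It only shows how per-group break points could be derived from probs; how type_ridge() does this internally is not shown in the thread, so treat the details as assumptions.

```r
# Sketch: break points at the 25% and 75% quantiles, computed per group.
probs = c(0.25, 0.75)
breaks_by_group = lapply(
  split(iris$Sepal.Width, iris$Species),
  quantile, probs = probs, names = FALSE
)
breaks_by_group

# With shared breaks, by contrast, every group would use the same vector:
breaks_shared = quantile(iris$Sepal.Width, probs = probs, names = FALSE)
```

Each group then gets three intervals (below, between, and above its own quartiles), which matches the three-color gradient used in the first example.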
Finally, all density() arguments can be specified via bw, kernel, ... and tinyAxis() is used for the y-axis so that we can specify axes and yaxt. Some examples are on the manual page.
I think that this covers all features that I had in mind. Suggestions for improvement are very welcome. Also, let me know if I added something that you don't feel is so useful.
nothing to add but just wanted to say that these last few plots look insanely cool
Fantastic, @zeileis.
I'll take a look at this properly tonight. To throw one idea into the ring, though:
This morning, I was wondering whether we could speed up the polygon approach by using vectorisation. The basic idea is to "trick" polygon into drawing multiple polygons in a single go by inserting appropriate NA breaks.
Here's another proof of concept. Again, this seems to work and is quick (bonus: it requires only a few lines of code).
d = density(Nile)
xx = d$x
xx = c(rbind(xx[-length(xx)], xx[-1], xx[-1], xx[-length(xx)], NA))
xx = xx[1:(length(xx)-1)]
yy = d$y
yy = c(rbind(yy[-length(yy)], yy[-1], 0, 0, NA))
yy = yy[1:(length(yy)-1)]
plot(d, type = "n")
polygon(
x = xx,
y = yy,
col = hcl.colors(length(d$x)),
border = hcl.colors(length(d$x))
)
lines(d$x, d$y)
Created on 2024-11-18 with reprex v2.1.1
Thanks for the kind words! The examples are essentially stolen from the ggridges vignette plus a little tweaking...
Re: polygon with NAs inserted. Yes, that's what my code had been doing all along. Separate polygons would have been hopeless. But even the single segmented polygon becomes quite slow - and it can even create awkward artefacts if the segments are too narrow. Try
tinyplot(Species ~ Sepal.Width, data = iris, type = type_ridge(gradient = hcl.colors(1000), raster = FALSE))
Ah, sorry. I should have read your code to start with. Too many balls in the air at the moment...
No worries, I know that feeling. And take your time with looking at the code - just do it when you have the capacity for it. Now that I have implemented the things that I wanted to implement, I will sleep well :sleeping:
@zeileis I took a stab at improving the polygon logic and now think that it's at a point where we can safely default to it for everything instead of rasters.
The new polygon version (which is now the default) is slightly faster than the raster equivalent for gradients and doesn't leave any artifacts either.
I can post some examples here, but I think the best thing is for you to clone and test locally. Let me know if you agree. Thanks!
Thank you so much, most of this looks great. But we need to be more careful about dropping polygon intervals that are empty. In this case we need to make sure that the intervals remain aligned with the color palette (see the left panel below).
Another small issue is that in the case without gradient but with breaks, we should keep the default light gray shading. Currently, this is dropped (see right panel below).
set.seed(0)
d <- data.frame(y = rep(1:4, each = 100), x = c(
rnorm(100, mean = 5, sd = 2),
rnorm(100, mean = 2, sd = 1),
rnorm(100, mean = 8, sd = 1),
rnorm(50, mean = 1, sd = 0.5), rnorm(50, mean = 9, sd = 0.5)
))
tinyplot(y ~ x, data = d, type = type_ridge(bw = 0.5, gradient = TRUE, breaks = -1:6 * 2))
tinyplot(y ~ x, data = d, type = type_ridge(bw = 0.5, breaks = -1:6 * 2))
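The alignment issue described above can be illustrated without any plotting. The sketch below is not the PR's code; it just shows, under the assumption that empty intervals are dropped before drawing, why the palette has to be indexed by interval id rather than by position among the remaining intervals.

```r
# Sketch: keep colors aligned with breaks when some intervals are empty.
breaks = -1:6 * 2                         # -2, 0, ..., 12 (7 intervals)
pal = hcl.colors(length(breaks) - 1)      # one color per interval

set.seed(0)
d = density(rnorm(100, mean = 8, sd = 1), bw = 0.5)

# interval id of every density grid point; empty intervals never show up
idx = findInterval(d$x, breaks, all.inside = TRUE)
used = sort(unique(idx))

col_ok = pal[used]                  # indexed by interval id: stays aligned
col_wrong = pal[seq_along(used)]    # indexed by position: shifts the colors
```

Here the density only covers the upper intervals, so col_wrong would paint them with the colors meant for the lowest ones.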
Thanks @zeileis. I believe that I've managed to plug those two cases now:
pkgload::load_all("~/Documents/Projects/tinyplot_vincent/")
#> ℹ Loading tinyplot
set.seed(0)
d <- data.frame(y = rep(1:4, each = 100), x = c(
rnorm(100, mean = 5, sd = 2),
rnorm(100, mean = 2, sd = 1),
rnorm(100, mean = 8, sd = 1),
rnorm(50, mean = 1, sd = 0.5), rnorm(50, mean = 9, sd = 0.5)
))
tinyplot(y ~ x, data = d, type = type_ridge(bw = 0.5, gradient = TRUE, breaks = -1:6 * 2))
tinyplot(y ~ x, data = d, type = type_ridge(bw = 0.5, breaks = -1:6 * 2))
Bonus: Replicating a fun example from the ggridges package/vignette. Note that this is a case where grid = TRUE gives misaligned horizontal lines (due to the y-axis scaling?). But we can deploy draw as a workaround. (Something to think about fixing. Maybe part of a dedicated tinytheme("ridges") theme that also does things like removing the y-axis label?)
data(lincoln_weather, package = "ggridges")
op = tpar(las = 1, mgp = c(3, 0, 0))
tinyplot(
Month ~ `Max Temperature [F]`, data = lincoln_weather,
type = type_ridge(gradient = "plasma", scale = 3),
# grid = grid(nx = NA, ny = 12),
draw = abline(h = 0:11, col = "lightgray"),
axes = "l",
main = "Temperatures in Lincoln NE",
ylab = NA
)
tpar(op)
Created on 2024-11-22 with reprex v2.1.1
One mistake I made (and corrected) in type_abline() is to include arguments like col and lty in the type_*() function itself, rather than using the top level tinyplot() values.
I don't know if this is a concern here, but I'm just flagging this in case gradient could be a logical flag and we could rely on the palette top-level settings.
Still to do / fix:
- [x] by isn't working consistently. E.g. tinyplot(Month ~ Temp | Late, data = airq, type = "ridge").
- by == x. E.g. tinyplot(Species ~ Sepal.Width | Sepal.Width, data = iris, type = "ridge"). Potential simple solution is to automatically trigger type_ridge(gradient = TRUE)?
- by == y. E.g. tinyplot(Species ~ Sepal.Width | Species, data = iris, type = "ridge", fill = "by") kind of works, but the drawing order of the ridges is reversed and the y-axis is wrong.
- frame = FALSE turn off the duplicated axes?
- grid alignment. Maybe as part of a dedicated tinytheme("ridge") theme?
- flip = TRUE?
I just realised another issue: Back when we first implemented gradient legends, we agreed that low values would correspond to light colors and high values to dark colours. See https://github.com/grantmcdermott/tinyplot/pull/122#issuecomment-1953364362
What is high and what is low? This depends on the context. The folklore is that on a white background the dark colors should stand out as extreme - while on dark/black background the light colors should represent the extreme. As the factory-fresh default is a white background, dark colors should be extreme. And usually extreme means large. So our default should be a reversed hcl.colors palette.
However, we're doing the opposite here for gradient = TRUE: low x values are dark and high x values are light.
Do we just want to live with this inconsistency, or reverse the palette direction?
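For reference, the two palette directions under discussion can be compared directly in base R. This is just a sketch of the hcl.colors() behaviour being referred to, using "Viridis" as the example palette; summed RGB values serve as a crude brightness proxy.

```r
# Sketch: default vs. reversed direction of a sequential hcl.colors palette.
n = 5
pal_default = hcl.colors(n, "Viridis")               # runs dark -> light
pal_rev     = hcl.colors(n, "Viridis", rev = TRUE)   # runs light -> dark

# crude brightness proxy: sum of the RGB channels of each color
brightness = function(cols) colSums(col2rgb(cols))
brightness(pal_default)
brightness(pal_rev)
```

With rev = TRUE the last color (mapped to the highest x values) is the darkest one, which is the convention agreed on for gradient legends.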
This all looks great!! Some comments/thoughts:
- gradient = TRUE can imply using "palette.sequential". But I'm not sure whether we should use tinyplot(..., palette = ...) for this. My understanding is that palette is an alternative specification of col which just specifies the border color.
- "Viridis" to "ag_Sunset" as the default gradient palette.
- hcl.colors(..., rev = TRUE) by default so that dark typically corresponds to high values.
- y ~ x | x could be a nice alias for y ~ x with gradient = TRUE. Similarly, y ~ x | y could give separate fill (and/or line?) colors for each ridge line. But I'm not sure what to do with y ~ x | z then.
But I'm not sure what to do with y ~ x | z then.
Just quickly on this topic: I have some mock-up code that yields the below result. What we should do is pick one of these cases as the default for y ~ x | z and then try to update the code to give us that automatically (i.e., without having to manually specify fill etc.).
1) Border color varies by groups. Fill remains grey for all.
tinyplot(Month ~ Temp | Late, data = airq, type = "ridge")
2) Border color varies by groups, and so does fill (with no transparency).
tinyplot(Month ~ Temp | Late, data = airq, type = type_ridge(), fill = "by")
3) Border color varies by groups, and so does fill but with alpha transparency.
tinyplot(Month ~ Temp | Late, data = airq, type = "ridge", fill = 0.7)
4) Border color is fixed (here "white" but would default to par("col")), whilst fill varies by groups.
tinyplot(Month ~ Temp | Late, data = airq, type = type_ridge(), fill = 1, col = "white")
My own order of preference is probably 3, 4, 1, 2. But interested to hear what you both think.
I would recommend a slightly different variation of 3. Maybe you can try that with your code? The idea is to borrow the strategy for lightening colors as we do in the spineplots from https://github.com/grantmcdermott/tinyplot/pull/233#issuecomment-2408754671
- by color apply seq_palette(by_col[i], n = 2).
Ugh, this is taking longer to iron out all of the kinks than expected.
But another thing I've just realised: In almost all of these plots, the order of the y-axis should be reversed: i.e. "early" y values should be at the top, while "later" y values should be at the bottom. In other words, the series should run from top to bottom and the newest values should be at the front (bottom).
pkgload::load_all("~/Documents/Projects/tinyplot_vincent/")
#> ℹ Loading tinyplot
tinyplot(Month ~ Temp, data = airquality, type = "ridge", main = "Order of y-axis should be reversed (5 at the top, etc.)")
Maybe it's worth comparing to ggplot2 about this. I remember being surprised but then agreeing with the default of "small" values at the bottom, which is consistent with numeric variables.
Unfortunately, this doesn't work with our existing infra (esp. our by looping logic). E.g. I just couldn't get the y ~ x | y special case to work correctly because the ordering was reversed.
More generally, I do think it's correct to order from old to new. I realized something was off when I replicated the Lincoln weather example from the ggridges vignette. The dataset that they bundle with the package actually defines "Month" as a factor with the levels reversed (Dec:Jan) to make the graphic work. It's a bit odd.
Will post some updated examples in a bit. Saturday night movie first ;-)
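The reversed-levels trick mentioned for the Lincoln weather data can be reproduced with any factor. The sketch below uses airquality instead of lincoln_weather (so it does not need ggridges installed) and only illustrates the factor manipulation, not how tinyplot orders its y-axis.

```r
# Sketch: reverse the level order of a month factor so that the earliest
# month ends up last (i.e. at the top of a bottom-to-top y-axis).
m = factor(month.abb[airquality$Month], levels = month.abb[5:9])
levels(m)        # "May" "Jun" "Jul" "Aug" "Sep"

m_rev = factor(m, levels = rev(levels(m)))
levels(m_rev)    # "Sep" "Aug" "Jul" "Jun" "May"
```

Only the level order changes; the values attached to each observation stay the same.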
It's late, so just a quick summary.
- by grouping. This includes by on its own (i.e., a third variable), and the special cases where by==y and by==x.
- by==y case work required reordering the y-axis marks from first (oldest) at the top to last (newest) at the bottom. Stepping back, I think this makes sense as the default anyway, as per my comments above. Also, I implemented @zeileis' suggestion for lightening the fill color here.
- Higher x values are now darker. This matches our by coloring logic (and also means that there isn't a contradiction in the legend when by==x). It's a slight bummer b/c I think that the original gradient with light values at the high end of the scale looks a bit nicer. But my overall sense is that consistency is more important.
Some examples taken directly from the updated Examples in the documentation.
pkgload::load_all("~/Documents/Projects/tinyplot_vincent/")
#> ℹ Loading tinyplot
## by grouping is also supported. two special cases of interest:
# 1) by == y (color by y groups)
tinyplot(Species ~ Sepal.Width | Species, data = iris, type = "ridge")
# 2) by == x (gradient coloring along x)
tinyplot(Species ~ Sepal.Width | Sepal.Width, data = iris, type = "ridge")
# aside: pass explicit `type_ridge(col = <col>)` arg to set a common border
# color
tinyplot(Species ~ Sepal.Width | Sepal.Width, data = iris,
type = type_ridge(col = "white"))
## gradient coloring along the x-axis can also be invoked manually without
## a legend (the following lines are all equivalent)
tinyplot(Species ~ Sepal.Width, data = iris, type = type_ridge(gradient = TRUE))
## with faceting and color gradient
airq = transform(airquality, Late = ifelse(Day > 15, "Late", "Early"))
tinyplot(Month ~ Ozone, facet = ~ Late, data = airq,
type = type_ridge(gradient = TRUE),
grid = TRUE, axes = "t", col = "white")
Created on 2024-11-23 with reprex v2.1.1
Quick coda on the standard 'by' case (i.e., not equal to x or y). This works okay although one bummer is that we can get overlapping of distributions as per below.
tinyplot(Month ~ Temp | Late, data = airq, type = "ridge", palette = "classic")
Unfortunately, this happens because our high-level logic involves looping over the by splits and drawing groups separately on the plot. As a result, I think that this overlapping ridges behaviour is probably unavoidable without a major rewrite of the high-level plotting logic, or some kind of special exception control flow... and I don't have the time (or energy) for either rn.
P.S. You can at least add alpha transparency to get around the overlapping issue a bit.
tinyplot(Month ~ Temp | Late, data = airq, type = "ridge", palette = "classic", alpha = 0.5)
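For readers unfamiliar with how such semi-transparent fills are built in base R, here is a minimal sketch using adjustcolor(). Whether tinyplot's alpha argument uses this exact function is an assumption; the palette name is likewise just an example.

```r
# Sketch: turn two opaque group colors into 50% transparent fill colors.
cols = palette.colors(2, palette = "Classic Tableau")
fill = adjustcolor(cols, alpha.f = 0.5)

cols   # opaque colors, used for the ridge borders
fill   # same hues with an alpha channel appended, used for the fill
```

Overlapping ridges drawn with such fills stay visible through each other, which softens (but does not remove) the overlap problem.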
Grant, just very quickly before I have to start preparing :pizza:
- type_ridge(ylevels = NULL) that works analogously to the corresponding argument in type_spineplot() so that users can easily re-order the y-variable levels on the fly.
- type_data in such a way that we can define in which order/grouping the by and facet groups are drawn. For scatterplot-based displays of numeric variables this is usually not so important but with categorical axes it is more likely, especially when there is overlap (ridge) or conditioning (spineplot). But this is beyond this particular PR. So maybe proceed as for the spineplots and leave it for a later timepoint.
- by = z case I would like to revise my opinion. I thought that we essentially just do interaction(by, y) and didn't realize that the by groups are aligned on the same y level. In that case I think that transparency is better after all. Also, I find it more appealing to have the border line just at the top but not at the bottom. I like this example from the ggridges documentation.
Okay, having slept on this and feeling less immediately frustrated with the code, I'm going to take another crack at reverting to the original y-axis order (i.e., back to a typical numeric scale). It will probably require some upstream logic changes. Specifically, we'll have to reverse the upstream by split logic if type="ridge". I don't like having ad hoc modifications for specific types, but I think it should work at least.
Update: This ended up being simpler than I thought. Heading out for a day trip with the family now, but will push my updates when I'm back later.
@zeileis (and @vincentarelbundock) I've actioned most of your additional suggestions. For instance, we now only draw the top border of the densities and by grouping adds automatic alpha transparency (although not in the special cases of by==x or by==y). I've also reverted the order of the y-axis and fixed a couple of edge case bugs.
pkgload::load_all("~/Documents/Projects/tinyplot_vincent/")
#> ℹ Loading tinyplot
data("Aus_athletes", package = "ggridges")
op = tpar(las = 1, mgp = c(3, 0, 0), mar = c(5, 5, 4, 2)+0.1)
tinyplot(
sport ~ height | sex, data = Aus_athletes,
type = type_ridge(scale = 0.95),
palette = "tableau",
axes = "l",
main = "Height of Australian athletes",
ylab = NA,
draw = abline(h = 0:9, col = "lightgray")
)
tpar(op)
Created on 2024-11-24 with reprex v2.1.1
I haven't added support for type_ridge(ylevels = NULL), but at this point I honestly have to put this PR aside now. If you feel like adding more features, please go for it. But I'm pretty happy with where it is now and would like to merge if both of you agree.
This looks amazing.
I don't think we should be afraid to merge, even if we plan to iterate and improve in the future. This is already very close to full featured.
Great work!
Okay, let's merge this PR as-is then. We can add supplemental features later on, e.g. ylevels control, maybe rug?
https://github.com/grantmcdermott/tinyplot/issues/71
This is pretty easy to implement (says the guy who couldn't figure it out for 3 hours).