`type_ridge()` - Githubissues

vincentarelbundock commented 2 weeks ago

https://github.com/grantmcdermott/tinyplot/issues/71

This is pretty easy to implement (says the guy who couldn't figure it out for 3 hours).

library(tinyplot)
tinyplot(Month ~ Ozone, data = airquality, type = "ridge")

zeileis commented 2 weeks ago

This is really cool and I'm just starting to work through the examples. Quick first comment: For grid = TRUE I would have hoped to get horizontal lines matching the tick marks on the y-axis, e.g.:

tinyplot(Species ~ Sepal.Length, data = iris, type = "ridge", grid = TRUE)

grantmcdermott commented 2 weeks ago

A similar quick comment. I'd like to be able to do

tinyplot(Species ~ Sepal.Length | Species, data = iris, type = "ridge")

so that colors vary by the y-axis entries.

Accounting for this kind of x==by or y==by logic normally requires some internal bookkeeping, since we want to avoid splitting y (or x) by itself. But we've managed to do it in a few places. For example, R/type_spineplot.R:

https://github.com/grantmcdermott/tinyplot/blob/c4d4e2cef8fe62679891e724e05864c711f373a8/R/type_spineplot.R#L122-L132

https://github.com/grantmcdermott/tinyplot/blob/c4d4e2cef8fe62679891e724e05864c711f373a8/R/type_spineplot.R#L209-L211

(In the specific case of type_spineplot we have to do some more work after this to handle custom color sequencing. But for adapting the logic to type_ridge I think that copying across the above two code chunks should suffice.)
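
To make the by == y idea concrete, here is a minimal sketch in base R. It is not the code from R/type_spineplot.R; the helper name ridge_groups() and its return structure are hypothetical and only illustrate how one might detect that by duplicates y and avoid splitting the data twice.

```r
# Hypothetical sketch (not the tinyplot source): detect when `by` duplicates
# `y`, so that we can color by it instead of splitting the data a second time.
ridge_groups = function(x, y, by = NULL) {
  y_by = !is.null(by) && identical(as.character(by), as.character(y))
  # when by == y, a single split on y is enough; otherwise split on both
  grp = if (is.null(by) || y_by) list(y) else list(y, by)
  list(y_by = y_by, data = split(x, grp, drop = TRUE))
}

# by == y: three groups, one per species, and the flag is set
same = ridge_groups(iris$Sepal.Length, iris$Species, by = iris$Species)

# by is a genuine third variable: groups are y-by-by combinations
other = ridge_groups(iris$Sepal.Length, iris$Species, by = iris$Petal.Width > 1)
```

The flag could then be used downstream to pick one color per y level rather than one color per by level.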

zeileis commented 2 weeks ago

Great minds think alike, I was playing with the same thing. :nerd_face: More generally, faceting does not seem to work, yet.

Also, browsing the ggridges vignette, it would be really nice to have color gradients that help to compare the x-axis values across density curves. Are you planning to add this?

grantmcdermott commented 2 weeks ago

Also, browsing the ggridges vignette, it would be really nice to have color gradients that help to compare the x-axis values across density curves. Are you planning to add this?

This would be quite a lot of work, no? Off the top of my head, I guess it would require either looping over the sequence of x values and drawing mini polygons (similar to this), or converting the polygon to an appropriate matrix and then rasterising it.

Perhaps there's a simpler solution. But I think that gradient fill support is probably out of scope for this PR. We can revisit the idea once we manage to fix #243, since the logic would probably carry over to regular density plots too.

Edit: To clarify, I think that this would be very cool. But I worry that supporting x gradient fill will require quite a lot of additional work.

vincentarelbundock commented 2 weeks ago

I added support for facets and fixed the grid problem.

I think that any fancier col or |by support would require a complete refactor of the by_aesthetics() functions. This is probably a good idea anyway (will open a different issue).

Unfortunately, I don't have the bandwidth for this right now. I can do minor fixes on PR review, but any major change will have to wait. We can merge this close to "as-is" (perhaps with an "experimental" tag), or we can wait a few weeks (months?) until I have more time.

library(tinyplot)

dat = transform(airquality, Late = ifelse(Day > 15, "Late", "Early"))
tinyplot(Month ~ Ozone,
  facet = ~Late,
  data = dat,
  type = "ridge",
  grid = TRUE,
  col = "white",
  bg = "light blue")

grantmcdermott commented 2 weeks ago

Great, thanks @vincentarelbundock. I want to take a stab at tweaking a few things so have cloned your fork locally and will test things. I'll push any changes that look good and then we can merge. Will probably be a few days.

zeileis commented 2 weeks ago

In the meantime, I'll have a look at how hard it would be to add a type_ridge(gradient = ...) specification. I hope that this shouldn't be excessive. If you merge before, it's probably straightforward to address it in a separate PR.

zeileis commented 2 weeks ago

OK, quick proof of concept:

tinyplot-ridge

To implement this I used a fixed grid of 1000 rectangles across the full range of the x variable. In the for() loop of the draw_ridge function:

  for (i in rev(seq_along(dsplit))) {
    if (gradient) {
      gn = 1000
      gc = hcl.colors(gn)
      gx = seq(from = min(d$x), to = max(d$x), length.out = gn + 1)
      gy = with(dsplit[[i]], approx(x = x, y = ymax, xout = gx)$y)
      gm = dsplit[[i]]$ymin[1]
      gy[is.na(gy)] = gm
      rect(gx[-(gn + 1)], gm, gx[-1], (gy[-1] + gy[-(gn + 1)])/2, col = gc, border = "transparent")
    }
    with(dsplit[[i]], polygon(x, ymax, col = if (gradient) "transparent" else ibg, border = icol))
  }

For the rect() approach to work, it is crucial that gn is large enough that the individual rectangles are no longer visible.

Instead one could also use polygon() to draw multiple polygons simultaneously. This would be more flexible and could also incorporate customized breaks and fewer colors. But the preprocessing of the data would require a bit more work...

vincentarelbundock commented 2 weeks ago

This looks amazing!

zeileis commented 2 weeks ago

OK, I have now a version which uses polygon() to draw multiple shaded polygons instead of drawing 1000 rect().

For the example I posted above, the outcome looks virtually identical.
The advantage is that it is sufficient to draw fewer polygons, say 100, while still producing a seemingly continuous gradient.
Moreover, one can also draw just a few, say 10, colors and select the breaks in between the intervals.
The disadvantage is that the code is slower than the one based on 1000 rectangles.

Personally, I would still go for the more general code. What do you think?

Should I modify type_ridge correspondingly? The changes are still clearly manageable, but I added an internal helper function for drawing shaded segmented polygons.
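
As a rough illustration of the segmented-polygon idea with a handful of breaks, the sketch below splits the area under a density curve at given break points and returns NA-separated coordinates that a single polygon() call can draw. The helper segment_density() is hypothetical and is not the internal helper mentioned above; the break values are chosen arbitrarily for the Nile data.

```r
# Hypothetical helper: split the area under a density curve at `breaks` and
# return NA-separated polygon coordinates plus the interval index of each piece.
segment_density = function(d, breaks) {
  # add the break points to the x grid so segment edges fall exactly on them
  inner = breaks[breaks > min(d$x) & breaks < max(d$x)]
  x = sort(unique(c(d$x, inner)))
  y = approx(d$x, d$y, xout = x)$y
  # interval index of each grid cell, based on the cell midpoint
  mid = (x[-1] + x[-length(x)]) / 2
  cell = findInterval(mid, breaks)
  ids = sort(unique(cell))
  xs = ys = NULL
  for (k in ids) {
    i = which(cell == k)        # cells belonging to interval k
    j = c(i, max(i) + 1)        # grid points spanning those cells
    # trace the curve, then return along the baseline, then separate with NA
    xs = c(xs, x[j], rev(range(x[j])), NA)
    ys = c(ys, y[j], 0, 0, NA)
  }
  list(x = head(xs, -1), y = head(ys, -1), interval = ids)
}

d = density(Nile)
breaks = c(600, 800, 1000, 1200)
seg = segment_density(d, breaks)
pal = hcl.colors(length(breaks) + 1)   # one color per interval, tails included

if (interactive()) {
  plot(d, type = "n")
  polygon(seg$x, seg$y, col = pal[seg$interval + 1], border = NA)
  lines(d)
}
```

Because the break points are inserted into the x grid, the color changes land exactly on the breaks, which is the precision advantage described for the polygon approach.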

vincentarelbundock commented 2 weeks ago

Cool. I don't have a view so I'll let Grant trace the path forward.

zeileis commented 1 week ago

Grant, what do you think about this? First complete the PR without color gradients and then make a new separate PR afterwards - or integrate my proposed changes into the existing PR?

If the latter, I would also export some of the density() arguments so that one can tweak kernel/bandwidth, in particular also supporting a common bandwidth for all groups.
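
A common bandwidth can be sketched in base R as follows. Note the assumption: "common" is taken here to mean the average of the per-group default bandwidths, which is only one plausible choice and not necessarily what type_ridge() ends up doing.

```r
# Assumption: the common bandwidth is the mean of the per-group defaults
# (bw.nrd0); pooling the data first would be another reasonable choice.
groups = split(iris$Sepal.Width, iris$Species)
bw_each = sapply(groups, bw.nrd0)
bw_common = mean(bw_each)

# one density per group, all sharing the same kernel and bandwidth
dens = lapply(groups, density, bw = bw_common, kernel = "gaussian")
```

With a shared bandwidth, differences in ridge shape reflect the data rather than different amounts of smoothing per group.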

grantmcdermott commented 1 week ago

Grant, what do you think about this? First complete the PR without color gradients and then make a new separate PR afterwards - or integrate my proposed changes into the existing PR?

Would the latter be easier? I don't mind and still have to integrate my own changes for this PR. (I also noticed some weird behaviour when y is a factor, which we'll have to fix.) So am happy to go with the path of least resistance.

P.S. Sorry for being slow on this. I've been solo parenting the last few days and also juggling an important deadline at work..

grantmcdermott commented 1 week ago

Personally, I would still go for the more general code. What do you think?

Go for it. For posterity, I also played with some as.raster-based code last week, which I include as a proof of concept below. We obviously don't have to use this, but it does have the virtues of (a) being fast and (b) having built-in interpolation.

dens = density(Nile)
x = dens$x
y = dens$y

# How many y "bins"?
# (higher numbers mean a smoother looking density function)
ny = 1000L

# create an ny x length(x) matrix along the color gradient
m = matrix(
  rep(hcl.colors(length(x), "Viridis"), ny),
  ncol = length(x),
  byrow = TRUE
)

# Use an internal tinyplot function for rescaling/normalizing
y = tinyplot:::rescale_num(y, to = c(1, ny))
y2 = round(y)

# idea: "blank" out the matrix cells above the top edge of the distribution
# note that raster plots rowwise, so we have to do this a bit back-to-front
for (i in seq_along(y2)) m[1:(nrow(m)-y2[i]+1), i] = NA

plot(y, type = "n")
plot(as.raster(m), add = TRUE)
# lines(y2)
lines(y)

Created on 2024-11-17 with reprex v2.1.1

GM: Slight edits to make this example look and read better.

zeileis commented 1 week ago

Grant, I've pushed now my relatively slow version using polygon(). If you have the time to take a look that would be great. I have added various examples to the documentation that highlight the main new arguments gradient = FALSE, breaks = NULL.

Meanwhile I'm not convinced anymore that polygon() is the best option - at least not in general. Its main advantage is that I can exactly specify certain breaks on the x-axis. This will be fast and have no "fuzz" for a small number of breaks.

However, for a large number of breaks, your raster-based idea seems to be much faster. By definition this will break things down into a regular raster grid which might be somewhat less precise than the polygon(). However, for continuous gradients drawing is much faster. Do you have any thoughts on how to separate the case with "few" and "many" breaks?

I also adapted your code so that we rescale the raster rather than rescaling the density:

## compute density
d <- density(Nile)

## set up raster matrix on x-grid and 1000 y-pixels 
n <- length(d$x) - 1
r <- matrix(1:n, ncol = n, nrow = 1000, byrow = TRUE)

## fill colors by column
r[] <- hcl.colors(n)[r]

## clip raster pixels above density line
ymax <- round(1000 * (d$y - min(d$y))/(max(d$y) - min(d$y)))
ix <- lapply(1:n, function(i) if(ymax[i] < 1000) cbind(setdiff(1:1000, 1001 - 0:ymax[i]), i) else NULL)
r[do.call("rbind", ix)] <- NA

## plot density and add raster gradient
plot(d)
rasterImage(as.raster(r), min(d$x), min(d$y), max(d$x), max(d$y))
lines(d)

zeileis commented 1 week ago

OK, I couldn't go to sleep before finishing the rasterImage()-based solution. This is now the new default but you can select via type_ridge(gradient = TRUE, raster = FALSE) vs. the default raster = TRUE. More later, need to get some sleep now...

grantmcdermott commented 1 week ago

Amazing @zeileis. Get some sleep and I'll dig into this as soon as I can.

zeileis commented 1 week ago

OK, some more updates. I tweaked the color gradient. By default, it uses rasterImage() now unless there are 20 intervals or fewer. In the latter case the segmented polygon() is used because it is more precise regarding the breaks and a little bit faster.

Example: On the left via raster, on the right via polygon.

tinyplot(Species ~ Sepal.Width, data = iris, type = type_ridge(gradient = TRUE))
tinyplot(Species ~ Sepal.Width, data = iris, type = type_ridge(gradient = TRUE, breaks = seq(2, 4.5, by = 0.5)))

tinyplot-ridge

If you want to play around with the two implementations, you can explicitly set raster = TRUE or FALSE. My idea would be to get rid of that argument, though, when we are happy with the implementation. See also the FIXME remarks in the source code.

Additionally, I have implemented the option to use group-specific quantiles (at probs) rather than the same breaks across all groups. The two examples below highlight the center 50% of each density (between 25% and 75% quantile) and the entire distribution using a smooth gradient. The former uses the polygon code, the latter the raster code.

tinyplot(Species ~ Sepal.Width, data = iris, col = "white", type = type_ridge(
  gradient = hcl.colors(3, "Dark Mint")[c(2, 1, 2)], probs = c(0.25, 0.75)))
tinyplot(Species ~ Sepal.Width, data = iris, type = type_ridge(
  gradient = hcl.colors(250, "Dark Mint")[c(250:1, 1:250)], probs = 0:500/500))

tinyplot-ridge2
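
For readers wondering how group-specific quantile breaks could be derived from a kernel density, here is a minimal base R sketch. The helper density_quantile() is hypothetical: it inverts a numerically integrated CDF of each density, which may differ from how probs is actually handled inside type_ridge().

```r
# Hypothetical helper: approximate quantiles of a kernel density estimate by
# inverting its numerically integrated CDF.
density_quantile = function(d, probs) {
  dx = diff(d$x)
  # trapezoidal rule for the cumulative area under the curve
  area = c(0, cumsum(dx * (d$y[-1] + d$y[-length(d$y)]) / 2))
  cdf = area / area[length(area)]
  approx(x = cdf, y = d$x, xout = probs, ties = "ordered")$y
}

# one pair of breaks (25% and 75% quantile) per species
groups = split(iris$Sepal.Width, iris$Species)
breaks = lapply(groups, function(g) density_quantile(density(g), c(0.25, 0.75)))
```

Each group then gets its own breaks, so the highlighted central region shifts along the x-axis with the location of that group's distribution.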

Finally, all density() arguments can be specified via bw, kernel, ... and tinyAxis() is used for the y-axis so that we can specify axes and yaxt. Some examples are on the manual page.

I think that this covers all features that I had in mind. Suggestions for improvement are very welcome. Also, let me know if I added something that you don't feel is so useful.

vincentarelbundock commented 1 week ago

nothing to add but just wanted to say that these last few plots look insanely cool

grantmcdermott commented 1 week ago

Fantastic, @zeileis.

I'll take a look at this properly tonight. To throw one idea into the ring, though:

This morning, I was wondering whether we could speed up the polygon approach by using vectorisation. The basic idea is to "trick" polygon into drawing multiple polygons in a single go by inserting appropriate NA breaks.

Here's another proof of concept. Again, this seems to work and is quick (bonus: it requires only a few lines of code).

d = density(Nile)

xx = d$x
xx = c(rbind(xx[-length(xx)], xx[-1], xx[-1], xx[-length(xx)], NA))
xx = xx[1:(length(xx)-1)]

yy = d$y
yy = c(rbind(yy[-length(yy)], yy[-1], 0, 0, NA))
yy = yy[1:(length(yy)-1)]

plot(d, type = "n")
polygon(
  x = xx,
  y = yy,
  col = hcl.colors(length(d$x)),
  border = hcl.colors(length(d$x))
)
lines(d$x, d$y)

Created on 2024-11-18 with reprex v2.1.1

zeileis commented 1 week ago

Thanks for the kind words! The examples are essentially stolen from the ggridges vignette plus a little tweaking...

Re: polygon with NAs inserted. Yes, that's what my code had been doing all along. Separate polygons would have been hopeless. But even the single segmented polygon becomes quite slow - and it can even create awkward artefacts if the segments are too narrow. Try

tinyplot(Species ~ Sepal.Width, data = iris, type = type_ridge(gradient = hcl.colors(1000), raster = FALSE))

grantmcdermott commented 1 week ago

Re: polygon with NAs inserted. Yes, that's what my code had been doing all along. Separate polygons would have been hopeless. But even the single segmented polygon becomes quite slow - and it can even create awkward artefacts if the segments are too narrow.

Ah, sorry. I should have read your code to start with. Too many balls in the air at the moment...

zeileis commented 1 week ago

No worries, I know that feeling. And take your time with looking at the code - just do it when you have the capacity for it. Now that I have implemented the things that I wanted to implement, I will sleep well :sleeping:

grantmcdermott commented 1 week ago

@zeileis I took a stab at improving the polygon logic and now think that it's at a point where we can safely default to it for everything instead of rasters.

The new polygon version (which is now the default) is slightly faster than the raster equivalent for gradients and doesn't leave any artifacts either.

I can post some examples here, but I think the best thing is for you to clone and test locally. Let me know if you agree. Thanks!

zeileis commented 1 week ago

Thank you so much, most of this looks great. But we need to be more careful about dropping polygon intervals that are empty. In this case we need to make sure that the intervals remain aligned with the color palette (see the left panel below).

Another small issue is that in the case without gradient but with breaks, we should keep the default light gray shading. Currently, this is dropped (see right panel below).

set.seed(0)
d <- data.frame(y = rep(1:4, each = 100), x = c(
  rnorm(100, mean = 5, sd = 2),
  rnorm(100, mean = 2, sd = 1),
  rnorm(100, mean = 8, sd = 1),
  rnorm(50, mean = 1, sd = 0.5), rnorm(50, mean = 9, sd = 0.5)
))
tinyplot(y ~ x, data = d, type = type_ridge(bw = 0.5, gradient = TRUE, breaks = -1:6 * 2))
tinyplot(y ~ x, data = d, type = type_ridge(bw = 0.5, breaks = -1:6 * 2))

tinyplot-breaks
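
The alignment issue can be illustrated with a small base R sketch. It uses a deterministic bimodal sample as a stand-in for the fourth group above (an assumption made so the result is reproducible) and a hypothetical visibility threshold to decide which intervals are "empty". The point is only that the palette must be indexed by the original interval number, not by the position among the surviving intervals.

```r
breaks = -1:6 * 2                       # -2, 0, 2, ..., 12
pal = hcl.colors(length(breaks) - 1)    # one color per interval

# deterministic bimodal stand-in: mass around 1 and around 9, nothing near 5
x = c(seq(0, 2, length.out = 50), seq(8, 10, length.out = 50))
d = density(x, bw = 0.5)

# keep only the part of the grid where the density is visibly non-zero
# (the 1e-3 threshold is an arbitrary choice for this illustration)
keep = d$y > 1e-3 * max(d$y)
idx = findInterval(d$x[keep], breaks, all.inside = TRUE)
present = sort(unique(idx))

# misaligned: colors shift left once an interval is dropped
cols_wrong = pal[seq_along(present)]
# aligned: each surviving interval keeps its own palette entry
cols_right = pal[present]
```

Here the interval from 4 to 6 is dropped, so cols_wrong and cols_right disagree for every interval to the right of the gap.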

grantmcdermott commented 1 week ago

Thanks @zeileis. I believe that I've managed to plug those two cases now:

pkgload::load_all("~/Documents/Projects/tinyplot_vincent/")
#> ℹ Loading tinyplot

set.seed(0)
d <- data.frame(y = rep(1:4, each = 100), x = c(
  rnorm(100, mean = 5, sd = 2),
  rnorm(100, mean = 2, sd = 1),
  rnorm(100, mean = 8, sd = 1),
  rnorm(50, mean = 1, sd = 0.5), rnorm(50, mean = 9, sd = 0.5)
))
tinyplot(y ~ x, data = d, type = type_ridge(bw = 0.5, gradient = TRUE, breaks = -1:6 * 2))

tinyplot(y ~ x, data = d, type = type_ridge(bw = 0.5, breaks = -1:6 * 2))

Bonus: Replicating a fun example from the ggridges package/vignette. Note that this is a case where grid = TRUE gives misaligned horizontal lines (due to the y-axis scaling?). But we can deploy draw as a workaround. (Something to think about fixing. Maybe part of a dedicated tinytheme("ridges") theme that also does things like removing the y-axis label?)

data(lincoln_weather, package = "ggridges")

op = tpar(las = 1, mgp = c(3, 0, 0))
tinyplot(
  Month ~ `Max Temperature [F]`, data = lincoln_weather,
  type = type_ridge(gradient = "plasma", scale = 3),
  # grid = grid(nx = NA, ny = 12),
  draw = abline(h = 0:11, col = "lightgray"),
  axes = "l",
  main = "Temperatures in Lincoln NE",
  ylab = NA
)

tpar(op)

Created on 2024-11-22 with reprex v2.1.1

vincentarelbundock commented 1 week ago

One mistake I made (and corrected) in type_abline() is to include arguments like col and lty in the type_*() function itself, rather than using the top level tinyplot() values.

I don't know if this is a concern here, but I'm just flagging this in case gradient could be a logical flag and we could rely on the palette top-level settings.

grantmcdermott commented 1 week ago

Still to do / fix:

[x] by isn't working consistently. E.g. tinyplot(Month ~ Temp | Late, data = airq, type = "ridge").
- [x] Special case: Support by == x. E.g. tinyplot(Species ~ Sepal.Width | Sepal.Width, data = iris, type = "ridge"). Potential simple solution is to automatically trigger type_ridge(gradient = TRUE)?
- [x] Special case: Fix by == y. E.g. tinyplot(Species ~ Sepal.Width | Species, data = iris, type = "ridge", fill = "by") kind of works, but the drawing ordering of ridges is reversed and the y-axis is wrong.
[ ] Shouldn't faceting with frame = FALSE turn off the duplicated axes?
[ ] Fix grid alignment. Maybe as part of a dedicated tinytheme("ridge") theme?
[ ] Support flip = TRUE?
[x] Add tests

grantmcdermott commented 1 week ago

I just realised another issue: Back when we first implemented gradient legends, we agreed that low values would correspond to light colors and high values to dark colours. See https://github.com/grantmcdermott/tinyplot/pull/122#issuecomment-1953364362

What is high and what is low? This depends on the context. The folklore is that on a white background the dark colors should stand out as extreme - while on dark/black background the light colors should represent the extreme. As the factory-fresh default is a white background, dark colors should be extreme. And usually extreme means large. So our default should be a reversed hcl.colors palette.

However, we're doing the opposite here for gradient = TRUE: low x values are dark and high x values are light.

Do we just want to live with this inconsistency, or reverse the palette direction?
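
For reference, the two palette directions can be compared directly in base R. The luminance() helper below is a rough approximation introduced only for this illustration.

```r
n = 5
pal_default = hcl.colors(n, "Viridis")               # low values dark, high values light
pal_reversed = hcl.colors(n, "Viridis", rev = TRUE)  # low values light, high values dark

# rough luminance of a color (0 = black, 1 = white); illustration only
luminance = function(col) {
  colSums(col2rgb(col) * c(0.2126, 0.7152, 0.0722)) / 255
}

lum_default = luminance(pal_default)
lum_reversed = luminance(pal_reversed)
```

With rev = TRUE the luminance falls along the palette, so high x values come out dark, matching the convention agreed for gradient legends.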

zeileis commented 1 week ago

This all looks great!! Some comments/thoughts:

Gradient palette specification: When we merge this with themes, then gradient = TRUE can imply using "palette.sequential". But I'm not sure whether we should use tinyplot(..., palette = ...) for this. My understanding is that palette is an alternative specification of col which just specifies the border color.
Default gradient palette: Anticipating the merge with the themes, we should probably already switch from "Viridis" to "ag_Sunset" as the default gradient palette.
Default color order: I agree that it would be more consistent to use hcl.colors(..., rev = TRUE) by default so that dark typically corresponds to high values.
By handling: I agree that y ~ x | x could be a nice alias for y ~ x with gradient = TRUE. Similarly, y ~ x | y could give separate fill (and/or line?) colors for each ridge line. But I'm not sure what to do with y ~ x | z then.

grantmcdermott commented 1 week ago

But I'm not sure what to do with y ~ x | z then.

Just quickly on this topic: I have some mock-up code that yields the below result. What we should do is pick one of these cases as the default for y ~ x | z and then try to update the code to give us that automatically (i.e., without having to manually specify fill etc.).

1) Border color varies by groups. Fill remains grey for all.

tinyplot(Month ~ Temp | Late, data = airq, type = "ridge")

2) Border color varies by groups, and so does fill (with no transparency).

tinyplot(Month ~ Temp | Late, data = airq, type = type_ridge(), fill = "by")

3) Border color varies by groups, and so does fill but with alpha transparency.

tinyplot(Month ~ Temp | Late, data = airq, type = "ridge", fill = 0.7)

4) Border color is fixed (here "white" but would default to par("col")), whilst fill varies by groups.

tinyplot(Month ~ Temp | Late, data = airq, type = type_ridge(), fill = 1, col = "white")

My own order of preference is probably 3, 4, 1, 2. But interested to hear what you both think.

zeileis commented 1 week ago

I would recommend a slightly different variation of 3. Maybe you can try that with your code? The idea is to borrow the strategy for lightening colors as we do in the spineplots from https://github.com/grantmcdermott/tinyplot/pull/233#issuecomment-2408754671

For each by color apply seq_palette(by_col[i], n = 2).
Use the first color (original dark color) for the border.
Use the second color (light version) as the fill color.
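
The dark-border / light-fill pairing can be sketched with base R alone. The lighten() function below is a stand-in for tinyplot's internal seq_palette(), whose actual implementation is not shown in this thread; the mixing rule (blend towards white) and the amount are assumptions.

```r
# Stand-in for the internal seq_palette(); the blend-with-white rule is an
# assumption chosen only to illustrate the dark-border / light-fill pairing.
lighten = function(col, amount = 0.6) {
  rgb_mat = col2rgb(col)
  mixed = rgb_mat + (255 - rgb_mat) * amount
  rgb(mixed[1, ], mixed[2, ], mixed[3, ], maxColorValue = 255)
}

by_col = palette.colors(3, "Okabe-Ito")
border_col = by_col          # original (dark) color for the ridge line
fill_col = lighten(by_col)   # lighter version of the same hue for the fill
```

Unlike alpha transparency, the lightened fill is opaque, so overlapping ridges hide what is behind them.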

grantmcdermott commented 6 days ago

Ugh, this is taking longer to iron out all of the kinks than expected.

But another thing I've just realised: In almost all of these plots, the order of the y-axis should be reversed: i.e. "early" y values should be at the top, while "later" y values should be at the bottom. In other words, the series should run from top to bottom and the newest values should be at the front (bottom).

pkgload::load_all("~/Documents/Projects/tinyplot_vincent/")
#> ℹ Loading tinyplot
tinyplot(Month ~ Temp, data = airquality, type = "ridge", main = "Order of y-axis should be reversed (5 at the top, etc.)")

vincentarelbundock commented 6 days ago

Maybe it's worth comparing to ggplot2 about this. I remember being surprised but then agreeing with the default of "small" values at the bottom, which is consistent with numeric variables.

grantmcdermott commented 6 days ago

Maybe it's worth comparing to ggplot2 about this. I remember being surprised but then agreeing with the default of "small" values at the bottom, which is consistent with numeric variables.

Unfortunately, this doesn't work with our existing infra (esp. our by looping logic). E.g. I just couldn't get the y ~ x | y special case to work correctly because the ordering was reversed.

More generally, I do think it's correct to order from old to new. I realized something was off when I replicated the Lincoln weather example from the ggridges vignette. The dataset that they bundle with the package actually defines "Month" as a factor with the levels reversed (Dec:Jan) to make the graphic work. It's a bit odd.
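
The level-reversal trick is easy to reproduce in base R. The sketch below uses airquality rather than the ggridges dataset, and simply shows that reversing factor levels changes the stacking order without changing the data.

```r
# months in calendar order
month = factor(month.abb[airquality$Month], levels = month.abb[5:9])
levels(month)       # "May" "Jun" "Jul" "Aug" "Sep"

# reversing the levels flips the stacking order without touching the data
month_rev = factor(month, levels = rev(levels(month)))
levels(month_rev)   # "Sep" "Aug" "Jul" "Jun" "May"
```

Passing month_rev instead of month as the y variable would put the earliest month at the top on a conventional bottom-to-top axis.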

Will post some updated examples in a bit. Saturday night movie first ;-)

grantmcdermott commented 6 days ago

It's late, so just a quick summary.

I've pushed some changes that enable by grouping. This includes by on its own (i.e., a third variable), and the special cases where by==y and by==x.
Making the by==y case work required reordering the y-axis marks from first (oldest) at the top to last (newest) at the bottom. Stepping back, I think this makes sense as the default anyway, as per my comments above. Also, I implemented @zeileis's suggestion for lightening the fill color here.
Separately, and as also discussed above, I've reversed the gradient palette order, so that high x values are now darker. This matches our by coloring logic (and also means that there isn't a contradiction in the legend when by==x). It's a slight bummer b/c I think that the original gradient with light values at the high end of the scale looks a bit nicer. But my overall sense is that consistency is more important.
I have not changed the default palette from viridis to ag_Sunset, since I'd rather we handle that as a separate issue with (or after) the themes PR.

Some examples taken directly from the updated Examples in the documentation.

pkgload::load_all("~/Documents/Projects/tinyplot_vincent/")
#> ℹ Loading tinyplot
## by grouping is also supported. two special cases of interest:
# 1) by == y (color by y groups)
tinyplot(Species ~ Sepal.Width | Species, data = iris, type = "ridge")

# 2) by == x (gradient coloring along x)
tinyplot(Species ~ Sepal.Width | Sepal.Width, data = iris, type = "ridge")

# aside: pass explicit `type_ridge(col = <col>)` arg to set a common border
# color
tinyplot(Species ~ Sepal.Width | Sepal.Width, data = iris,
  type = type_ridge(col = "white"))

## gradient coloring along the x-axis can also be invoked manually without
## a legend (the following lines are all equivalent)
tinyplot(Species ~ Sepal.Width, data = iris, type = type_ridge(gradient = TRUE))

## with faceting and color gradient
airq = transform(airquality, Late = ifelse(Day > 15, "Late", "Early"))
tinyplot(Month ~ Ozone, facet = ~ Late, data = airq,
  type = type_ridge(gradient = TRUE),
  grid = TRUE, axes = "t", col = "white")

Created on 2024-11-23 with reprex v2.1.1

grantmcdermott commented 6 days ago

Quick coda on the standard 'by' case (i.e., not equal to x or y). This works okay although one bummer is that we can get overlapping of distributions as per below.

tinyplot(Month ~ Temp | Late, data = airq, type = "ridge", palette = "classic")

Unfortunately, this happens because our high-level logic involves looping over the by splits and drawing groups separately on the plot. As a result, I think that this overlapping ridges behaviour is probably unavoidable without a major rewrite of the high-level plotting logic, or some kind of special exception control flow... and I don't have the time (or energy) for either rn.

P.S. You can at least add alpha transparency to get around the overlapping issue a bit.

tinyplot(Month ~ Temp | Late, data = airq, type = "ridge", palette = "classic", alpha = 0.5)

zeileis commented 6 days ago

Grant, just very quickly before I have to start preparing :pizza:

Thanks for all the work, it's great to see this progress, even if it takes longer than expected. Devel is in the detail...
We should add an argument type_ridge(ylevels = NULL) that works analogously to the corresponding argument in type_spineplot() so that users can easily re-order the y-variable levels on the fly.
I think we will need to extend type_data in such a way that we can define in which order/grouping the by and facet groups are drawn. For scatterplot-based displays of numeric variables this is usually not so important but with categorical axes it is more likely, especially when there is overlap (ridge) or conditioning (spineplot). But this is beyond this particular PR. So maybe proceed as for the spineplots and leave it for a later timepoint.
For the standard by = z case I would like to revise my opinion. I thought that we essentially just do interaction(by, y) and didn't realize that the by groups are aligned on the same y level. In that case I think that transparency is better after all. Also, I find it more appealing to have the border line just at the top but not at the bottom. I like this example from the ggridges documentation.

tinyplot-ggridges

grantmcdermott commented 5 days ago

Okay, having slept on this and feeling less immediately frustrated with the code, I'm going to take another crack at reverting to the original y-axis order (i.e., back to a typical numeric scale). It will probably require some upstream logic changes. Specifically, we'll have to reverse the upstream by split logic if type="ridge". I don't like having ad hoc modifications for specific types, but I think it should work at least. Update: This ended up being simpler than I thought. Heading out for a day trip with the family now, but will push my updates when I'm back later.

grantmcdermott commented 5 days ago

@zeileis (and @vincentarelbundock) I've actioned most of your additional suggestions. For instance, we now only draw the top border of the densities and by grouping adds automatic alpha transparency (although not in the special cases of by==x or by==y). I've also reverted the order of the y-axis and fixed a couple of edge case bugs.

pkgload::load_all("~/Documents/Projects/tinyplot_vincent/")
#> ℹ Loading tinyplot
data("Aus_athletes", package = "ggridges")
op = tpar(las = 1, mgp = c(3, 0, 0), mar = c(5, 5, 4, 2)+0.1)
tinyplot(
  sport ~ height | sex, data = Aus_athletes,
  type = type_ridge(scale = 0.95),
  palette = "tableau",
  axes = "l",
  main = "Height of Australian athletes",
  ylab = NA,
  draw = abline(h = 0:9, col = "lightgray")
)

tpar(op)

Created on 2024-11-24 with reprex v2.1.1

I haven't added support for type_ridge(ylevels = NULL), but at this point I honestly have to put this PR aside now. If you feel like adding more features, please go for it. But I'm pretty happy with where it is now and would like to merge if both of you agree.

vincentarelbundock commented 5 days ago

This looks amazing.

I don't think we should be afraid to merge, even if we plan to iterate and improve in the future. This is already very close to full featured.

Great work!

grantmcdermott commented 4 days ago

Okay, let's merge this PR as-is then. We can add supplemental features later on, e.g. ylevels control, maybe rug?

grantmcdermott / tinyplot

`type_ridge()` #252