grantmcdermott / tinyplot

Lightweight extension of the base R graphics system
https://grantmcdermott.com/tinyplot
Apache License 2.0
226 stars, 7 forks

Support `type = "boxplot"` #154

Closed: grantmcdermott closed this 2 months ago

grantmcdermott commented 3 months ago

Closes #153.

(Also addresses the numeric ~ factor case first mentioned in #2.)

Examples:

pkgload::load_all("~/Documents/Projects/tinyplot/")
#> ℹ Loading tinyplot

Basic boxplot

plt(weight ~ Time, data = ChickWeight, type = "boxplot")

We can pass canonical boxplot() arguments too, e.g.:

plt(weight ~ Time, data = ChickWeight, type = "boxplot", horizontal = TRUE, staplewex = 0, lty = 1)
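For comparison, `horizontal`, `staplewex` and `lty` are arguments that base R's `boxplot()` already understands (`staplewex` and `lty` are passed on to `bxp()` through its `pars`). The sketch below draws the equivalent base R plot; it assumes, without checking tinyplot's internals, that tinyplot forwards these arguments in a similar way.

```r
# Base R equivalent of the horizontal boxplot above, for comparison only.
# staplewex = 0 removes the whisker staples; lty = 1 draws solid whiskers.
boxplot(weight ~ Time, data = ChickWeight,
        horizontal = TRUE, staplewex = 0, lty = 1)

# The same per-group summary statistics, computed without drawing anything.
bstats <- boxplot(weight ~ Time, data = ChickWeight, plot = FALSE)
bstats$n  # number of observations for each value of Time
```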

Grouped boxplots are automatically adjusted so that the boxes for different groups don't overlap.

plt(weight ~ Time | Diet, data = ChickWeight, type = "boxplot")

We can facet too (here combined with grouping, but that's optional).

plt(weight ~ Time | Diet, data = ChickWeight, type = "boxplot", facet = "by")

Finally, note that we automatically switch to type = "boxplot" if y is numeric and x is a factor (in line with vanilla plot(), and as requested in #2).

plt(weight ~ factor(Time), data = ChickWeight)

Created on 2024-07-05 with reprex v2.1.0
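The automatic switch described above follows the same rule as base R, where `plot(weight ~ factor(Time), data = ChickWeight)` draws boxplots. A minimal sketch of that dispatch rule is below; `infer_type()` is a hypothetical helper written for illustration, not tinyplot's actual code, and it only covers the numeric ~ factor case discussed in this PR.

```r
# Hypothetical helper (not part of tinyplot): choose a default plot type
# from the classes of y and x. Numeric y against factor x gives a boxplot;
# everything else falls back to points ("p"), the usual default type.
infer_type <- function(y, x) {
  if (is.numeric(y) && is.factor(x)) "boxplot" else "p"
}

infer_type(ChickWeight$weight, factor(ChickWeight$Time))  # factor x
infer_type(ChickWeight$weight, ChickWeight$Time)          # numeric x
```

Other combinations (factor ~ factor, factor ~ numeric) would need further branches; they are left out here because the thread treats them as future work.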

grantmcdermott commented 3 months ago

The initial code is a bit rough-and-ready, but it seems to do the basic job.

pkgload::load_all("~/Documents/Projects/tinyplot/")
#> ℹ Loading tinyplot

plt(weight ~ Time, ChickWeight, type = "boxplot")

plt(weight ~ Time, ChickWeight, type = "boxplot", facet = ~Diet)

Grouped plotting isn't great, though. We probably need to support jittering for this to look good.

plt(weight ~ Time | Diet, ChickWeight, type = "boxplot")

Created on 2024-06-19 with reprex v2.1.0

zeileis commented 3 months ago

I'm not sure whether the overlays ever really work, except when the grouping factor has a very large effect.

Also, there is the connection to this long-standing issue regarding proper support for factors: https://github.com/grantmcdermott/tinyplot/issues/2. This has boxplots for the numeric ~ factor relationship. So I would prefer the interface

plt(weight ~ factor(Time), ChickWeight)

over the current

plt(weight ~ Time, ChickWeight, type = "boxplot")

But given the increasing complexity of the current implementation, I find it very difficult to see how we could split the code up.

grantmcdermott commented 3 months ago

Also, there is the connection to this long-standing issue regarding proper support for factors: https://github.com/grantmcdermott/tinyplot/issues/2. This has boxplots for the numeric ~ factor relationship.

Oh, absolutely. I intend for this type = "boxplot" support to provide a pathway for the numeric ~ factor relationship, so plt(weight ~ factor(Time), ChickWeight) would automatically convert to type = "boxplot" under the hood. I still need to firm up the code a bit first, though.

zeileis commented 3 months ago

OK, good, thanks!

I would try to contribute code for the factor ~ factor and factor ~ numeric cases. But I don't think that I would be able to cleanly disentangle the current code into a framework such as the one that I outlined in the original issue. Maybe you also have a better idea for a modular approach for this...

grantmcdermott commented 2 months ago

@zeileis this PR is now ready and should address the major outstanding concerns, including automatic coercion for numeric ~ factor plots. See the newly added examples at the top of the thread.

I'm going to merge quickly, because I'm trying to get some additional features added while I have time. But please feel free to play around and flag any issues that you encounter.

PS. One thing we might consider: should we add a tinyboxplot() function (alias: bxplt()) that is a light wrapper around this tinyplot(..., type = "boxplot") implementation? It might be quite nice as a bridge for users who are used to calling boxplot().
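To make the proposal concrete, such a wrapper could be as thin as the sketch below. Both `tinyboxplot()` and `bxplt()` are hypothetical names taken from the suggestion above; neither is part of tinyplot, and the sketch assumes only that `tinyplot::tinyplot()` accepts `type = "boxplot"` as introduced in this PR.

```r
# Hypothetical wrapper (not part of tinyplot): forward all arguments to
# tinyplot() with the plot type fixed to "boxplot".
tinyboxplot <- function(...) {
  tinyplot::tinyplot(..., type = "boxplot")
}

# Short alias, in the style of plt().
bxplt <- tinyboxplot

# Usage (requires tinyplot to be installed):
# tinyboxplot(weight ~ Time | Diet, data = ChickWeight)
```

One design consequence: because `type` is fixed inside the wrapper, passing `type = ...` explicitly would raise a "matched by multiple actual arguments" error, which is arguably the right behaviour for a type-specific function.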

zeileis commented 2 months ago

Thanks also for adding this. It's great that tinyplot(y ~ x) now automatically produces a boxplot for numeric y and categorical x. The grouping and faceting is also working nicely. (So maybe we can get a type = "spine" in an analogous way?) But I just wanted to comment regarding the wrapper function: I personally don't think this is needed. It's easy enough to set the type or use a factor x. I also rarely call boxplot() in base R for the same reason.