Thanks @RaphaelWimmer! Yes, we welcome unsolicited issues and reviews :). In fact, if you wanted to write a full review instead of just a comment, you are welcome to do that here (but no pressure).
Re: swarmplots versus gatherplots, I'd be curious to hear @nickelm's thoughts. Specifically for JoVI, it is worth mentioning that novelty is not a strict requirement (see the validity section of the reviewer guidelines), but we do expect folks to make contrasts with appropriate related work.
Just a small question for clarification: gatherplots look a lot like swarmplots, but they are never mentioned in the manuscript. Is this an oversight? I would really suggest at least referencing them.
Thanks for this feedback. It's a good point. I'm revising the paper at the moment and will push changes soon, but here is a preview of my response:
Stripplots [@stripplot-seaborn] and swarmplots [@swarmplot-seaborn], which are provided in the seaborn [@Waskom2021] statistical data visualization library for Python, are categorical scatterplots closely related to the gatherplots technique. Stripplots (the name is somewhat unfortunate as it clashes with the strip plots technique for rendering univariate data as strips, yielding a plot reminiscent of a barcode) are essentially scatterplots with random jittering where at least one axis is expected to be categorical. Swarmplots improve upon stripplots by displacing points to avoid overlap; based on the somewhat Brownian appearance of marks in a swarmplot, this is done by iteratively perturbing new points until there is no overlap. In comparison, gatherplots partition the available space into stacked groups and then organize the marks inside each group into ordered grids. This yields a richer visual language that allows for sorting marks based on color and even resizing marks to fill the visual space, as well as a more compact visual representation that makes better use of available space.
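To make the "ordered grids" idea concrete, here is a minimal sketch of how a single group of marks could be laid out. It is a hypothetical simplification, not the gatherplots paper's actual algorithm: `grid_for_group` and `gather_group` are illustrative names, marks are assumed to be square, and only one group (one cell of the partition) is handled. The column count is chosen to make marks as large as possible, echoing the "resize marks to fill the visual space" idea, and marks are sorted by color before being placed row by row.

```python
import math


def grid_for_group(n, width, height):
    """Pick the column count that maximizes the (square) mark size
    for n marks inside a width x height cell. Hypothetical helper."""
    best_cols, best_size = 1, 0.0
    for cols in range(1, n + 1):
        rows = math.ceil(n / cols)
        size = min(width / cols, height / rows)  # largest square that fits this grid
        if size > best_size:
            best_cols, best_size = cols, size
    return best_cols, best_size


def gather_group(colors, width, height):
    """Lay out one group's marks: sort by color, then fill the grid row by row.

    Returns (color, x, y, size) per mark, where (x, y) is the mark's
    lower-left corner relative to the group's cell.
    """
    cols, size = grid_for_group(len(colors), width, height)
    marks = []
    for i, color in enumerate(sorted(colors)):  # sorting keeps same-colored marks contiguous
        row, col = divmod(i, cols)
        marks.append((color, col * size, row * size, size))
    return marks
```

A full layout would first partition the plot area into one cell per group and then call something like `gather_group` for each cell.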
Again, thanks for the feedback. I have added citations to acknowledge seaborn and these plots (I could not find any earlier mention of them).
FWIW, I don't think swarmplot in seaborn is the earliest reference for beeswarms. I've tried finding it before, and the earliest I can remember finding is the beeswarm package in R, which was first published in 2010 (seaborn was first published in 2013, as far as I can tell).
The swarmplot layout in seaborn is also not particularly sophisticated; in fact, I believe this passage gives it too much credit:
Swarmplots improve upon stripplots by displacing points to avoid overlap; based on the somewhat Brownian appearance of marks in a swarmplot, this is done by iteratively perturbing new points until there is no overlap.
I think it just sorts the points and then places each point in sequence as close to the axis as it can, given the placement of all prior points. That's why you get the somewhat undesirable "lean" in those plots. The beeswarm package implements better layout algorithms, like "compactswarm" (some examples later on this page); ggdist also allows the compactswarm layout (just called "swarm" there) as well as another layout called "weave" (see examples here). There is further discussion of more sophisticated beeswarm layout algorithms here.
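For readers unfamiliar with the sequential layout described above, here is a rough sketch of a greedy swarm layout under simplifying assumptions. It is not seaborn's code: `greedy_swarm` is a hypothetical name, marks are assumed to be equal circles of a given `diameter`, and candidate offsets are tried on a discrete grid (0, +d, -d, +2d, -2d, ...) rather than computed as exact tangent positions.

```python
import math


def _fits(value, offset, placed, diameter):
    """True if a circle at (value, offset) overlaps none of the placed circles."""
    return all(math.hypot(value - pv, offset - po) >= diameter for pv, po in placed)


def greedy_swarm(values, diameter):
    """Greedy swarm sketch: sort, then place each point in sequence.

    Each point keeps its value on the value axis and is pushed sideways to
    the smallest candidate offset that avoids every previously placed point.
    Returns (value, offset) pairs in placement (sorted) order.
    """
    placed = []
    for v in sorted(values):
        k = 0
        while True:
            # Candidate offsets at distance k * diameter, positive side tried first.
            offsets = [0.0] if k == 0 else [k * diameter, -k * diameter]
            free = [o for o in offsets if _fits(v, o, placed, diameter)]
            if free:
                placed.append((v, free[0]))
                break
            k += 1
    return placed
```

Because each point only sees the points placed before it, the result depends on placement order and on which side is tried first, which is the kind of order dependence that can produce asymmetric-looking swarms.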
All that said, I think the fundamental contrast @nickelm raises is correct: the only two layouts I know of that maintain stacking order are dot histograms and Wilkinson-style dotplots. All the others sacrifice stacking order, usually to try to get more accurate x positions or a more pleasant-looking layout (or both).
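The property that distinguishes these two layouts can be illustrated with a small sketch. Both functions are hypothetical illustrations: `wilkinson_dotplot` follows the greedy dot-density idea (start a stack at the smallest unassigned value and absorb every value within one bin width), with each stack centered at the midpoint of its smallest and largest value as an assumed convention; `dot_histogram` uses fixed-width bins instead. In both, a dot's height within its stack comes directly from sorted order, so stacking order is preserved, whereas a swarm layout moves dots sideways wherever there is room.

```python
import math


def wilkinson_dotplot(values, binwidth):
    """Greedy dot-density binning sketch. Returns (x, level) per dot."""
    stacks = []
    bin_end = -math.inf
    for x in sorted(values):
        if x >= bin_end:  # past the current stack: start a new stack at this value
            stacks.append([])
            bin_end = x + binwidth
        stacks[-1].append(x)
    out = []
    for stack in stacks:
        center = (stack[0] + stack[-1]) / 2  # assumption: midpoint of min and max
        out.extend((center, level) for level in range(len(stack)))
    return out


def dot_histogram(values, binwidth):
    """Fixed-width bins; dots stacked in sorted order within each bin."""
    levels = {}  # bin index -> number of dots already stacked there
    out = []
    for x in sorted(values):
        b = math.floor(x / binwidth)
        level = levels.get(b, 0)
        levels[b] = level + 1
        out.append(((b + 0.5) * binwidth, level))  # dot drawn at the bin center
    return out
```

Sorting by a secondary key (for example color) before stacking would keep that order visible too, which is the same kind of guarantee the gatherplot grids provide.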
... as well as a more compact visual representation that makes better use of available space.
I'm not sure I follow this claim; in my experience hexagonal layouts and compactswarm layouts will be a bit more compact than dot histograms or Wilkinson dotplots (though not by much really).
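A back-of-the-envelope calculation supports the "not by much" intuition, assuming equal circular marks of diameter $d$ and an idealized arrangement with no edge effects. Stacked columns (as in dot histograms and Wilkinson dotplots) place consecutive dots a full $d$ apart, while a hexagonal arrangement offsets alternate rows by $d/2$ so that three touching dots form an equilateral triangle:

$$
\Delta y_{\text{stack}} = d, \qquad
\Delta y_{\text{hex}} = \sqrt{d^2 - \left(\tfrac{d}{2}\right)^2} = \tfrac{\sqrt{3}}{2}\,d \approx 0.866\,d
$$

So hexagonal rows save roughly 13% along the stacking direction. When columns are also one diameter apart, the ideal packing densities differ by the same factor of $2/\sqrt{3}$: $\pi/4 \approx 0.785$ for the square grid versus $\pi/(2\sqrt{3}) \approx 0.907$ for the hexagonal one.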
I stand corrected, thanks! I did some digging and rewriting; this is what I came up with:
Specialized versions of dot plots (Wilkinson 1999) have been imbued with their own monikers. The R graphics package presents a version called a stripchart, which allows for both jittering and stacking categorical data in a one-dimensional scatterplot. Beeswarm plots (Eklund and Trimble 2021) improve on stripcharts by allowing for closely packed, non-overlapping points. The seaborn (M. L. Waskom 2021) statistical data visualization library for Python extends these chart types to two-dimensional space. Stripplots (the name is somewhat unfortunate as it clashes with the strip plots technique for rendering univariate data as strips, yielding a plot reminiscent of a barcode) are 1D or 2D versions of stripcharts with random jittering where at least one axis is expected to be categorical. Swarmplots, on the other hand, are 2D versions of beeswarm plots that place points to avoid overlap. In comparison, gatherplots partition the available space into stacked groups and then organize the marks inside each group into ordered grids. This yields a richer visual language that allows for sorting marks based on color and even resizing marks to fill the visual space.
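The jitter-versus-stack distinction in stripcharts is easy to show in code. This is a minimal sketch whose `method` options loosely mirror the "jitter" and "stack" choices of R's `stripchart()`; the function name `strip_positions` and its parameters are hypothetical, and the code is not R's implementation. Each value keeps its position on the value axis and receives a displacement on the other axis: a random one for jittering, or a count-based one that stacks exact ties.

```python
import random


def strip_positions(values, method="jitter", jitter=0.1, offset=1.0, seed=None):
    """Return one (value, displacement) pair per input value.

    method="jitter": displacement drawn uniformly from [-jitter, jitter].
    method="stack":  the k-th repeat of an exact value is displaced by k * offset.
    """
    if method == "jitter":
        rng = random.Random(seed)  # seeded generator for reproducible jitter
        return [(v, rng.uniform(-jitter, jitter)) for v in values]
    if method == "stack":
        seen = {}  # how many times each exact value has been placed so far
        out = []
        for v in values:
            k = seen.get(v, 0)
            seen[v] = k + 1
            out.append((v, k * offset))
        return out
    raise ValueError(f"unknown method: {method!r}")
```

Jittering only reduces overlap probabilistically, and stacking only separates exactly equal values, which is why continuous data call for binning (as in dot plots) or collision-avoiding layouts (as in beeswarm plots).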
Thanks for the rapid feedback; this is really cool!
Do we feel the current text satisfies @RaphaelWimmer's concerns? I think so. Thoughts @RaphaelWimmer / @codementum?
If so, we should close this issue.
Yeah, I think the current version gives a very good overview and helps designers in choosing the right visualization. I'm more than satisfied and have learned a lot :)
Thanks!
(I'm not sure if this is the right way to raise a question as a non-reviewer; I hope this is ok.)
Just a small question for clarification: gatherplots look a lot like swarmplots, but they are never mentioned in the manuscript. Is this an oversight? I would really suggest at least referencing them.