mountetna / timur

Data browser for biologists
GNU General Public License v2.0

Composable plots #215

graft closed 5 years ago

graft commented 6 years ago

Currently the following plot types exist: BarGraph, BarPlot, BoxPlot, HeatmapPlot, Histogram, LinePlot, ScatterPlot, SwarmPlot, StackedBarPlot. Each of these is a more-or-less bespoke component; there is very little re-use between these components and they cannot be easily combined.

We would rather have a plot interface that allows us to build complex plots combining multiple visualization types. The above plots might be broken down into several broad classes:

1) XY plots - the Line and Scatter plots fall in this category, where each plot element represents data on two continuous axes. The histogram is a specialized variant of this kind of plot.
2) One-d plots - these plots represent either a single value or a distribution of values on a single axis. The BarPlot (or BarGraph), the box plot, and the swarm plot fall in this category.
3) Heatmaps - these are really their own beast with unique concerns.

Rather than forcing a plot to make a single representation (just a line plot or just a scatter plot) we should have a more composable notion of plots. A plot is made up of:

1) Axis - these may be continuous (numerical or date/time) or discrete (categorical). Continuous axes might be log-scaled.
2) Legend - these may summarize data labels as colors or shapes (or a color ramp range).
3) Grid - these show interval sizes and indicate the plot area.
4) Series - a set of elements mapped to pieces of data.

Instead of having 9 different plot types, then, we can have only 2 main plot types: 2D (two continuous axes) and 1D (one continuous and one discrete axis) - each plot type can represent any number of series (a term I find infuriating because the singular is the same as the plural, but I can't find a better one).
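As a rough sketch of how the parts listed above and the two plot types could fit together, the TypeScript below models a plot as axes, an optional legend and grid, and a list of series. Every type and function name here (`Axis`, `Legend`, `Grid`, `Plot`, `plotType`) is hypothetical and invented for illustration; none of it is taken from the timur codebase.

```typescript
// Hypothetical types, assumed for illustration only.
type ContinuousAxis = {
  kind: "continuous";
  label: string;
  scale: "linear" | "log"; // continuous axes might be log-scaled
};

type DiscreteAxis = {
  kind: "discrete";
  label: string;
  categories: string[];
};

type Axis = ContinuousAxis | DiscreteAxis;

// A legend summarizes data labels as colors, shapes, or a color ramp.
type Legend = {
  encoding: "color" | "shape" | "colorRamp";
  entries: string[];
};

type Grid = { showX: boolean; showY: boolean };

// Placeholder: the contents of a series are left out of this sketch.
type Series = { name: string };

type Plot = {
  xAxis: Axis;
  yAxis: Axis;
  legend?: Legend;
  grid?: Grid;
  series: Series[];
};

// Classify a plot by its axes: two continuous axes give a 2D plot,
// one continuous and one discrete axis give a 1D plot.
function plotType(plot: Plot): "2D" | "1D" | "unsupported" {
  const continuous = [plot.xAxis, plot.yAxis].filter(
    (axis) => axis.kind === "continuous"
  ).length;
  if (continuous === 2) return "2D";
  if (continuous === 1) return "1D";
  return "unsupported";
}

// Usage: a plot with a categorical x axis and a numerical y axis.
const expression: Plot = {
  xAxis: { kind: "discrete", label: "sample", categories: ["a", "b"] },
  yAxis: { kind: "continuous", label: "value", scale: "log" },
  series: [{ name: "bars" }, { name: "swarm" }],
};
console.log(plotType(expression));
```

In this sketch the plot type is derived from the axes rather than stored, so a plot cannot claim to be 2D while carrying a categorical axis. Heatmaps are deliberately not covered.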

An individual plot element component, then, no longer renders a complete plot, but is responsible for rendering a particular series on a given plot surface. Here are the plot components we might draw:

Each component expects a Series as input - the Series class would be 2D (must have x and y, might also have color) or 1D (must have y and category, might also have color). A plot defining multiple series could combine several of these representations in one view.
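One way the Series input might look, sketched in TypeScript: a 2D series carries points with `x` and `y`, a 1D series carries points with `y` and `category`, and both allow an optional `color`. The names `Series2D`, `Series1D`, `Mark`, `renderSeries`, and `renderPlot` are assumptions made for this example, and the "renderer" only returns a description of the marks it would draw rather than producing real output.

```typescript
// Hypothetical shapes for the two Series classes; not the actual timur API.
type Point2D = { x: number; y: number; color?: string };
type Point1D = { category: string; y: number; color?: string };

type Series2D = {
  dimension: "2D";
  name: string;
  element: "line" | "scatter";
  points: Point2D[];
};

type Series1D = {
  dimension: "1D";
  name: string;
  element: "bar" | "box" | "swarm";
  points: Point1D[];
};

type Series = Series2D | Series1D;

// Stand-in for drawing: what shape a series produces and how many of them.
type Mark = { series: string; shape: string; count: number };

// Each element type is responsible for one series, not for a whole plot.
function renderSeries(series: Series): Mark {
  if (series.dimension === "2D") {
    // A line is one path through all points; a scatter is one circle per point.
    return series.element === "line"
      ? { series: series.name, shape: "path", count: 1 }
      : { series: series.name, shape: "circle", count: series.points.length };
  }
  // Distinct categories, in order of first appearance.
  const categories = series.points
    .map((p) => p.category)
    .filter((c, i, all) => all.indexOf(c) === i);
  if (series.element === "swarm") {
    return { series: series.name, shape: "circle", count: series.points.length };
  }
  // Bars and boxes summarize each category with a single mark.
  const shape = series.element === "bar" ? "rect" : "box";
  return { series: series.name, shape, count: categories.length };
}

// A plot with several series combines their representations in one view.
function renderPlot(series: Series[]): Mark[] {
  return series.map(renderSeries);
}

// Usage: a box plot overlaid with a swarm of the same data.
const values: Point1D[] = [
  { category: "control", y: 1.2 },
  { category: "control", y: 1.9 },
  { category: "treated", y: 3.4 },
];
const marks = renderPlot([
  { dimension: "1D", name: "summary", element: "box", points: values },
  { dimension: "1D", name: "raw", element: "swarm", points: values },
]);
console.log(marks);
```

The `dimension` tag lets the type checker reject a series that mixes 2D elements with 1D points, which is one possible way to enforce the "must have x and y" versus "must have y and category" requirement.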