Closed BrandenOlson closed 5 years ago
I think that this is great, of course!
A reminder that we'd like to leave the door open to displaying these sorts of summary plots as part of olmsted. Although that's not on the near-term roadmap, could you check in with @eharkins about intermediate data exports that he could consume? Don't let this slow you down, though.
The next step will be to add support for ecdfs and frequency polygons (using ggplot2). I might cut it off there as there are endless possibilities for plotting, unless anyone has specific requests.
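For reference, here is a rough sketch of what those two plot types look like in ggplot2, using `stat_ecdf` and `geom_freqpoly`. The data frame and its `Value` column are simulated stand-ins chosen for illustration, not sumrep output, and the axis labels are assumptions.

```r
library(ggplot2)

# Simulated stand-in for a summary distribution (e.g. pairwise distances).
# Real values would come from one of the sumrep distribution getters.
set.seed(1)
dat <- data.frame(Value = rpois(200, lambda = 8))

# Empirical CDF of the summary values, drawn as a step function.
ecdf_plot <- ggplot(dat, aes(x = Value)) +
    stat_ecdf(geom = "step") +
    labs(x = "Pairwise distance", y = "Empirical CDF")

# Frequency polygon: bin counts as in a histogram, but joined by lines,
# which tends to overlay more cleanly than bars.
freqpoly_plot <- ggplot(dat, aes(x = Value)) +
    geom_freqpoly(binwidth = 1) +
    labs(x = "Pairwise distance", y = "Count")
```

Printing `ecdf_plot` or `freqpoly_plot` renders the figure. Both are ordinary ggplot objects, so further layers or themes can be added afterwards.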
A next step for sumrep is a full plotting feature which takes a dataset and displays plots of each possible summary distribution. The current idea is to restrict to univariate distributions, but it might be possible to include bivariate summaries in a nice way later on.
Since there are at least a dozen summaries to plot, and each summary has its own considerations (discrete vs continuous, range, etc.), I propose to create a separate plotting function for each statistic under consideration. This is similar to how there is a comparison function for each statistic. So, for example, the pairwise distance summary statistic will have three corresponding sumrep functions: getPairwiseDistanceDistribution, comparePairwiseDistanceDistributions, and plotPairwiseDistanceDistribution. This will allow custom x and y labels, custom ranges for support, histograms vs densities, specific legends, etc.
This will also pave the way to a straightforward "master plotting" function which just iterates over each of these plot... functions, adds the plot to a list, and displays them all in a grid.
It would be nice to allow for multiple datasets to be plotted within each function as a future addition. This framework should make that relatively painless.
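To make the proposal concrete, here is a minimal sketch of the pattern, not the actual sumrep implementation. Only the name plotPairwiseDistanceDistribution comes from the text above. Its signature, the second plot function, `plotAllDistributions`, the simulated inputs, and the use of gridExtra for the grid are all assumptions made for illustration.

```r
library(ggplot2)

# Sketch of one per-statistic plot function. The signature and arguments
# are assumptions; each statistic chooses its own defaults.
plotPairwiseDistanceDistribution <- function(distances,
                                             do_density = FALSE,
                                             x_label = "Pairwise distance") {
    p <- ggplot(data.frame(Value = distances), aes(x = Value))
    if (do_density) {
        p <- p + geom_density() + ylab("Density")
    } else {
        # The simulated distances are integers, so use unit-width bins.
        p <- p + geom_histogram(binwidth = 1) + ylab("Count")
    }
    p + xlab(x_label)
}

# Hypothetical second statistic with a continuous, bounded support.
plotGCContentDistribution <- function(gc_content) {
    ggplot(data.frame(Value = gc_content), aes(x = Value)) +
        geom_density() +
        xlim(0, 1) +
        labs(x = "GC content", y = "Density")
}

# Sketch of the "master plotting" function: call each plot function on its
# summary values and collect the resulting plots in a named list.
plotAllDistributions <- function(summaries, plot_functions) {
    plots <- list()
    for (name in names(plot_functions)) {
        plots[[name]] <- plot_functions[[name]](summaries[[name]])
    }
    plots
}

# Simulated summary values, not real repertoire data.
set.seed(1)
summaries <- list(
    pairwise_distance = rpois(100, lambda = 10),
    gc_content = rbeta(100, 5, 5)
)
plot_functions <- list(
    pairwise_distance = plotPairwiseDistanceDistribution,
    gc_content = plotGCContentDistribution
)
plots <- plotAllDistributions(summaries, plot_functions)

# Show the plots in a grid when gridExtra happens to be installed.
if (interactive() && requireNamespace("gridExtra", quietly = TRUE)) {
    gridExtra::grid.arrange(grobs = plots, ncol = 2)
}
```

Keeping the per-statistic functions separate means the master function needs no knowledge of axis ranges or plot types. Supporting multiple datasets later would mostly be a change inside each plot function.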
@matsen - let me know your thoughts when you can. I was hoping to implement this by the next software WG meeting, or at least finish a basic proof of concept.