qiime2 / q2-vizard

The first choice of wizard lizards for interactive, generalized microbiome data visualization!
BSD 3-Clause "New" or "Revised" License

NEW: adds boxplot #29

Open lizgehret opened 1 week ago

lizgehret commented 1 week ago

Copying over my proposed design from basecamp for visibility:

Here's my proposed design for boxplot in q2-vizard:

This visualizer will take a numeric measure (distribution) and a categorical measure (facet_by) for constructing the box plots. Users will input the averaging method they'd like to use (mean or median, with median as the default) as well as the whisker range (the choices here are percentile or IQR - let me know what you think makes sense for this). Any data points that fall outside of the selected whisker range will be plotted as individual points to represent outliers. Within the actual visualization, there will be a transpose signal that allows users to swap the orientation of the box plots (either horizontal or vertical, with horizontal being the default).
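For readers following along, here is a minimal sketch of the per-group statistics this design describes. It is not the plugin's implementation: the function names `percentile` and `box_stats`, the 5th/95th percentile default, and the 1.5 x IQR factor (the usual Tukey convention) are all assumptions made for illustration.

```python
from statistics import mean, median


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    if not sorted_vals:
        raise ValueError("cannot take a percentile of an empty list")
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def box_stats(values, average="median", whiskers="percentile",
              pct=(5, 95), iqr_factor=1.5):
    """Summarize one box: center, quartiles, whisker ends, and outliers.

    All parameter names and defaults are hypothetical, chosen to mirror
    the options discussed in this thread.
    """
    if average not in ("median", "mean"):
        raise ValueError(f"unknown average method: {average}")
    vals = sorted(values)
    q1, q3 = percentile(vals, 25), percentile(vals, 75)
    center = median(vals) if average == "median" else mean(vals)

    if whiskers == "percentile":
        # Whiskers sit at the chosen lower/upper percentiles.
        low, high = percentile(vals, pct[0]), percentile(vals, pct[1])
    elif whiskers == "iqr":
        # Whiskers reach the most extreme points within factor * IQR
        # of the box edges.
        reach = iqr_factor * (q3 - q1)
        inside = [v for v in vals if q1 - reach <= v <= q3 + reach]
        low, high = (inside[0], inside[-1]) if inside else (q1, q3)
    else:
        raise ValueError(f"unknown whisker range: {whiskers}")

    # Anything beyond the whiskers is drawn as an individual point.
    outliers = [v for v in vals if v < low or v > high]
    return {"center": center, "q1": q1, "q3": q3,
            "whisker_low": low, "whisker_high": high, "outliers": outliers}


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], whiskers="iqr")
print(stats["outliers"])  # [100]
```

Note that the two whisker rules flag different points on the same data: with the percentile rule, the lowest and highest values usually fall outside the whiskers by construction, whereas the IQR rule only flags points far from the box.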

My initial thought is that the inputs will be fixed, rather than offering a drop-down to change how the box plots are grouped, because a drop-down adds a lot of extra overhead in what's pre-computed (prior to the Vega spec being rendered) - but it would still be possible if you think that's something that will be really helpful/commonly used by folks.

Here's a rough sketch of what this proposed design would look like: [image: IMG_1566]

nbokulich commented 1 week ago

@lizgehret this is great! I will be interested to test this once it is ready! (just tried but could not get it to work with real data; happy to share an error log if this is unexpected but I assume I am just jumping the gun 😁 )

I think that a drop-down for facet_by would be useful. Often when plotting data, users will want to look at different groupings. To give some concrete examples: with environmental/soil data like the EMP data, users might want to look at distributions at different EMPO levels (i.e., different types and subtypes of ecosystems); in human data (e.g., HMP), maybe different body sites and subtypes, or patient categories; in the PD mouse dataset or similarly structured data, they might want to look at multiple categories in the metadata like "host", "donor", and "treatment". Having all of this in a single plot would be convenient; though alternatively there could be multiple plots displayed instead of a drop-down, and the user could input a list of categorical column names to facet_by.

I think that a drop-down for the numeric measure (distribution) would be useful. E.g., if plotting alpha diversity per group, a user might want to toggle between multiple metrics (also for beta diversity, e.g., distributions of pairwise distances). Alternatively, there could be multiple plots displayed in the viz, one per measure selected (distribution could accept a list), but I like the dropdown.
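To make the pre-computation cost of the two drop-downs concrete, here is a rough sketch, assuming the summaries are computed up front for every (measure, facet) pair and emitted as flat rows that a Vega spec could later filter with drop-down signals. The function name `precompute_rows`, the row layout, and the column names `shannon` and `faith_pd` are hypothetical; `donor` and `treatment` come from the examples above.

```python
from collections import defaultdict
from statistics import median, quantiles


def precompute_rows(records, measures, facets):
    """Return one summary row per (measure, facet, group) combination."""
    rows = []
    for measure in measures:
        for facet in facets:
            # Group the numeric values by the categorical column.
            groups = defaultdict(list)
            for rec in records:
                groups[rec[facet]].append(rec[measure])
            for group, vals in sorted(groups.items()):
                if len(vals) >= 2:
                    q1, _, q3 = quantiles(vals, n=4, method="inclusive")
                else:
                    # A single observation has no spread.
                    q1 = q3 = vals[0]
                rows.append({"measure": measure, "facet": facet,
                             "group": group, "n": len(vals),
                             "median": median(vals), "q1": q1, "q3": q3})
    return rows


records = [
    {"donor": "A", "treatment": "ctl", "shannon": 1.0, "faith_pd": 10.0},
    {"donor": "A", "treatment": "trt", "shannon": 2.0, "faith_pd": 12.0},
    {"donor": "B", "treatment": "ctl", "shannon": 3.0, "faith_pd": 14.0},
    {"donor": "B", "treatment": "trt", "shannon": 5.0, "faith_pd": 20.0},
]
rows = precompute_rows(records, ["shannon", "faith_pd"],
                       ["donor", "treatment"])
print(len(rows))  # 8
```

The row count grows as the number of measures times the total number of groups across all facet columns, which is the overhead being weighed against the convenience of the drop-downs.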

I suggest making percentile the default for whiskers, but it is always a matter of taste and both are common.

lizgehret commented 5 days ago

Thanks for the feedback @nbokulich! I will definitely let you know once this is ready for a test drive - I've still yet to fill in the vega spec 😅 Here are some design updates after a discussion with @ebolyen this morning:

Now that things are a bit more fleshed out, I'm going to start working on the actual spec. Should be in a working state sometime next week!