qiime2 / q2-feature-table

QIIME 2 plugin supporting operations on feature tables.
BSD 3-Clause "New" or "Revised" License

add new outputs to `summarize` #282

Closed gregcaporaso closed 11 months ago

gregcaporaso commented 1 year ago

The information in the Interactive Sample Detail and Feature Detail tabs of the visualization is useful for integration in downstream analyses. This information can currently be downloaded from the Overview tab of the .qzv (where it's almost, but not quite, in our metadata format).

Now that we have the ImmutableMetadata semantic type, we should output this information in two new outputs. The first should contain the information from the Interactive Sample Detail tab, which will be sample metadata, and the second should contain the information from the Feature Detail tab, which will be feature metadata. This will allow these data to be used anywhere that metadata is accepted in QIIME 2. For example, it becomes easy to test for a correlation between alpha diversity and samples' total frequencies, or to view features' sequences, taxonomy, total frequency, and the number of samples in which they're observed, all in a single visualization.

This will necessitate transitioning `summarize` from a Visualizer to a Pipeline.

EDIT: It also probably makes the most sense to create a new Action (or two) that generates the information described above from a FeatureTable, as separate ImmutableMetadata outputs. The Pipeline would then call that Action(s). The feature-focused Action could be called `tabulate-feature-frequencies`. It would output one ImmutableMetadata with a column for the total count of each feature (i.e., the sum of each column, indexed on the column headers) and a column for the total number of samples each feature is observed in (i.e., the count of non-zero values in each column, indexed on the column headers). The sample-focused Action could be called `tabulate-sample-frequencies`, and would output the same information computed per sample instead of per feature. For samples, the count of non-zero values would be the same as computing observed features on the table, but I think it could be helpful to have this information before rarefaction/normalization of the table (e.g., to see if read counts obtained per sample drive separation in PCoA, even after rarefaction/normalization).
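As a rough illustration of the two tabulations described above, here is a minimal pandas sketch. It assumes the table is a `pandas.DataFrame` with samples as rows and features as columns. The function names mirror the proposed Action names, but the bodies, the output column names (`frequency`, `samples_observed_in`, `features_observed`), and the index names are hypothetical. This is not the plugin's actual implementation, and it leaves out the QIIME 2 registration and ImmutableMetadata handling.

```python
import pandas as pd


def tabulate_feature_frequencies(table: pd.DataFrame) -> pd.DataFrame:
    """Per-feature totals for a samples-by-features table (hypothetical sketch)."""
    result = pd.DataFrame({
        # Total count of each feature: sum of each column.
        'frequency': table.sum(axis=0),
        # Number of samples each feature is observed in: non-zero values per column.
        'samples_observed_in': (table > 0).sum(axis=0),
    })
    result.index.name = 'feature-id'
    return result


def tabulate_sample_frequencies(table: pd.DataFrame) -> pd.DataFrame:
    """Per-sample totals for a samples-by-features table (hypothetical sketch)."""
    result = pd.DataFrame({
        # Total count in each sample: sum of each row.
        'frequency': table.sum(axis=1),
        # Number of features observed in each sample: non-zero values per row.
        'features_observed': (table > 0).sum(axis=1),
    })
    result.index.name = 'sample-id'
    return result


# Toy table made up for illustration: 3 samples (rows) by 3 features (columns).
table = pd.DataFrame(
    [[10, 0, 5],
     [0, 0, 3],
     [2, 7, 0]],
    index=['S1', 'S2', 'S3'],
    columns=['F1', 'F2', 'F3'],
)

feature_md = tabulate_feature_frequencies(table)
sample_md = tabulate_sample_frequencies(table)
print(feature_md)
print(sample_md)
```

Because both results are indexed by ID with one row per feature or sample, they already have the shape that metadata expects, which is what would let a Pipeline pass them along as metadata outputs.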

Open question: adding two new required outputs will be a breaking change to the interface here. Is that an acceptable breaking change, or should we make these optional outputs for one release cycle?

This feature request has come up before in the context of these issues: https://github.com/qiime2/q2-feature-table/issues/161 https://github.com/qiime2/q2-feature-table/issues/158