statdivlab / q2-corncob

BSD 3-Clause "New" or "Revised" License

could the FeatureData[Taxonomy] input be optional? #3

Open nbokulich opened 6 years ago

nbokulich commented 6 years ago

I have some ideas after looking over the q2-corncob tutorial here. I am very excited that corncob is finally available as a plugin!

As far as I can tell from looking at the example output, the FeatureData[Taxonomy] is really just used to label the features, and is not actually used explicitly by corncob.

That is convenient if the features have taxonomy classified, but what if they do not? E.g., what if the features are metabolites or genes? Or what if the feature table has already been collapsed by taxonomy (in which case the feature IDs are taxonomic labels and there is not an associated FeatureData[Taxonomy] artifact)?

It seems much more flexible if FeatureData[Taxonomy] is not a required input. It could even be dropped as an input entirely to keep things simpler, or replaced by a metadata file, in which case any type of feature metadata could be used to annotate the features in the corncob results. FeatureData[Taxonomy] artifacts are viewable as metadata, so if the output of corncob has a metadata transformer (e.g., like this), an alternative option would be to merge these results in a metadata visualization with the command qiime metadata tabulate, as shown here.
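To make the "optional annotation" idea concrete, here is a minimal sketch in plain Python. It does not use the QIIME 2 or corncob APIs; the function name `annotate_results`, the column names, and the values are all hypothetical and only illustrate joining results to feature metadata by feature ID when that metadata happens to be available.

```python
def annotate_results(results, feature_metadata=None):
    """Return a copy of the results, adding annotation columns when given.

    results: dict mapping feature ID -> dict of result columns
    feature_metadata: optional dict mapping feature ID -> dict of annotations
    """
    annotated = {}
    for feature_id, row in results.items():
        merged = dict(row)  # copy so the caller's data is not modified
        if feature_metadata is not None:
            # Features with no annotation are kept; they just get no extra columns.
            merged.update(feature_metadata.get(feature_id, {}))
        annotated[feature_id] = merged
    return annotated


# Hypothetical values, for illustration only.
results = {"feature-1": {"p_value": 0.01}, "feature-2": {"p_value": 0.40}}
taxonomy = {"feature-1": {"Taxon": "k__Bacteria; p__Firmicutes"}}

unlabeled = annotate_results(results)          # no taxonomy supplied
labeled = annotate_results(results, taxonomy)  # taxonomy used only as labels
```

The same function works for metabolites, genes, or an already collapsed table, because nothing in it depends on the annotations being taxonomy.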

I see that this would disturb the current output type (which is FeatureData[Taxonomy]). That seems like a slightly questionable output type, though. It is true that this output stores taxonomy information in its current form, but that is not the intended use of the results, which is to display the outcome of a statistical test. Defining a new format is not all that difficult (e.g., see this) and I would be happy to help. For example, you could define a FeatureData[CorncobResults] type with its own format.
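As a rough idea of what a dedicated results format would need to check, here is a sketch of a validator in plain Python. It deliberately does not use the QIIME 2 plugin API; in a real plugin a check like this would belong to the file format the plugin registers for the new type. The required column names (`Feature ID`, `p_value`) are assumptions, since the actual corncob output columns are not listed in this thread.

```python
import csv
import io

# Hypothetical column names; the real corncob output columns may differ.
REQUIRED_COLUMNS = ("Feature ID", "p_value")


def validate_results_tsv(text):
    """Raise ValueError if `text` is not a well-formed results table."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader, None)
    if header is None:
        raise ValueError("file is empty")

    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))

    p_index = header.index("p_value")
    for line_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_number}: expected {len(header)} fields, got {len(row)}"
            )
        p_value = float(row[p_index])  # raises ValueError if not numeric
        if not 0.0 <= p_value <= 1.0:
            raise ValueError(f"line {line_number}: p_value outside [0, 1]")
```

A table that passes this check carries only test results, so it no longer needs to pose as FeatureData[Taxonomy].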

An alternative would be to make this method into a visualizer, which would produce a QZV. The advantages are (1) you don't need to define a new type and (2) you could pair those results with useful plots and the like. The disadvantages are (1) you would need to figure out how to write a visualizer (I can help) and (2) it would make it more difficult for other developers to use corncob in pipeline actions (e.g., I can imagine writing a pipeline that produces corncob results and displays them along with volatility plots of longitudinal feature abundances, courtesy of q2-longitudinal). So I think keeping this as a method and writing a new format would be better, unless you have specific plots in mind (e.g., it would be fairly trivial to just copy that visualization I linked to but display corncob results in place of importance scores).

Just my 2 cents! Let me know what you think.

paulinetrinh commented 5 years ago

@nbokulich Yes! Our original plan was to make a new semantic type for the results, FeatureData[CorncobResults], and incorporate it into version 2.0 of q2-corncob, along with expanded functionality from corncob. Your help would be so greatly appreciated if you have any time to spare.

The visualization is interesting, but I agree with keeping this as a method. Sorry for the delayed response, and thanks for this very helpful input. :)