qiita-spots / qp-deblur


Renaming "deblur final table"? #58

Open fedarko opened 5 years ago

fedarko commented 5 years ago

When I first started downloading data from Qiita, it seemed to me like "deblur final table" (all.biom) was the table I should be using as a starting point, but from doing some digging it looks like "deblur reference hit table" (reference-hit.biom) is the recommended table for use in typical 16S analyses. No big deal, I can rerun my analysis with reference-hit.biom instead of all.biom :)

I know there are existing docs explaining the differences between these BIOMs (see references at the bottom for some of what I've found), but these are all external resources ([2] and [3] are linked from the "Help" dropdown in Qiita, but you have to dig a bit to find the info on Deblur). In my opinion, the actual Qiita user interface doesn't explain this super well. Furthermore, I think other people have had the same confusion I've had and have used all.biom in 16S studies; see the full thread of [5]. I can also see that this issue has been brought up before in #16, but it doesn't seem like that issue has been resolved.

I believe it might be worthwhile to do some or all of the following:

  1. Rename "deblur final table" to something like "deblur reference hit and reference non-hit table", "deblur non-positive-filtering table", or "deblur all.biom ('final') table".
    • In any case, I think labelling this as the "final" output of deblur is unclear when it isn't actually what most users will want to use in their analyses.
  2. Add a sentence or two giving some context, and/or links to some of the references below, in the artifact details for deblur outputs. For example, for reference-hit.biom: "This deblur artifact was positive-filtered against a reference database of 16S sequences in an attempt to remove non-16S sequences. We recommend using it for most 16S analyses."
    • It looks like #16's idea was to "add warnings", which I'd imagine being something like the following for all.biom: "This deblur artifact was not positive-filtered. We recommend not using it for normal 16S analyses, but it may be useful for other marker-gene studies."
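
As a rough sketch of what item 2 could look like, the proposed texts could live in a small lookup keyed by output file name. Everything here is hypothetical: `ARTIFACT_NOTES` and `describe_artifact` are illustrative names, not part of qp-deblur or Qiita, and the only things taken from this issue are the two file names and the two suggested sentences.

```python
# Hypothetical lookup table: the file names and note texts come from this
# issue; the structure and all identifiers are illustrative assumptions.
ARTIFACT_NOTES = {
    "reference-hit.biom": {
        "positive_filtered": True,
        "note": (
            "This deblur artifact was positive-filtered against a reference "
            "database of 16S sequences in an attempt to remove non-16S "
            "sequences. We recommend using it for most 16S analyses."
        ),
    },
    "all.biom": {
        "positive_filtered": False,
        "note": (
            "This deblur artifact was not positive-filtered. We recommend "
            "not using it for normal 16S analyses, but it may be useful for "
            "other marker-gene studies."
        ),
    },
}


def describe_artifact(filename):
    """Return the proposed help text for a deblur output file.

    Returns an empty string when no note is defined for the file name.
    """
    entry = ARTIFACT_NOTES.get(filename)
    return entry["note"] if entry else ""


if __name__ == "__main__":
    for name in ARTIFACT_NOTES:
        print(f"{name}: {describe_artifact(name)}")
```

How such text would actually be surfaced in the artifact details is a separate question and is not shown here.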

I am happy to discuss further/help out as needed—I think this will help people choose the correct outputs for their analyses, and alleviate confusion in general.

[1] https://github.com/biocore/deblur#input-and-output-files
[2] https://qiita.ucsd.edu/static/doc/html/processingdata/index.html#deblurring (doesn't go into a lot of detail)
[3] https://cmi-workshop.readthedocs.io/en/latest/qiita-16S-processing.html#the-deblur-workflow
[4] https://forum.qiime2.org/t/transferring-qiita-artifacts-to-qiime2/4790
[5] https://forum.qiime2.org/t/deblur-without-16s-filter/3968/

antgonza commented 5 years ago

Thank you @fedarko.

I think "deblur reference-hit filtered" and "deblur without filtering" could be good names. What do you think?

BTW, changing the names within the plugin will change the names of the output artifacts, like the ones displayed here:

[Screenshot: Screen Shot 2019-10-03 at 8 58 06 AM]

However, this will not change the names of the ones generated/merged for Analysis, which currently look like this:

[Screenshot: Screen Shot 2019-10-08 at 11 16 25 AM]

Note that changing those names would require modifying the main qiita code rather than this plugin.

fedarko commented 5 years ago

I like the suggested names, but I think they're somewhat inaccurate: both of these artifacts still have had negative filtering (e.g. of PhiX / adapter sequences) applied, right? So in a sense both of these artifacts have had "filtering" done.

Maybe something like "deblur positive-and-negative-filtered" and "deblur only-negative-filtered" would convey the same sort of message while being more accurate.

Re: the multiple dflt_name artifacts, I think renaming those would also be a good idea (I know one of them has had the insertion tree filter applied, but being explicit about this in the graph would be much clearer for users IMO). I know this sort of concern has come up before on the main Qiita repo, but since this problem still remains for analyses I believe it would be worth fixing there. (Can write up an issue for this in biocore/qiita if you want.)