Open fedarko opened 5 years ago
Thank you @fedarko.
I think `deblur reference-hit filtered` and `deblur without filtering` could be good names. What do you think?
BTW, changing the names within the plugin will change the names of the output artifacts, like the ones displayed here:
However, it will not change the names of the artifacts generated/merged for Analysis, which currently look like this:
Note that changing those will require modifying the main Qiita code rather than this plugin.
I like the suggested names, but I think they're somewhat inaccurate: both of these artifacts still have had negative filtering (e.g. of PhiX / adapter sequences) applied, right? So in a sense both of these artifacts have had "filtering" done.
Maybe something like `deblur positive-and-negative-filtered` and `deblur only-negative-filtered` would convey the same sort of message while being more accurate.
Re: the multiple `dflt_name` artifacts, I think renaming those would also be a good idea (I know one of them has had the insertion tree filter applied, but being explicit about this in the graph would be much clearer for users IMO). I know this sort of concern has come up before on the main Qiita repo, but since this problem still remains for analyses I believe it would be worth fixing there. (Can write up an issue for this in biocore/qiita if you want.)
When I first started downloading data from Qiita, it seemed to me like `deblur final table` (`all.biom`) was the table I should be using as a starting point, but from doing some digging it looks like `deblur reference hit table` (`reference-hit.biom`) is the recommended table for use in typical 16S analyses. No big deal, I can rerun my analysis with reference-hit.biom instead of all.biom :)
I know there are existing docs explaining the differences between these BIOMs (see references at the bottom for some of what I've found), but these are all external resources ([2] and [3] are linked from the "Help" dropdown in Qiita, but you have to dig a bit to find the info on Deblur). In my opinion, the actual Qiita user interface doesn't explain this super well. Furthermore, I think other people have had the same confusion I've had and have used `all.biom` in 16S studies; see the full thread of [5]. I can also see that this issue has been brought up before in #16, but it doesn't seem like that issue has been resolved.
I believe it might be worthwhile to do some or all of the following:
- Rename `deblur final table` to something like `deblur reference hit and reference non-hit table`, or `deblur non-positive-filtering table`, or `deblur all.biom ("final") table`, or something like that.
- Add a short description to each artifact, e.g. "This deblur artifact was positive-filtered against a reference database of 16S sequences in an attempt to remove non-16S sequences. We recommend using it for most 16S analyses." for reference-hit.biom, and "This deblur artifact was not positive-filtered. We recommend not using it for normal 16S analyses, but it may be useful for other marker-gene studies." for all.biom.
I am happy to discuss further/help out as needed; I think this will help people choose the correct outputs for their analyses, and alleviate confusion in general.
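To make the rename-plus-description idea a bit more concrete, here is a rough Python sketch of how the two outputs could be described in one place. This is only an illustration: `DeblurOutput`, `DEBLUR_OUTPUTS`, and `recommended_for_16s` are hypothetical names that do not come from the plugin or from Qiita, and the proposed names are just the ones floated in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeblurOutput:
    """Hypothetical record describing one deblur output table."""
    filename: str            # BIOM file written by deblur
    current_name: str        # artifact name currently shown to users
    proposed_name: str       # one of the names suggested in this thread
    description: str         # proposed user-facing help text
    positive_filtered: bool  # True if filtered against the 16S reference


# Assumption: this registry is illustrative only, not the plugin's real config.
DEBLUR_OUTPUTS = [
    DeblurOutput(
        filename="reference-hit.biom",
        current_name="deblur reference hit table",
        proposed_name="deblur positive-and-negative-filtered",
        description=(
            "This deblur artifact was positive-filtered against a reference "
            "database of 16S sequences in an attempt to remove non-16S "
            "sequences. We recommend using it for most 16S analyses."
        ),
        positive_filtered=True,
    ),
    DeblurOutput(
        filename="all.biom",
        current_name="deblur final table",
        proposed_name="deblur only-negative-filtered",
        description=(
            "This deblur artifact was not positive-filtered. We recommend not "
            "using it for normal 16S analyses, but it may be useful for other "
            "marker-gene studies."
        ),
        positive_filtered=False,
    ),
]


def recommended_for_16s(outputs):
    """Return only the outputs that were positive-filtered."""
    return [out for out in outputs if out.positive_filtered]


# Print the rename each output would get under this proposal.
for out in DEBLUR_OUTPUTS:
    print(f"{out.filename}: {out.current_name!r} -> {out.proposed_name!r}")
```

Keeping the name and the help text next to each other like this would make it harder for the two to drift apart, but how the plugin actually registers its artifact names may look quite different.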
[1] https://github.com/biocore/deblur#input-and-output-files
[2] https://qiita.ucsd.edu/static/doc/html/processingdata/index.html#deblurring (doesn't go into a lot of detail)
[3] https://cmi-workshop.readthedocs.io/en/latest/qiita-16S-processing.html#the-deblur-workflow
[4] https://forum.qiime2.org/t/transferring-qiita-artifacts-to-qiime2/4790
[5] https://forum.qiime2.org/t/deblur-without-16s-filter/3968/