jcmcnch / eASV-pipeline-for-515Y-926R

This is a collection of scripts for analyzing mixed 16S/18S amplicon sequences using tools such as qiime2, DADA2, deblur, and bbtools
GNU General Public License v3.0

Question about the taxonomy classifiers on the now split dada2 results #7

Open MDHDZ91 opened 1 month ago

MDHDZ91 commented 1 month ago

I'm referring to the qiime v2022.2 pipeline.

Once you get to the portion of the analysis where you have the DADA2 results for each subset (EUK/PROK), we have to add taxonomy to the reads one more time. I noticed that you use the SILVA database as the classifier for both. Would it be more precise to use SILVA on the PROK and PR2 on the EUK? Could we use the custom databases we made (Silva_PR2_EUK/Silva_PROK) as the classifier? If so, any ideas on how to handle the 7 vs. 9 taxonomy levels of SILVA 138 vs. PR2 v5?

Thank you,

jcmcnch commented 1 month ago

Hi María,

Yes, this is definitely possible. Actually, PR2 is our default for 18S classification, but I usually run both SILVA and PR2 in my 18S analysis pipeline (see the runscripts folder for example workflows) in case I want to compare the taxonomy each one gives for the 18S sequences.

For 16S, our default is to use PR2 for any plastid 16S / cyanobacteria and SILVA for everything else (Bacteria + Archaea). We also sometimes do an additional BLAST-based classification step for marine cyanos (see: https://github.com/jcmcnch/ProPortal-ASV-Annotation)
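To make the 16S routing above concrete, here is a minimal Python sketch that decides which database a sequence should be classified against, based on a preliminary taxonomy label. The function name `choose_database` and the keyword matching are assumptions for illustration only; the pipeline's own scripts may split sequences differently.

```python
# Hypothetical helper: pick a reference database for a 16S sequence from a
# preliminary taxonomy string. The keyword matching is an assumption, not
# the pipeline's actual logic.
PR2_KEYWORDS = ("chloroplast", "cyanobacteri")


def choose_database(prelim_taxonomy: str) -> str:
    """Return 'PR2' for plastid 16S / cyanobacteria, 'SILVA' otherwise."""
    label = prelim_taxonomy.lower()
    if any(keyword in label for keyword in PR2_KEYWORDS):
        return "PR2"
    # Everything else (Bacteria + Archaea) goes to SILVA.
    return "SILVA"


if __name__ == "__main__":
    examples = [
        "d__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Chloroplast",
        "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
        "d__Archaea; p__Crenarchaeota",
    ]
    for taxon in examples:
        print(choose_database(taxon), "<-", taxon)
```

The returned label could then be used to select which trained classifier to run on each subset.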

But yes, certainly, you could customize this as you like. The scripts I have could just be modified to classify from different databases.

Also, what's your concern about the levels (i.e. 7 vs 9)? We normally just ignore this, although it does make plotting a bit of a hassle since you can't collapse to the same level as easily...

Jesse

MDHDZ91 commented 1 month ago

Thank you Jesse,

I'm working with freshwater data (Great Lakes), so I think the EUK (SILVA, which would cover the plastid 16S, plus PR2) and PROK (SILVA) fractions I have now make sense. The classifier for the EUK portion should therefore include both databases, as you suggest, so I'll look at your scripts.

Yes, my concern with the different taxonomy levels is aggregating them for plotting, but I'll figure something out.
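One possible way to handle the aggregation, sketched in Python: name the ranks in each taxonomy string, then collapse both to the ranks the two schemes share. The rank lists below are assumptions (7 levels for SILVA 138 as formatted for QIIME 2, 9 levels for PR2 v5) and should be checked against the actual database files. Ranks with the same name are also not guaranteed to be equivalent between the two databases, so treat the result as a plotting convenience only.

```python
import re

# Assumed rank names; verify against your own SILVA 138 / PR2 v5 files.
SILVA_RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]
PR2_RANKS = ["domain", "supergroup", "division", "subdivision", "class",
             "order", "family", "genus", "species"]

# Ranks whose names appear in both schemes, in hierarchical order.
SHARED_RANKS = [rank for rank in SILVA_RANKS if rank in PR2_RANKS]


def parse_taxonomy(taxon, ranks):
    """Split a semicolon-delimited taxonomy string into a rank -> name dict."""
    parts = [p.strip() for p in taxon.split(";") if p.strip()]
    # Drop QIIME-style rank prefixes such as 'd__' or 'p__' if present.
    names = [re.sub(r"^[a-z]+__", "", p) for p in parts]
    # zip() stops at the shorter list, so truncated strings are handled.
    return dict(zip(ranks, names))


def collapse(taxon, ranks, level):
    """Collapse a taxonomy string to a shared rank, keeping only shared ranks."""
    named = parse_taxonomy(taxon, ranks)
    keep = SHARED_RANKS[: SHARED_RANKS.index(level) + 1]
    return ";".join(named.get(rank, "unassigned") for rank in keep)


if __name__ == "__main__":
    silva = "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales"
    pr2 = ("Eukaryota;TSAR;Alveolata;Dinoflagellata;Dinophyceae;"
           "Gonyaulacales;Ceratiaceae;Tripos;Tripos_fusus")
    print(collapse(silva, SILVA_RANKS, "class"))
    print(collapse(pr2, PR2_RANKS, "class"))
```

Collapsing to a shared rank discards the PR2-only levels (supergroup, division, subdivision) and the SILVA phylum, so if those matter for the plots, a manual mapping between them would be needed instead.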