jcmcnch / eASV-pipeline-for-515Y-926R

This is a collection of scripts for analyzing mixed 16S/18S amplicon sequences using tools such as qiime2, DADA2, deblur, and bbtools
GNU General Public License v3.0

bbsplit general query #3

Closed: Anto007 closed this issue 1 year ago

Anto007 commented 1 year ago

Hi,

Thank you so much for providing this very useful resource. I was wondering whether minratio=0.30 minid=0.30 at the BBSplit step is a bit too lenient. Wouldn't a lot of euk/prok reads still end up in the wrong bins after splitting with this setting? From your experience, would you have any suggestions on the ideal minratio and minid settings for splitting fungal from algal rRNA reads (perhaps minratio=0.70 minid=0.70)? Thanks again for your kind response.

jcmcnch commented 1 year ago

Hi Anto007,

Thanks for your inquiry. I believe I checked this before, but out of curiosity I just had a glance at the results from a Pacific-wide transect (covering a wide range of depths and latitudes), and the average % reads mapped using these cutoffs was 99.66 % (lowest = 94.01, highest = 99.92). If you use the pipeline, you can always check this by looking at the log files from bbsplit. For example, I just ran grep -A2 "Read " logs/02-bbsplit/* to get the numbers mentioned above.
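For anyone who wants the average, lowest and highest values without reading the grep output by eye, here is a minimal Python sketch. It assumes each log contains one or more lines that start with "mapped:" followed by a percentage, which is what the grep above appears to surface; the function names and the example log text are hypothetical and made up for illustration, so adjust the pattern if your logs look different.

```python
import glob
import re
from statistics import mean

# Assumed log layout: a line such as "mapped:   99.5000%   1990 ...".
MAPPED_RE = re.compile(r"^[ \t]*mapped:\s+([0-9.]+)%", re.MULTILINE)


def mapped_percentages(log_text):
    """Return every 'mapped:' percentage found in one log's text."""
    return [float(value) for value in MAPPED_RE.findall(log_text)]


def summarise(percentages):
    """Return (mean, lowest, highest) of a list of percentages."""
    return mean(percentages), min(percentages), max(percentages)


def summarise_logs(pattern="logs/02-bbsplit/*"):
    """Collect percentages from all logs matching the glob pattern."""
    values = []
    for path in glob.glob(pattern):
        with open(path, errors="replace") as handle:
            values.extend(mapped_percentages(handle.read()))
    return summarise(values)


# Hypothetical log excerpt, invented for this example only.
example_log = (
    "Read 1 data:\tpct reads\tnum reads\n"
    "\n"
    "mapped:\t 99.5000% \t 1990\n"
    "Read 2 data:\tpct reads\tnum reads\n"
    "\n"
    "mapped:\t 98.5000% \t 1970\n"
)
example_values = mapped_percentages(example_log)
example_summary = summarise(example_values)
```

Calling summarise_logs() from the pipeline's working directory would apply the same parsing to the real logs; it raises an error if no percentages are found, which is a useful hint that the assumed pattern does not match.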

As for your question regarding splitting algae and fungi, I think as long as your databases are "clean" (i.e. no misplaced taxa), it should work well. I think I used the same parameters for splitting cyanobacteria from other bacteria for another study and it worked well, so I don't anticipate a problem for your work. The bbtools software "just works" in my experience, so I hope the same happens for you.

HTH, Jesse

Anto007 commented 1 year ago

Many thanks for your prompt response, @jcmcnch. I'm not sure if I expressed my query correctly or if I misunderstood your response. My initial query was not about the average % reads mapped but rather about prok 16S reads still being in the euk 18S bin and euk 18S reads still being in the prok 16S bin. I was wondering if minratio=0.30 minid=0.30 is a bit too low for the clean separation of reads into exclusively euk/prok bins. In your experience, do you never see prok 16S reads among the BBSplit output euk 18S reads, or vice versa? Just being curious here, and thanks again for the reassuring tip about splitting fungal from algal rRNA reads.

jcmcnch commented 1 year ago

Ah I see, now I understand your question. No, we don't really see any mis-sorted reads, at least based on the SILVA132 classifications applied afterwards to each bin. Keep in mind though that there will be chloroplast (and possibly mitochondrial) 16S amplicons in the 16S table that are derived from Eukaryota. In the case of chloroplasts, they can be easily identified by looking for the :plas keyword in the taxonomy string in the latest version of the pipeline (qiime2-2022.2-DADA2-SILVA138.1-PR2_4.14.0). I think the mitochondrial 16S reads are considerably more poorly classified but they are definitely there as a small fraction of 16S reads as well.
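As a rough illustration of the chloroplast check, the sketch below partitions features by whether their taxonomy string contains the :plas keyword. The function name, feature IDs and taxonomy strings are hypothetical and invented for this example; the only detail taken from the pipeline is the :plas keyword itself, so treat this as a sketch rather than the pipeline's own method.

```python
def split_plastid(taxonomy_by_feature, keyword=":plas"):
    """Partition features by whether their taxonomy string contains keyword.

    Returns two dicts: (plastid-like features, all other features).
    """
    plastid, other = {}, {}
    for feature_id, taxonomy in taxonomy_by_feature.items():
        target = plastid if keyword in taxonomy else other
        target[feature_id] = taxonomy
    return plastid, other


# Hypothetical feature IDs and taxonomy strings, made up for illustration.
example_taxonomy = {
    "asv1": "Eukaryota:plas; Chlorophyta",
    "asv2": "Bacteria; Proteobacteria",
}
plastid_features, other_features = split_plastid(example_taxonomy)
```

The same substring approach would not catch mitochondrial 16S reads, since they are more poorly classified and no equivalent keyword is mentioned here.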

Anto007 commented 1 year ago

Thanks so much again for your clarification. Everything is super clear now!