Closed Anto007 closed 2 years ago
Hi Anto007,
Thanks for your inquiry. I believe I checked this before, but out of curiosity I just had a glance at the results from a Pacific-wide transect (covering a wide range of depths and latitudes), and the average percentage of reads mapped using these cutoffs was 99.66% (lowest = 94.01%, highest = 99.92%). If you use the pipeline, you can always check this by looking at the log files from bbsplit. For example, I just ran grep -A2 "Read " logs/02-bbsplit/* to access the numbers mentioned above.
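For anyone who wants the mean, lowest, and highest values in one step rather than reading the grep output by eye, here is a minimal Python sketch. It assumes each log contains a BBMap-style summary line that starts with "mapped:" followed by a percentage; that layout is an assumption and may differ between BBTools versions, and the example log text below is made up for illustration, not real pipeline output.

```python
import re
from statistics import mean

# Assumed log layout (may differ between BBTools versions):
#   mapped:   99.6624%   123456   ...
MAPPED_RE = re.compile(r"^mapped:\s+([0-9.]+)%", re.MULTILINE)


def percent_mapped(log_text):
    """Return every 'mapped:' percentage found in the text of one log."""
    return [float(value) for value in MAPPED_RE.findall(log_text)]


def summarize(percentages):
    """Return (mean, lowest, highest) for a non-empty list of percentages."""
    return mean(percentages), min(percentages), max(percentages)


# Made-up log snippets, used only to show the parsing.
example_logs = [
    "Read 1 data:\tpct reads\tnum reads\n\nmapped:\t 99.50% \t 1990\n",
    "Read 1 data:\tpct reads\tnum reads\n\nmapped:\t 94.00% \t 940\n",
]

values = [p for log in example_logs for p in percent_mapped(log)]
avg, lowest, highest = summarize(values)
print(f"mean={avg:.2f} lowest={lowest:.2f} highest={highest:.2f}")
```

To run this on real logs, the text of each file under logs/02-bbsplit/ could be read and passed to percent_mapped in place of the example strings.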
As for your question about splitting algae and fungi, I think that as long as your databases are "clean" (i.e. no misplaced taxa), it should work well. I think I used the same parameters to split cyanobacteria from other bacteria for another study and it worked well, so I don't anticipate a problem for your work. The bbtools software "just works" in my experience, so I hope the same happens for you.
HTH, Jesse
Many thanks for your prompt response @jcmcnch. I'm not sure if I expressed my query correctly or if I misunderstood your response. My initial query was not about the average % of reads mapped but rather about prok 16S reads still ending up in the euk 18S bin and euk 18S reads still ending up in the prok 16S bin. I was wondering if minratio=0.30 minid=0.30 is a bit too low for the clean separation of reads into exclusively euk/prok bins. In your experience, do you never see prok 16S reads among the BBSplit output euk 18S reads, or vice versa? Just curious here, and thanks again for the reassuring tip about splitting fungal from algal rRNA reads.
Ah I see, now I understand your question. No, we don't really see any mis-sorted reads, at least based on the SILVA132 classifications applied afterwards to each bin. Keep in mind though that there will be chloroplast (and possibly mitochondrial) 16S amplicons in the 16S table that are derived from Eukaryota. In the case of chloroplasts, they can be easily identified by looking for the :plas keyword in the taxonomy string in the latest version of the pipeline (qiime2-2022.2-DADA2-SILVA138.1-PR2_4.14.0). I think the mitochondrial 16S reads are considerably more poorly classified but they are definitely there as a small fraction of 16S reads as well.
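As a rough illustration of the :plas check, the sketch below separates taxonomy strings that contain the keyword from those that don't. The ASV IDs and taxonomy strings are invented for the example, and the function name is hypothetical; the actual layout of the taxonomy strings in the pipeline output may differ.

```python
def split_plastid(taxonomy_by_asv, keyword=":plas"):
    """Split an {asv_id: taxonomy string} mapping on a plastid keyword.

    Returns (plastid, other): entries whose taxonomy string contains
    the keyword, and all remaining entries.
    """
    plastid = {}
    other = {}
    for asv_id, taxonomy in taxonomy_by_asv.items():
        target = plastid if keyword in taxonomy else other
        target[asv_id] = taxonomy
    return plastid, other


# Invented taxonomy strings, for illustration only.
example = {
    "asv1": "Bacteria;Cyanobacteria;Synechococcus",
    "asv2": "Eukaryota:plas;Ochrophyta:plas;Bacillariophyta:plas",
    "asv3": "Bacteria;Proteobacteria;SAR11",
}

plastid, other = split_plastid(example)
print(sorted(plastid))
print(sorted(other))
```

This is a plain substring match, so it only finds sequences whose classification already carries the keyword; it would not catch the poorly classified mitochondrial reads mentioned above.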
Thanks so much again for your clarification; everything is super clear now!
Hi,
Thank you so much for providing this very useful resource. I was just wondering if minratio=0.30 minid=0.30 at the BBSplit step is a bit too lenient? Wouldn't a lot of euk/prok reads still remain in the corresponding bins after splitting when this setting is used? From your experience, would you have any suggestions on the ideal minratio and minid settings to split fungal from algal rRNA reads (perhaps minratio=0.70 minid=0.70)? Thanks again for your kind response here.