jcmcnch / eASV-pipeline-for-515Y-926R

This is a collection of scripts for analyzing mixed 16S/18S amplicon sequences using tools such as qiime2, DADA2, deblur, and bbtools
GNU General Public License v3.0

Would this pipeline also be perfectly suitable for Illumina PE250bp reads? #8

Open shanexuuu opened 1 month ago

shanexuuu commented 1 month ago

Hi,

I am using this pipeline to reanalyze some data from Illumina NovaSeq PE250bp.

I have noticed that in this paper, they sequenced on HiSeq PE250bp (McNichol, J., Berube, P., Biller, S., Fuhrman, J., 2021. [Evaluating and Improving SSU rRNA PCR Primer Coverage for Bacteria, Archaea, and Eukaryotes Using Metagenomes from Global Ocean Surveys](https://journals.asm.org/doi/10.1128/mSystems.00565-21). mSystems. 6(3), e00565-2).

And in this paper, they sequenced on MiSeq PE300bp (Yeh, Y.C., McNichol, J., Needham, D., Fichot, E., Berdjeb, L., Fuhrman, J., 2021. Comprehensive single-PCR 16S and 18S rRNA community analysis validated with mock communities, and estimation of sequencing bias against 18S. Environmental Microbiology. doi: 10.1111/1462-2920.15553).

Do you think PE250 reads would also work perfectly with this pipeline?

Many thanks! Shane

jcmcnch commented 3 weeks ago

Hi Shane,

Sorry for the delay in getting back to you. To answer your question: yes, there is no reason why the pipeline shouldn't work with PE250. In fact, in our experience that read length generally gives better overall quality than PE300.

Note, though, that NovaSeq has some potential issues with error model training, caused by changes in the way quality scores are reported vs. the old Illumina chemistry (see: https://github.com/benjjneb/dada2/issues/791). Also, there is an increased chance of index hopping with NovaSeq vs. the MiSeq / old HiSeq that we used (i.e. patterned flow cell vs. bridge amplification; see: https://www.illumina.com/techniques/sequencing/ngs-library-prep/multiplexing/index-hopping.html), so you should verify that your libraries use a UDI (unique dual index) barcoding approach; otherwise you will see ASVs bleeding through from sample to sample. I've actually not used NovaSeq data successfully myself due to these issues, so I am not up to date on the current concerns. I suggest asking around and checking the relevant threads on dada2's GitHub, the qiime2 forum, etc.

Hope that helps, Jesse
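
For anyone who wants to check whether their reads carry the binned quality scores discussed in the dada2 thread linked above, here is a minimal sketch. It is not part of this pipeline. The function names `quality_score_counts` and `looks_binned` and the `max_distinct=8` threshold are illustrative assumptions, and it assumes standard four-line FASTQ records with Phred+33 encoding. NovaSeq data typically shows only a handful of distinct quality values, whereas older MiSeq / HiSeq data shows many more.

```python
import gzip
from collections import Counter


def quality_score_counts(fastq_path, max_reads=10000, phred_offset=33):
    """Count how often each Phred score occurs in the first `max_reads` records.

    Assumes unwrapped four-line FASTQ records and Phred+33 encoding.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number >= 4 * max_reads:
                break
            if line_number % 4 == 3:  # fourth line of each record holds the qualities
                counts.update(ord(ch) - phred_offset for ch in line.rstrip("\r\n"))
    return counts


def looks_binned(counts, max_distinct=8):
    """Heuristic only: very few distinct scores suggests binned qualities.

    The threshold of 8 is an arbitrary assumption, not a published cutoff.
    """
    return 0 < len(counts) <= max_distinct
```

Calling `looks_binned(quality_score_counts("sample_R1.fastq.gz"))` on one of your own files (the file name here is a placeholder) gives a quick yes/no, and printing `sorted(counts)` shows exactly which score values are present. If the scores are binned, the error model discussion in the dada2 issue is worth reading before running the denoising step.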