torognes / vsearch

Versatile open-source tool for microbiome analysis

Large number of clusters with small number of sequences #490

Closed NenBarto closed 1 year ago

NenBarto commented 2 years ago

We are using vsearch to identify taxa in eDNA samples amplified with 18S and COI primers and sequenced on a MiSeq. The issue we observe is a huge number of clusters, each with just a small number of reads (a median of 3, compared with 10 for our vertebrate amplicons). Many factors could contribute, from PCR to sequencing, but are there any suggestions as to what might cause this, or what we could tweak in vsearch to help? Any help is greatly appreciated.

torognes commented 2 years ago

It is difficult to help without more details. One of the main factors that determine the number and sizes of clusters is the similarity percentage specified (with the --id option) when performing the clustering. It is common to use 97% for this, but the threshold may be lowered if there is greater variability in the target gene. Longer reads, which potentially contain more sequencing errors, may also require a lower id threshold.
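
One way to see how much the threshold matters is to cluster the same reads at a few `--id` values and compare the cluster-size distributions. The Python sketch below is only an illustration, not a recommended pipeline: it assumes `vsearch` is on the `PATH`, that clustering is done with `--cluster_fast` and a `--uc` output file, and that the file names and the thresholds tried (0.97, 0.95, 0.90) are placeholders. Note that `--id` is given as a fraction, so 97% is written `0.97`. Cluster sizes are read from the `C` records of the UC file, where the third column holds the cluster size.

```python
import statistics
import subprocess


def build_cluster_command(fasta_path, identity, uc_path):
    """Return the argument list for one vsearch clustering run.

    identity is a fraction between 0 and 1 (0.97 means 97%).
    """
    return [
        "vsearch",
        "--cluster_fast", fasta_path,
        "--id", f"{identity:.2f}",
        "--uc", uc_path,
    ]


def cluster_sizes_from_uc(lines):
    """Collect cluster sizes from the C records of a UC file."""
    sizes = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # C records summarise one cluster each; column 3 is its size.
        if len(fields) >= 3 and fields[0] == "C":
            sizes.append(int(fields[2]))
    return sizes


def summarize(sizes):
    """Report cluster count, median size, and number of singletons."""
    return {
        "clusters": len(sizes),
        "median_size": statistics.median(sizes) if sizes else None,
        "singletons": sum(1 for s in sizes if s == 1),
    }


def run_sweep(fasta_path, identities=(0.97, 0.95, 0.90)):
    """Cluster the same input at several thresholds (hypothetical values)."""
    results = {}
    for identity in identities:
        uc_path = f"clusters_{identity:.2f}.uc"  # placeholder file name
        subprocess.run(
            build_cluster_command(fasta_path, identity, uc_path),
            check=True,
        )
        with open(uc_path) as handle:
            results[identity] = summarize(cluster_sizes_from_uc(handle))
    return results
```

Calling `run_sweep("reads.fasta")` (with a hypothetical input file) returns one summary per threshold. If the median cluster size barely changes as the threshold drops, the fragmentation probably comes from something other than `--id`.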