Closed NenBarto closed 1 year ago
It is difficult to help without more details. One of the main factors that determine the number and sizes of clusters is the similarity threshold specified with the --id option when performing the clustering. It is common to use 97% for this (--id 0.97), but it may be lowered if there is greater variability in the target gene. Longer reads, which potentially contain more sequencing errors, may also require a lower id threshold.
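
To make the effect of the threshold concrete, here is a toy Python sketch. It is not vsearch's algorithm: it assumes equal-length reads, counts mismatches position by position instead of aligning, and uses made-up sequences. The function names `identity` and `greedy_cluster` are hypothetical. It only illustrates why a stricter identity threshold tends to split reads into more, smaller clusters.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy example expects equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Greedy centroid clustering: a read joins the first centroid it matches
    at >= threshold, otherwise it becomes a new centroid."""
    centroids = []
    clusters = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:
            # No existing centroid is similar enough: start a new cluster.
            centroids.append(s)
            clusters.append([s])
    return clusters


# Made-up 20 bp reads: a reference, two near-variants, and an unrelated read.
reads = [
    "ACGTACGTACGTACGTACGT",  # reference
    "ACGTACGTACGTACGTACGA",  # 1 mismatch to the reference
    "TCGTACGTACGTACGTACGA",  # 2 mismatches to the reference
    "GGGGGGGGGGCCCCCCCCCC",  # unrelated
]

for threshold in (0.97, 0.94, 0.89):
    clusters = greedy_cluster(reads, threshold)
    print(threshold, len(clusters), [len(c) for c in clusters])
```

On this toy data the cluster count drops as the threshold is lowered, because reads that differ only by a base or two get absorbed into an existing centroid instead of founding their own cluster.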
We are using vsearch to identify taxa in eDNA samples amplified with 18S and COI primers, followed by MiSeq sequencing. The issue we observe is a huge number of clusters, each with just a small number of reads (a median of 3, compared to 10 from our vertebrate amplicons). There are multiple factors involved, from PCR to sequencing, but are there any suggestions for what we could tweak in vsearch to help with this? Any help is greatly appreciated.
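
For comparing runs, the median cluster size mentioned above can be recomputed from the clustering output. The sketch below assumes the clustering was run with the --uc output option and that the file follows the usual tab-separated UCLUST-style layout, where each `C` record carries the cluster number in the second column and the cluster size in the third. The function name `cluster_sizes_from_uc` and the sample records are made up for illustration, so please check them against your own files.

```python
from statistics import median


def cluster_sizes_from_uc(lines):
    """Collect cluster sizes from the 'C' (cluster summary) records of a
    UCLUST-style .uc file, given as an iterable of text lines."""
    sizes = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumption: column 1 is the record type, column 3 the cluster size.
        if len(fields) >= 3 and fields[0] == "C":
            sizes.append(int(fields[2]))
    return sizes


# Hypothetical records, not real data: five clusters of sizes 1, 3, 3, 10, 2.
sample_uc = [
    "S\t0\t250\t*\t*\t*\t*\t*\tread_a\t*",
    "H\t0\t250\t98.0\t+\t0\t0\t250M\tread_b\tread_a",
    "C\t0\t1\t*\t*\t*\t*\t*\tread_a\t*",
    "C\t1\t3\t*\t*\t*\t*\t*\tread_c\t*",
    "C\t2\t3\t*\t*\t*\t*\t*\tread_d\t*",
    "C\t3\t10\t*\t*\t*\t*\t*\tread_e\t*",
    "C\t4\t2\t*\t*\t*\t*\t*\tread_f\t*",
]

sizes = cluster_sizes_from_uc(sample_uc)
print("clusters:", len(sizes), "median size:", median(sizes))
```

With a real file you would pass an open file handle instead of `sample_uc`, for example `cluster_sizes_from_uc(open("clusters.uc"))`, where `clusters.uc` is a placeholder name. Running this for a few --id values gives a quick view of how the size distribution shifts.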