MaestSi closed this issue 6 years ago
VSEARCH uses a fast, but very simple algorithm for multiple alignment (the center-star method) and subsequent consensus sequence computation. It uses the centroid sequence as a starting point and just aligns the other sequences in the cluster to that sequence. For this reason, indels in the other sequences relative to the centroid sequence will have little or no impact on the consensus sequence. This method does not work well when the sequences are rather different or when there are many indels.
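To make the effect described above concrete, here is a minimal Python sketch of a centroid-anchored consensus. It is a deliberately simplified illustration and not VSEARCH's actual implementation: the function names (`global_align`, `centroid_anchored_consensus`) and the scoring values are assumptions chosen for the example. Each read is aligned pairwise to the centroid, and votes are collected only in columns that exist in the centroid, so bases inserted relative to the centroid have nowhere to vote.

```python
from collections import Counter

def global_align(ref, query, match=1, mismatch=-1, gap=-2):
    """Plain Needleman-Wunsch alignment; returns (aligned_ref, aligned_query).
    Scoring values are illustrative assumptions, not VSEARCH defaults."""
    n, m = len(ref), len(query)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == query[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and ref[i - 1] == query[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            a.append(ref[i - 1]); b.append(query[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(ref[i - 1]); b.append('-'); i -= 1
        else:
            a.append('-'); b.append(query[j - 1]); j -= 1
    return ''.join(reversed(a)), ''.join(reversed(b))

def centroid_anchored_consensus(centroid, reads):
    """Majority vote per centroid position (hypothetical helper).
    Columns that are gaps in the centroid (insertions in a read) are skipped."""
    columns = [Counter({base: 1}) for base in centroid]  # the centroid votes too
    for read in reads:
        aligned_ref, aligned_read = global_align(centroid, read)
        pos = 0
        for r, q in zip(aligned_ref, aligned_read):
            if r == '-':
                continue  # insertion relative to the centroid: no column to vote in
            columns[pos][q] += 1
            pos += 1
    winners = (c.most_common(1)[0][0] for c in columns)
    return ''.join(w for w in winners if w != '-')

centroid = "ACGTACGT"
reads = ["ACGTTACGT"] * 4  # every read carries the same extra T
print(centroid_anchored_consensus(centroid, reads))
```

In this toy case four of the five sequences carry the extra T, yet the result is identical to the centroid, because the insertion never lands in a column that is counted.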
To compute a good consensus sequence for these reads, I would advise you to use a better multiple alignment tool. VSEARCH is not designed for this purpose.
May I ask whether you know of any tools that also support consensus extraction? (Tools aimed at short reads are fine too; I would test them on longer reads myself.) I have tried cd-hit, but it seems to only allow extraction of the cluster centroid, so it does not serve my purpose. Thanks
This sounds like a good candidate for deblur, a clustering algorithm that uses MAFFT for MSA to address this very limitation of vsearch.
deblur is also available as a qiime2-plugin.
Colin
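For comparison, the sketch below shows how a consensus could be taken from a true multiple sequence alignment, where every sequence has the same gapped length and insertions get their own columns. It assumes the aligned FASTA text has already been produced by an MSA tool such as MAFFT; the function names `parse_fasta` and `msa_consensus` are hypothetical helpers written for this example, not part of deblur, MAFFT, or VSEARCH.

```python
from collections import Counter

def parse_fasta(text):
    """Parse FASTA text into a dict of name -> sequence (line wraps are joined)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: ''.join(parts) for n, parts in records.items()}

def msa_consensus(aligned_seqs):
    """Majority vote per alignment column; columns won by '-' are dropped."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for col in range(length):
        winner = Counter(s[col].upper() for s in aligned_seqs).most_common(1)[0][0]
        if winner != '-':
            consensus.append(winner)
    return ''.join(consensus)

# Toy alignment: one sequence lacks the T that the other four share.
aligned_fasta = """>centroid
ACG-TACGT
>read1
ACGTTACGT
>read2
ACGTTACGT
>read3
ACGTTACGT
>read4
ACGTTACGT
"""
seqs = list(parse_fasta(aligned_fasta).values())
print(msa_consensus(seqs))
```

Here the shared insertion occupies its own column, so the majority vote keeps it in the consensus.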
Thank you Colin. I read the deblur README on GitHub, but unfortunately it seems to rely strongly on the known error profile of Illumina reads. I may still give it a try in the coming days.
Dear vsearch developers, based on issue #305, I am trying to use vsearch to extract a consensus sequence from 50 reads that are 700 bp long. When I align the 50 input reads back to the consensus sequence produced by vsearch, I notice that at some coordinates almost all reads carry an insertion of the same nucleotide. This means that these "insertion" nucleotides have erroneously been left out of the consensus sequence. Why does this happen? Should I tune the parameters in a particular way? Thanks.