torognes / vsearch

Versatile open-source tool for microbiome analysis

Garbage consensus sequence in all 2.13 versions #382

Closed: scharch closed this issue 5 years ago

scharch commented 5 years ago
schrammca@cogsworth$> ~/Downloads/vsearch-2.13.6-linux-x86_64/bin/vsearch -cluster_fast work/preprocess/GTGTTAGGTATATGGA/CGATCCCTCT.fa --consout cons.fa -id .97 -msaout msa.fa
vsearch v2.13.6_linux_x86_64, 31.1GB RAM, 8 cores
https://github.com/torognes/vsearch

Reading file work/preprocess/GTGTTAGGTATATGGA/CGATCCCTCT.fa 100%  
3960 nt in 8 seqs, min 495, max 495, avg 495
Masking 100% 
Sorting by length 100%
Counting k-mers 100% 
Clustering 100%  
Sorting clusters 100%
Writing clusters 100% 
Clusters: 1 Size min 8, max 8, avg 8.0
Singletons: 0, 0.0% of seqs, 0.0% of clusters
Multiple alignments 100% 

schrammca@cogsworth$> cat cons.fa                                          
>centroid=M04484:114:000000000-C9G9F:1:1111:11221:2305/1;ee=0.2323;cell=GTGTTAGGTATATGGA;umi=CGATCCCTCT;seqs=8
GGGCGGAGAGMMMMACCCAACAACCACAGCCCGCCGCAMAAMCCCCCAMAMCACAACGCCGGACCAGMMACGMMACCGMM
AMMAGCCGCGGGGGMMGMMCAMCAMCCACAMMGMCCCACGCCCAMMGCCAMCGGMGMCAMGCGMMMMCGMAMMGMAAMAA
MCCGMMMMCCGCAMGMAAMMGGGCCGMCAAMMCGGCGMMAGACACCGGCACGAMCGAGMCGAGMCAGGMMMGMCMCCAMM
CCCCCMMACAAAMMCGGMAMGMMAGMMMAGMMAGCAACMCGMMCAAGMMGAACACAAAAGAGGCACAMAAMGGCCAMMMC
AMAMGCACCAGGACCAMMMACACAGCCMCMAMCACAMCCGACAGMMAMCGMAMCAMCCGMAMAGCGMAAMACACMMCGMG
MGAGGACGMGMCMAMAMMMAMMAGAMGMMMAMCGACGGGMGMCGMMGGCMACCCCGMMMMCCAMMMAACCCGMMGCACCM
GCGCCGCAMMMAMGM

This bug appears to have been introduced in version 2.13.0; version 2.12.0 gives the correct consensus. I get the same result with cluster_smallmem and with every id threshold that I've tested.

Input file: CGATCCCTCT.txt
Output: msa.txt
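The consensus in cons.fa above is dominated by M, the IUPAC ambiguity code for "A or C", which is what makes it look like garbage for a cluster of eight near-identical reads. The following is a minimal sketch of how such output could be flagged automatically. It is not part of vsearch: the function names and the 5% threshold are assumptions chosen for illustration, and it assumes a DNA consensus in which anything other than A, C, G or T counts as ambiguous.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def ambiguity_fraction(sequence):
    """Return the fraction of residues that are not plain A, C, G or T."""
    residues = sequence.upper()
    if not residues:
        return 0.0
    ambiguous = sum(1 for base in residues if base not in "ACGT")
    return ambiguous / len(residues)


def suspicious_records(fasta_text, threshold=0.05):
    """List (header, fraction) for records above the hypothetical threshold."""
    flagged = []
    for header, sequence in read_fasta(fasta_text):
        fraction = ambiguity_fraction(sequence)
        if fraction > threshold:
            flagged.append((header, fraction))
    return flagged
```

Calling suspicious_records on the contents of cons.fa would list the centroid record shown above, because a large share of its residues are M.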

torognes commented 5 years ago

Thank you for reporting this bug. I am working on it now.

torognes commented 5 years ago

The bug should be fixed in commit 2634cdd2d9c6af8cea5fe6abb6644f83ca45b555.

I'll probably not be able to make a new release before Monday.

torognes commented 5 years ago

The fix is available in VSEARCH 2.13.7, which has just been released.
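For pipelines that need to know whether an installed build falls in the affected range, here is a small sketch that parses the banner line vsearch prints at startup (as in the transcript above). The range it uses comes only from this thread: the bug was reported as introduced in 2.13.0 and the fix shipped in 2.13.7. The function and constant names are hypothetical.

```python
import re

# Version range taken from this thread, not from an official advisory.
FIRST_AFFECTED = (2, 13, 0)
FIRST_FIXED = (2, 13, 7)


def parse_version(banner):
    """Extract (major, minor, patch) from a line such as
    'vsearch v2.13.6_linux_x86_64, 31.1GB RAM, 8 cores'."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("no version number found in: %r" % banner)
    return tuple(int(part) for part in match.groups())


def is_affected(banner):
    """True if the version lies in [FIRST_AFFECTED, FIRST_FIXED)."""
    return FIRST_AFFECTED <= parse_version(banner) < FIRST_FIXED
```

With the banner from the report, is_affected returns True; for a 2.12.0 or 2.13.7 banner it returns False.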