Closed MaestSi closed 6 years ago
Hello @MaestSi
Strange! Can you post the full command(s) you ran that produced the duplicate consensus sequence? Can you also post the consensus sequence from the vsearch output (not the blast hit)?
> The only explanation I have is that at the beginning of the clustering process I had 2 reads differing by more than 10%, which resulted in having 2 different clusters.
This could be the case. Looking at the vsearch output (not the blast output!) will help us solve this issue.
Thanks!
Colin
Sure, the command I ran is:
vsearch --cluster_size sample_name.fasta --clusterout_sort --strand both --id 0.9 --fasta_width 0 --sizeout --consout cons.fasta
The identical consensus sequences are:
>centroid=NB500897:212:HYJNYAFXX:1:11201:1452:5991_2:N:0:TAGGCATG;seqs=53;size=53;
TAAACTTCAGGGTGACCAAAAAATCAAAATAAGTGTTGGTATAAAATGGGGTCTCCTCCTCCTGTAGGGTCAAAGAAGCTAGTATTTAAATTTCGATCGGTTAATAGTATAGTAATTGCCCCTGCTAGAACAGGTAATGAAAGTAAAAGT
>centroid=NB500897:212:HYJNYAFXX:1:21101:3437:16599_2:N:0:TAGGCATG;seqs=44;size=44;
TAAACTTCAGGGTGACCAAAAAATCAAAATAAGTGTTGGTATAAAATGGGGTCTCCTCCTCCTGTAGGGTCAAAGAAGCTAGTATTTAAATTTCGATCGGTTAATAGTATAGTAATTGCCCCTGCTAGAACAGGTAATGAAAGTAAAAGT
Thanks.
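What happens when you run this using --centroids instead of --consout? The --consout option builds a consensus from all the reads in a cluster, so the output can differ from any actual read, while --centroids just writes out the centroid reads unchanged.
And you can sort them afterwards, as I showed in the other post!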
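A toy illustration of why this matters, as a minimal Python sketch rather than vsearch's actual algorithm: it assumes the reads in each cluster are already aligned and of equal length, and it takes a simple per-column majority vote. The function name `majority_consensus` and the reads are made up for the example. Two clusters whose centroids differ can still end up with the same consensus once the other reads outvote the differences that are private to each centroid.

```python
from collections import Counter

def majority_consensus(reads):
    """Per-column majority vote over equal-length, pre-aligned reads."""
    if len({len(r) for r in reads}) != 1:
        raise ValueError("reads must be non-empty and share one length")
    # For each alignment column, keep the most frequent base.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

# Hypothetical clusters: the first read of each plays the role of the centroid.
cluster_a = ["ACGTTCGT", "ACGTACGT", "ACGTACGT"]
cluster_b = ["ACGAACGT", "ACGTACGT", "ACGTACGT"]

cons_a = majority_consensus(cluster_a)
cons_b = majority_consensus(cluster_b)

print("centroids identical:", cluster_a[0] == cluster_b[0])
print("consensus identical:", cons_a == cons_b)
```

In this toy case each centroid carries one base that no other read in its cluster shares, so the vote removes it and the two consensus strings coincide.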
If I use --centroids, the sequences I obtain are 87% similar, so they have correctly been put into 2 different clusters. Here are the 2 centroids:
>NB500897:212:HYJNYAFXX:1:11201:1452:5991_2:N:0:TAGGCATG;size=53;
TAAACTTCAGGGTGACCAAAAAATCAAAATAAGTGTTGGTATATGATGGGGTCTCCTCCTCCTGTAGGGTCAAAGAAGCTAGTATTTTAATTTCGATCGGTTATTTGTATAGTAATTGCCCCTGCGAGATCAGGTAATGGAAGTAAAAGT
>NB500897:212:HYJNYAFXX:1:21101:3437:16599_2:N:0:TAGGCATG;size=44;
TAAACTTCAGGGTGACAAAAAAATCAAAATAAGTGTTGGTATAAAATGGGGTCTCATCATCCTGTATTTTAAAATATTCTAGTATTTAAATTTCGATCGGTTAATAGTATAGTAATTGCACCTGCTAGAACAGGTAATGAAAGTAAAAGT
Thanks.
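For a quick sanity check of a figure like the 87% above, the ungapped identity between two equal-length sequences can be computed by hand. This is a rough Python sketch: it assumes there are no indels and simply counts matching positions, whereas vsearch derives identity from a pairwise alignment, so the two numbers may not agree exactly. The helper name `ungapped_identity` and the short sequences are illustrative only.

```python
def ungapped_identity(seq_a, seq_b):
    """Fraction of positions with the same base; assumes equal length, no gaps."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    # Compare case-insensitively, position by position.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return matches / len(seq_a)

# Toy sequences (not taken from this issue): they differ only at the last base.
identity = ungapped_identity("ACGTACGTAC", "ACGTACGTAA")
print(f"{identity:.0%}")
```

Pasting the two centroid sequences in place of the toy strings gives the ungapped figure for that pair, provided they have the same length.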
Ok great! I think you can safely close this issue!
Dear Vsearch developer, I am trying to see whether my Illumina-sequenced sample contains one or more organisms, so I clustered the reads with --id 0.9 and then BLASTed only the consensus sequence of each cluster. I later noticed that at least 2 clusters have exactly identical consensus sequences. Why haven't they been merged? Should I manually run a second Vsearch pass, using the FASTA file of cluster consensus sequences as input? I tried both the --cluster_fast and --cluster_size options, but the result did not change. The only explanation I have is that at the beginning of the clustering process I had 2 reads differing by more than 10%, which resulted in having 2 different clusters. What do you think about this? Thank you in advance.
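One way to deal with identical consensus sequences after the fact is to collapse them in a post-processing step. The Python sketch below is only an assumption about what such a step could look like, not something taken from this thread: it reads FASTA text, groups records by exact sequence, and sums the `size=` abundance found in each header. The function names and the sample records are hypothetical.

```python
import re
from collections import OrderedDict

def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def collapse_identical(records):
    """Merge records with identical sequences, summing their size= values."""
    merged = OrderedDict()  # sequence -> [first header seen, total size]
    for header, seq in records:
        match = re.search(r"size=(\d+)", header)
        size = int(match.group(1)) if match else 1  # no annotation: count as 1
        if seq in merged:
            merged[seq][1] += size
        else:
            merged[seq] = [header, size]
    return [(hdr, total, seq) for seq, (hdr, total) in merged.items()]

# Hypothetical consensus file in which two records share one sequence.
fasta_text = """>centroid=readA;seqs=53;size=53;
ACGTACGT
>centroid=readB;seqs=44;size=44;
ACGTACGT
>centroid=readC;seqs=7;size=7;
ACGTTTGT
"""

collapsed = collapse_identical(parse_fasta(fasta_text))
for header, total, seq in collapsed:
    print(header, total, seq)
```

vsearch also has its own dereplication command (--derep_fulllength) that merges exact duplicates; its documentation describes how abundance annotations are handled.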