torognes / vsearch

Versatile open-source tool for microbiome analysis

error in swarm when clustering #478

Closed michelledesontje closed 2 years ago

michelledesontje commented 2 years ago

Hello everyone,

I am analyzing a dataset obtained by metagenomic amplicon sequencing of 16S rRNA.

First, I merged the paired-end reads in usearch (as I understood it, I cannot merge many samples in vsearch), and then ran the following commands:

vsearch --quiet --fastq_filter merged.fastq --fastq_maxns 0 --relabel_sha1 --eeout --fastq_qmax 42 --fastaout filter1.fasta

vsearch --quiet --derep_fulllength filter1.fasta --sizeout --fasta_width 0 --relabel_sha1 --output derep.fasta

Then, for the OTU clustering, I am running swarm:

swarm -f -t 4 -w cluster_representatives.fasta derep.fasta > /dev/null

but it gives me an error message saying that fasta headers must end with abundance annotations (_INT or ;size=INT).

Could you please help me to find a solution to this error? Thank you.

torognes commented 2 years ago

You probably need to include the "-z" option to swarm in the last command. It tells swarm that the abundances are in usearch style, which means that each header ends with something like ";size=123".
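
As an alternative to -z, the usearch-style annotations could be rewritten into swarm's native "_INT" style before clustering. The following is only a minimal sketch: size_to_underscore is a hypothetical helper name (not a swarm or vsearch command), and it assumes the ";size=INT" annotation sits at the very end of each header.

```shell
# With -z, the command from the question would read:
#   swarm -z -f -t 4 -w cluster_representatives.fasta derep.fasta > /dev/null

# Hypothetical helper (not part of swarm or vsearch): rewrite a trailing
# ";size=INT" (optionally followed by ";") in each FASTA header into "_INT".
# Sequence lines are passed through unchanged.
size_to_underscore() {
    sed -e '/^>/ s/;size=\([0-9][0-9]*\);*$/_\1/'
}

# Example: the header ">s1;size=123" becomes ">s1_123".
printf '>s1;size=123\nACGT\n' | size_to_underscore
```

The converted file can then be passed to swarm without -z. Using -z directly is simpler when the file comes straight from vsearch.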

frederic-mahe commented 2 years ago

In swarm 3.0, a more explicit error message was added:

printf ">s\nA\n" | swarm > /dev/null 
...
Error: Abundance annotations not found for 1 sequences, starting on line 1.
>s
Fasta headers must end with abundance annotations (_INT or ;size=INT).
The -z option must be used if the abundance annotation is in the latter format.
Abundance annotations can be produced by dereplicating the sequences.
The header is defined as the string comprised between the ">" symbol
and the first space or the end of the line, whichever comes first
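
The header rule quoted in this message can be checked before running swarm. A minimal sketch, where abundance_style is a hypothetical helper (not a swarm or vsearch command), and the optional trailing ";" after the size value is an assumption about what usearch-style headers may contain:

```shell
# Hypothetical helper: report which abundance annotation style the headers
# of a FASTA stream use. The header is taken as the text up to the first
# whitespace, following the definition in swarm's error message.
abundance_style() {
    awk '
        /^>/ {
            header = $1
            if (header ~ /;size=[0-9]+;?$/)   usearch++   # needs swarm -z
            else if (header ~ /_[0-9]+$/)     swarm++     # swarm native style
            else                              missing++   # no annotation found
        }
        END {
            if (missing > 0)            print "missing"
            else if (usearch && swarm)  print "mixed"
            else if (usearch)           print "usearch"
            else if (swarm)             print "swarm"
            else                        print "empty"
        }'
}

# Example: a vsearch --sizeout style header is reported as "usearch".
printf '>a;size=3\nACGT\n' | abundance_style
```

A result of "usearch" suggests adding -z to the swarm command, while "missing" suggests the file was not dereplicated with abundance output.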

michelledesontje commented 2 years ago

Thank you very much, it worked!

I am also trying to cluster OTUs and create an OTU table with vsearch. Is there a command to insert a mapping file in one of the steps, or another way to assign OTUs and their abundances to each sample?

I am using the pipeline from here: https://github.com/torognes/vsearch/wiki/VSEARCH-pipeline

I did the following steps (after the dereplication):

vsearch --cluster_size derep.fasta --id 0.98 --strand plus --sizein --sizeout --fasta_width 0 --uc all.preclustered.uc --centroids all.preclustered.fasta

vsearch --uchime_denovo all.preclustered.fasta --sizein --sizeout --fasta_width 0 --nonchimeras all.denovo.nonchimeras.fasta

vsearch --cluster_size all.denovo.nonchimeras.fasta --threads 1 --id 0.97 --strand plus --sizein --sizeout --fasta_width 0 --uc all.clustered.uc --relabel OTU --centroids all.otus.fasta --otutabout all.otutab.txt

Another question: does the --uchime_denovo command only detect chimeras, or does it also remove them?

frederic-mahe commented 2 years ago

Another question: does the --uchime_denovo command only detect chimeras, or does it also remove them?

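As stated in the manual, the --uchime_denovo command can write several different output files, selected with (--chimeras | --nonchimeras | --uchimealns | --uchimeout) outputfile. Sequences classified as chimeras go to the --chimeras file and sequences classified as non-chimeric go to the --nonchimeras file, so the all.denovo.nonchimeras.fasta file from your command no longer contains the detected chimeras.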
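
One way to see the effect of the chimera filtering is to compare the number of sequences and the total abundance before and after the --uchime_denovo step. A minimal sketch, assuming the headers carry ";size=INT" annotations as written by --sizeout; fasta_summary is a hypothetical helper, not a vsearch command:

```shell
# Hypothetical helper: print "<records> <total abundance>" for a FASTA
# stream whose headers carry ";size=INT" annotations.
fasta_summary() {
    awk '
        /^>/ {
            records++
            # ";size=" is 6 characters long, so the digits start at RSTART + 6
            if (match($0, /;size=[0-9]+/))
                total += substr($0, RSTART + 6, RLENGTH - 6)
        }
        END { print records + 0, total + 0 }'
}

# Example with two records of abundance 10 and 5: prints "2 15".
printf '>a;size=10\nACGT\n>b;size=5\nTTGA\n' | fasta_summary

# On the files from the commands in this thread, one could compare:
#   fasta_summary < all.preclustered.fasta
#   fasta_summary < all.denovo.nonchimeras.fasta
```

If the second summary shows fewer records than the first, the difference corresponds to sequences that were not written to the --nonchimeras file.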

frederic-mahe commented 2 years ago

I am also trying to cluster OTUs and create an OTU table with vsearch. Is there a command to insert a mapping file in one of the steps, or another way to assign OTUs and their abundances to each sample?

I am using the pipeline from here: https://github.com/torognes/vsearch/wiki/VSEARCH-pipeline

In the pipeline you are referring to, I think this is done by the --otutabout all.otutab.txt output option.
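
As far as I understand the vsearch manual, the columns of the OTU table are derived from sample identifiers carried in the sequence headers (for example a ";sample=NAME" annotation) rather than from a separate mapping file. This should be verified against the manual for the installed version. Under that assumption, each sample's reads would need to be tagged before the samples are pooled. A minimal sketch, where tag_sample and the sample name S1 are hypothetical:

```shell
# Hypothetical helper: add a ";sample=NAME" annotation to every FASTA header,
# placed before any existing ";size=..." annotation.
# Assumption: vsearch reads the sample identifier from such an annotation.
tag_sample() {
    awk -v s="$1" '
        /^>/ {
            n = index($0, ";")              # position of the first ";" if any
            if (n) print substr($0, 1, n - 1) ";sample=" s substr($0, n)
            else   print $0 ";sample=" s
            next
        }
        { print }                           # sequence lines are unchanged
    '
}

# Example: the header ">abc;size=3" becomes ">abc;sample=S1;size=3".
printf '>abc;size=3\nACGT\n' | tag_sample S1
```

Note that the tagging has to happen per sample, before the reads from different samples are concatenated and dereplicated together.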

I am going to close this issue, as its content is drifting too far from the original title. Feel free to open a new one if need be.