torognes / vsearch

Versatile open-source tool for microbiome analysis

not all samples appear in OTU table #479

Closed: michelledesontje closed this issue 1 year ago

michelledesontje commented 2 years ago

Hello everyone,

I am creating an OTU table with the --cluster_size command. However, when I look at the output OTU table, half of the samples are missing. Why could this be?

Thank you!

michelledesontje commented 2 years ago

Actually, I found out that I am not losing them: the sample replicates are treated as a single sample. For example, samples A1, A2 and A3 appear in the OTU table as one sample, "sample 1". Are their counts averaged or summed? And why is this happening?

frederic-mahe commented 2 years ago

Hello, your bug report does not contain enough information for us to understand and reproduce your issue.

Ideally, we would need a minimal example showing how your sample replicate names are merged into a single name in vsearch's output. For example, here is a FASTA file made of sequences from three samples, A1, A2, and A3:

printf ">s1;size=2;sample=A1;\nA\n>s2;size=1;sample=A2;\nA\n>s3;size=4;sample=A3;\nA\n"
>s1;size=2;sample=A1;
A
>s2;size=1;sample=A2;
A
>s3;size=4;sample=A3;
A
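Before clustering, it can help to check which sample labels the input headers actually carry, since replicates that share a label cannot show up as separate columns. Here is a minimal sketch, assuming every header stores its sample name in a `;sample=...` attribute as in the example above; the file name `input.fasta` and the function `list_sample_labels` are hypothetical:

```shell
#!/bin/sh
# Recreate the three-sequence example in a hypothetical file, input.fasta
printf ">s1;size=2;sample=A1;\nA\n>s2;size=1;sample=A2;\nA\n>s3;size=4;sample=A3;\nA\n" > input.fasta

# Print each distinct sample= label with the number of sequences carrying it.
# Headers without a sample= attribute are silently skipped (sed -n ... p).
list_sample_labels() {
    grep '^>' "$1" |
        sed -n 's/.*;sample=\([^;]*\).*/\1/p' |
        sort |
        uniq -c
}

list_sample_labels input.fasta
```

If A1, A2 and A3 appear as one label here, the merging happened before vsearch, in the headers themselves.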

Cluster them with vsearch (the input sequences are identical, so we expect a single OTU):

printf ">s1;size=2;sample=A1;\nA\n>s2;size=1;sample=A2;\nA\n>s3;size=4;sample=A3;\nA\n" |\
    vsearch \
        --cluster_size - \
        --minseqlength 1 \
        --quiet \
        --id 0.97 \
        --strand plus \
        --sizein \
        --sizeout \
        --relabel OTU_ \
        --otutabout -

vsearch produces the expected OTU table:

#OTU ID A1  A2  A3
OTU_1   2   1   4
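The counts above match the `size=` values of each sample (2, 1 and 4). To illustrate the "average or sum?" question, here is a minimal awk sketch of per-sample tallying. It assumes abundances are summed per sample label and is not vsearch's own code. The file `replicates.fasta`, the label `sample1` and the function `tally_by_sample` are hypothetical:

```shell
#!/bin/sh
# Hypothetical input: three replicates whose sample= labels have all
# collapsed to the same value, sample1
printf ">s1;size=2;sample=sample1;\nA\n>s2;size=1;sample=sample1;\nA\n>s3;size=4;sample=sample1;\nA\n" > replicates.fasta

# Sum the size= values per sample label and print "label<TAB>total"
tally_by_sample() {
    awk '
        /^>/ {
            sample = ""; size = 1          # sketch default: abundance 1 if size= is absent
            n = split(substr($0, 2), field, ";")
            for (i = 1; i <= n; i++) {
                if (field[i] ~ /^sample=/) sample = substr(field[i], 8)
                if (field[i] ~ /^size=/)   size   = substr(field[i], 6) + 0
            }
            total[sample] += size
        }
        END { for (s in total) print s "\t" total[s] }
    ' "$1"
}

tally_by_sample replicates.fasta
```

Under this assumption, the collapsed replicates contribute 2 + 1 + 4 = 7 to a single column rather than an average.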