torognes / vsearch

Versatile open-source tool for microbiome analysis

sintax: extra tab in tabbedout output when there is no match #493

Closed frederic-mahe closed 1 year ago

frederic-mahe commented 2 years ago

When there is no match, vsearch --sintax outputs a five-column line. According to the documentation, a four-column line is expected:

Column 1 contains the query label. Column 2 contains the predicted taxonomy in the same format as for the reference data, with bootstrap support indicated in parentheses after each rank. Column 3 contains the strand. If the --sintax_cutoff option is used, the predicted taxonomy will be repeated in column 4 while omitting the bootstrap values and including only the ranks with support at or above the threshold.

CUTOFF="0.9"
Q1="AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
Q2="TGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCT"

# no match: four tabs (five columns) instead of the expected three tabs (four columns)
printf ">q1\n%s\n" "${Q1}" | \
    vsearch \
        --sintax - \
        --dbmask none \
        --db <(printf ">s;tax=d:d,p:p,c:c,o:o,f:f,g:g,s:s\n%s\n" "${Q2}") \
        --sintax_cutoff "${CUTOFF}" \
        --quiet \
        --tabbedout - | \
    tr -cd '\t' | \
    wc -c | \
    awk '{exit $1 == 3 ? 0 : 1}' && \
    echo "success" || \
        echo "failure"

# match: three tabs (four columns) as expected
printf ">q1\n%s\n" "${Q2}" | \
    vsearch \
        --sintax - \
        --dbmask none \
        --db <(printf ">s;tax=d:d,p:p,c:c,o:o,f:f,g:g,s:s\n%s\n" "${Q2}") \
        --sintax_cutoff "${CUTOFF}" \
        --quiet \
        --tabbedout - | \
    tr -cd '\t' | \
    wc -c | \
    awk '{exit $1 == 3 ? 0 : 1}' && \
    echo "success" || \
        echo "failure"
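
The two tests above count tabs over the whole output, which only works for a single result line. For a results file with many queries, a per-line check may be more useful. The following is a minimal sketch, assuming only a standard `awk`; the function name `check_columns` and the sample input are made up for illustration and are not actual vsearch output:

```shell
# Hypothetical helper (not part of vsearch): report every line on stdin
# whose number of tab-separated columns differs from the expected count.
# Returns 0 when all lines match, 1 otherwise.
check_columns() {
    awk -F '\t' -v expected="${1}" '
        NF != expected {
            printf "line %d: %d columns\n", NR, NF
            bad = 1
        }
        END { exit bad ? 1 : 0 }
    '
}

# Illustrative input only: a four-column line, then a five-column line
# (a label followed by four tabs), mimicking the no-match case.
printf 'q1\ttaxonomy(1.00)\t+\ttaxonomy\nq2\t\t\t\t\n' |
    check_columns 4 ||
    echo "failure"
```

With the sample input, the second line is reported as having five columns and the pipeline prints "failure". In a real check, the output of `--tabbedout -` could be piped into `check_columns 4` instead of the `printf` sample.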

I think the issue is here:

https://github.com/torognes/vsearch/blob/bae03fca37150b3fa4501446fdfe418f379b5143/src/sintax.cc#L200-L207

torognes commented 2 years ago

Will fix!

frederic-mahe commented 2 years ago

Regression tests added to the test suite: https://github.com/frederic-mahe/vsearch-tests/commit/e87f6a92291f15df70b5acf21169b813966019cc

torognes commented 1 year ago

Fixed in commit e97aa1a2bc5d581286485e3b9c912a039785bdfd.