torognes / vsearch

Versatile open-source tool for microbiome analysis

uchime_ref --db can't read from stdin #506

Closed frederic-mahe closed 1 year ago

frederic-mahe commented 1 year ago

While working on issue #504, I noticed that --db - (reading --db data from stdin) does not work and yields a fatal error. This could be intended, for instance to avoid reading both queries and references from stdin. If not, allowing --db - would make the user interface more predictable and flexible.

## reading from process substitutions works
vsearch \
    --uchime_ref <(printf ">query\nAAGG\n") \
    --db <(printf ">parentA\nAAAA\n>parentB\nGGGG\n") \
    --quiet \
    --uchimeout /dev/null

## reading from stdin works for the queries
printf ">query\nAAGG\n" | \
    vsearch \
        --uchime_ref - \
        --db <(printf ">parentA\nAAAA\n>parentB\nGGGG\n") \
        --quiet \
        --uchimeout /dev/null

## reading from stdin does not work for the references
printf ">parentA\nAAAA\n>parentB\nGGGG\n" | \
    vsearch \
        --uchime_ref <(printf ">query\nAAGG\n") \
        --db - \
        --quiet \
        --uchimeout /dev/null # Fatal error: Unable to get status for input file (-)

frederic-mahe commented 1 year ago

--db /dev/stdin works, so the issue really is with --db -:

## reading explicitly from /dev/stdin works for the references
printf ">parentA\nAAAA\n>parentB\nGGGG\n" | \
    vsearch \
        --uchime_ref <(printf ">query\nAAGG\n") \
        --db /dev/stdin \
        --quiet \
        --uchimeout /dev/null
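
Until --db accepts -, the /dev/stdin workaround above could be hidden behind a small wrapper. The following is only a sketch: the function name vsearch_db_stdin and the VSEARCH variable are hypothetical (neither is part of vsearch), and it assumes a system that provides /dev/stdin.

```sh
# Hypothetical wrapper (not part of vsearch): replace "--db -" with
# "--db /dev/stdin", then call vsearch with the rewritten arguments.
# The body runs in a subshell so helper variables do not leak.
vsearch_db_stdin() (
    previous=""
    count=$#
    # Rotate through the positional parameters exactly once,
    # re-appending each one (possibly rewritten) at the end.
    while [ "${count}" -gt 0 ] ; do
        arg="$1"
        shift
        if [ "${previous}" = "--db" ] && [ "${arg}" = "-" ] ; then
            set -- "$@" "/dev/stdin"
        else
            set -- "$@" "${arg}"
        fi
        previous="${arg}"
        count=$((count - 1))
    done
    # VSEARCH is an assumed override for the binary name (handy for testing).
    "${VSEARCH:-vsearch}" "$@"
)
```

With this wrapper, piping the references into vsearch_db_stdin ... --db - should behave like the --db /dev/stdin example above. Only --db is rewritten, so a - given to --uchime_ref is passed through unchanged.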

torognes commented 1 year ago

I think --db generally does not accept - as an argument meaning "read from stdin". I think this was done intentionally to avoid the use of - for multiple arguments at once, which would cause problems.

VSEARCH could be rewritten to accept - for many/all input file options, while only allowing it to be used for one option at a time.
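
To illustrate the "one at a time" rule, here is a rough shell sketch of such a check. It is not how VSEARCH parses its command line; the function name check_single_stdin is hypothetical, and only the two input options discussed in this thread (--uchime_ref and --db) are considered.

```sh
# Hypothetical check (not vsearch code): fail if more than one input
# option is given "-" (stdin) as its value.
# The body runs in a subshell so helper variables do not leak.
check_single_stdin() (
    stdin_count=0
    previous=""
    for arg in "$@" ; do
        # Only count a "-" that directly follows an input option.
        case "${previous}" in
            --uchime_ref|--db)
                if [ "${arg}" = "-" ] ; then
                    stdin_count=$((stdin_count + 1))
                fi
                ;;
        esac
        previous="${arg}"
    done
    if [ "${stdin_count}" -gt 1 ] ; then
        printf 'Error: only one input option can read from stdin (-)\n' >&2
        exit 1
    fi
)
```

A call such as check_single_stdin --uchime_ref - --db - returns a non-zero status, whereas using - for only one of the two options passes.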

frederic-mahe commented 1 year ago

> VSEARCH could be rewritten to accept - for many/all input file options, while only allowing it to be used for one option at a time.

I don't think that's necessary.

I've added a statement to the manpage, and some tests to cover that behavior: https://github.com/frederic-mahe/vsearch-tests/commit/7277b1cd74708267c80acd709660f58a75b3b93e