torognes / vsearch

Versatile open-source tool for microbiome analysis

derep_fulllength: handling of empty input files and streams #472

Closed frederic-mahe closed 2 years ago

frederic-mahe commented 2 years ago

This is similar to issue #366.

I set up an automated pipeline to process FASTQ files into heavily transformed FASTA files. The pipeline applies a series of in-RAM transformations before writing the results back to disk. One of the steps is a vsearch --derep_fulllength command, which does not seem to handle empty input gracefully, and that breaks my pipeline.

Here is a toy example:

## long pipeline, sometimes yielding no data
printf "" | \
    vsearch \
        --derep_fulllength - \
        --quiet \
        --output dereplicated.fasta

## vsearch stops with an (off-topic?) error message:
# Fatal error: FASTQ input is only allowed with the fastx_uniques command

## dereplicated.fasta is not created
[[ -e "dereplicated.fasta" ]] && \
    echo "success" || \
        echo "fail"
rm -f dereplicated.fasta

Ideally, one would expect a warning, not a fatal error, and the creation of an empty output file.

torognes commented 2 years ago

Sorry for that one. Should be fixed in commit a2e550c1c9c115e25a54ae7268ec5e75a3efdc3d.

Empty files were detected as FASTQ files, which caused this error.

torognes commented 2 years ago

Fixed in version 2.21.1.
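
For pipelines pinned to a release that predates the fix, one possible workaround is to guard the dereplication step so that empty input produces a warning and an empty output file, as requested in the report. The sketch below is only an illustration: `derep_or_empty` is a hypothetical helper, not part of vsearch, and it buffers the stream in a temporary file so that it can test for emptiness, which gives up the purely in-RAM behaviour of the original pipeline.

```shell
# Hypothetical helper (not part of vsearch).
# Usage: ... | derep_or_empty OUTPUT_FILE COMMAND [ARGS...]
# Runs COMMAND on the buffered stdin when there is data; otherwise
# prints a warning and creates an empty OUTPUT_FILE.
# The body runs in a subshell, so its variables do not leak to the caller.
derep_or_empty() (
    output="$1"
    shift
    buffer="$(mktemp)" || exit 1
    cat > "${buffer}"                  # buffer stdin so it can be inspected
    if [ -s "${buffer}" ] ; then       # -s: file exists and is not empty
        "$@" < "${buffer}"
        status=$?
    else
        echo "Warning: empty input, creating empty ${output}" >&2
        : > "${output}"
        status=0
    fi
    rm -f "${buffer}"
    exit "${status}"
)

# Example call (requires vsearch, so it is left commented out):
# printf "" | \
#     derep_or_empty dereplicated.fasta \
#         vsearch --derep_fulllength - --quiet --output dereplicated.fasta
```

With a fixed vsearch version the wrapper should be unnecessary; it only mimics the behaviour the report asks for.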