lh3 / seqtk

Toolkit for processing sequences in FASTA/Q formats
MIT License

Unusual behavior due to asterisk shell expansion #146

Closed: deprekate closed this issue 4 years ago

deprekate commented 4 years ago

This is more unexpected behavior than a bug in seqtk itself. When a shell glob that expands to several files is passed to seqtk, only the first file is processed.

$ printf ">seq1\nACTG\n" > file1
$ printf ">seq2\nACTG\n" > file2
$ printf ">seq3\nACTG\n" > file3
$ ./seqtk seq -A file*
>seq1
ACTG
$ 

If the files are instead concatenated and piped to seqtk through stdin, though, it works fine.

$ cat file* | seqtk seq -A
>seq1
ACTG
>seq2
ACTG
>seq3
ACTG
yzhernand commented 4 years ago

I think this is just a misunderstanding of the documentation, which states that only one file is expected:

$ seqtk seq

Usage:   seqtk seq [options] <in.fq>|<in.fa>

Usually, multiple input files would be indicated with something like [in.fa]... in a manual.

If you can read C, you can see that where seqtk seq processes its input, it takes only the first non-option argument (or falls back to stdin) and opens it with gzopen() (or gzdopen() for stdin): https://github.com/lh3/seqtk/blob/ca4785c620d34cf5934b89ae8e00f6dc71a5bf1e/seqtk.c#L1243
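
The effect can be reproduced without seqtk. This is a minimal sketch, not seqtk's actual code: `first_only` and `count_args` are hypothetical helper functions made up for illustration, with `first_only` standing in for any program that opens only its first non-option argument.

```shell
# Scratch directory so that the glob matches only the files created here.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf ">seq1\nACTG\n" > file1
printf ">seq2\nACTG\n" > file2
printf ">seq3\nACTG\n" > file3

# Hypothetical stand-in for a tool that reads only its first positional
# argument and silently ignores the rest (this is not seqtk itself).
first_only() {
    cat "$1"
}

# Hypothetical helper that reports how many arguments it was given.
count_args() {
    echo "$#"
}

# The shell expands file* to "file1 file2 file3" before the command runs,
# so the command receives three arguments but reads only the first one.
nargs=$(count_args file*)
first_out=$(first_only file*)

echo "arguments received: $nargs"
echo "$first_out"
```

The glob itself is handled entirely by the shell; the program never sees the asterisk, only the list of matching file names.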

The cat version works because cat concatenates the contents of all of its arguments. The result is still a valid FASTA file, and seqtk reads it as if it were a single file arriving on stdin.
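
To check what a downstream tool would actually receive on stdin, the FASTA headers in the concatenated stream can be counted. This is only a sketch using the same three toy files as above; it assumes every file ends with a newline, since otherwise the next header would be glued onto the previous sequence line.

```shell
# Scratch directory so that the glob matches only the files created here.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf ">seq1\nACTG\n" > file1
printf ">seq2\nACTG\n" > file2
printf ">seq3\nACTG\n" > file3

# cat writes the files back to back, so the reader on the other end of the
# pipe sees one continuous stream with no file boundaries.
records=$(cat file* | grep -c '^>')

echo "records on stdin: $records"
```

If each file needs to be processed on its own rather than as one stream, a loop along the lines of `for f in file*; do seqtk seq -A "$f"; done` should also work, since seqtk then gets exactly one file name per invocation.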

tseemann commented 4 years ago

@deprekate can you close this issue now please?