ncbi / fcs

Foreign Contamination Screening caller scripts and documentation

[FEATURE REQUEST]: accept gz input for `clean genome` #72

Status: Closed (schellt closed this issue 4 months ago)

schellt commented 4 months ago

Describe the problem you'd like to be solved

I am running fcs 0.5.0 with the Singularity container and the provided wrapper scripts. All fcs tools I have tested so far accept gzip-compressed FASTA files (.gz) as input, but `clean genome` does not and crashes as follows:

Fatal error: fasta.cpp:215 in consume_fasta_line(...): Missing FASTA header (defline).

When I decompress the FASTA file first, it works as expected.

Describe the solution you'd like

`clean genome` accepts compressed FASTA files as input.

Describe alternatives you've considered

None.

etvedte commented 4 months ago

Hello,

Can you provide the command you are using for `clean genome`? By running zcat on the compressed FASTA, you can pipe it to `clean genome` without decompressing the file on disk, like so: `zcat uncleaned.fa.gz | python3 ./fcs.py clean genome ...`

In a future release, we plan to support automatic decompression when inputs are specified as filenames, e.g. `clean-genome --input=uncleaned.fa.gz`. Does this meet your needs?

Eric
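
For pipeline use, the zcat pipe suggested above can also be reproduced from a Python script. The following is only a minimal sketch under stated assumptions: `pipe_fasta_to_command` is a hypothetical helper, not part of fcs, and it assumes the target command reads FASTA from standard input, as the zcat example implies. It decompresses the file on the fly when it starts with the gzip magic bytes and otherwise streams it unchanged.

```python
import gzip
import shutil
import subprocess


def pipe_fasta_to_command(fasta_path, command, stdout_path):
    """Stream a plain or gzip-compressed FASTA into `command` via stdin.

    Hypothetical helper for illustration; not part of fcs.
    `command` is an argument list, and its stdout is written to `stdout_path`.
    Returns the exit code of the command.
    """
    # gzip files start with the two magic bytes 0x1f 0x8b.
    with open(fasta_path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open

    with opener(fasta_path, "rb") as src, open(stdout_path, "wb") as out:
        proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=out)
        try:
            # Copy the (decompressed) bytes to the child's stdin in chunks,
            # so the whole genome is never held in memory at once.
            shutil.copyfileobj(src, proc.stdin)
        finally:
            # Closing stdin signals end of input to the child process.
            proc.stdin.close()
        return proc.wait()
```

In a pipeline, `command` would be the same argument list as in the zcat example, i.e. `python3 ./fcs.py clean genome ...` split into a list, with the remaining arguments as required by your setup.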

schellt commented 4 months ago

Hello,

Thank you very much for your reply. Indeed, I had not used zcat to pipe the file into fcs.py, even though this is documented. Sorry. For convenience (e.g. when using fcs in a pipeline), it would still be great if you implemented direct reading of both compressed and uncompressed files with `--input`. And yes, this meets my needs exactly.

Thanks again and best regards,
Tilman
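
The direct reading of compressed and uncompressed files requested in this thread could look roughly like the sketch below. It is not the fcs implementation: `open_fasta` and `first_defline` are hypothetical names, and the approach (checking for the gzip magic bytes rather than trusting the `.gz` extension) is one common choice among several. It also illustrates why a parser that expects a `>` defline at the start of the input would presumably reject raw gzip bytes.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_fasta(path):
    """Open a FASTA file for text reading, whether gzip-compressed or not.

    Hypothetical helper for illustration; not part of fcs.
    """
    with open(path, "rb") as probe:
        compressed = probe.read(2) == GZIP_MAGIC
    if compressed:
        return gzip.open(path, "rt")
    return open(path, "r")


def first_defline(path):
    """Return the first FASTA header line, or raise if the input lacks one."""
    with open_fasta(path) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip leading blank lines
            if not line.startswith(">"):
                raise ValueError("Missing FASTA header (defline)")
            return line.rstrip("\n")
    raise ValueError("Empty FASTA input")
```

With a helper like this, a caller can pass either `uncleaned.fa` or `uncleaned.fa.gz` and get the same text stream back.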