kblin / ncbi-acc-download

Download files from NCBI Entrez by accession
Apache License 2.0

Write to a specified file name without implicit extension #3

Closed sjackman closed 6 years ago

sjackman commented 6 years ago

Please provide an option to write to a specified filename that does not add an implicit file extension. Two reasons for this:

  1. Allow using an extension other than .fa, such as .fasta.
  2. Allow writing to /dev/stdout with -o /dev/stdout, to permit piping into another tool such as gzip.

sjackman commented 6 years ago

Why does the implicit extension _0.fa include _0?

kblin commented 6 years ago

What command line did you use that gave you the _0?

kblin commented 6 years ago

Ah, you were using --out. The reason for this is that you can download multiple files, and all of those will be called yourprefix_N.filetype to stop them from overwriting each other.
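As a rough illustration of the yourprefix_N.filetype scheme described above, here is a minimal sketch. The helper names `numbered_filename` and `filenames_for` are hypothetical and are not taken from the ncbi-acc-download source; the sketch only assumes that N is the zero-based position of the accession on the command line, which matches the `_0.fa` suffix reported above.

```python
def numbered_filename(prefix, index, filetype):
    """Build a name like 'yourprefix_0.fa' for the index-th download.

    Hypothetical helper: the real tool may construct names differently.
    """
    return f"{prefix}_{index}.{filetype}"


def filenames_for(accessions, prefix, filetype="fa"):
    """Pair each accession with its own numbered output filename,
    so that separate downloads cannot overwrite each other."""
    return [
        (accession, numbered_filename(prefix, index, filetype))
        for index, accession in enumerate(accessions)
    ]
```

With a single accession this yields exactly one name ending in `_0.fa`, which is why the suffix appears even when only one file is downloaded.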

kblin commented 6 years ago

An alternative I could see would be to write all records to the file specified with --out, possibly even skipping the "add an extension" logic. Would that work for you?

sjackman commented 6 years ago

Yeah, writing them all to the one file specified by --out would work for me. The existing behaviour could be retained as a --prefix option, if you liked.

kblin commented 6 years ago

It's not super-trivial, because right now we run the download_to_file function once per NCBI accession. The reason we do this is that in my experience, larger download batches increase the chance that the file will break off in mid-transfer, and detecting that is hard.

But this means that with the current code and using --out, you'd keep overwriting your file contents because we open with open(filename, "w"). This could be fixed using open(filename, "a") instead, but that would break the default case: if a file had already been downloaded, running the command again would append a second copy of the contents.
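The overwrite problem can be reproduced in a few lines. The sketch below is not the project's code: `fetch_record` is a hypothetical stand-in for the real per-accession download and returns fake FASTA text so that nothing touches the network. It also shows one possible way around the "w" versus "a" dilemma, assuming the output file can be opened once outside the per-accession loop.

```python
def fetch_record(accession):
    """Hypothetical stand-in for one per-accession download."""
    return f">{accession}\nACGT\n"


def write_each_with_w(accessions, filename):
    """Reopen the same file in "w" mode for every accession.

    Each open() truncates the file, so only the last record survives.
    """
    for accession in accessions:
        with open(filename, "w") as handle:
            handle.write(fetch_record(accession))


def write_all_to_one_handle(accessions, filename):
    """Open the file once in "w" mode and write every record to it.

    The download still happens once per accession, but the file is
    truncated only once, so a rerun replaces the old contents
    instead of appending a second copy.
    """
    with open(filename, "w") as handle:
        for accession in accessions:
            handle.write(fetch_record(accession))
```

After `write_each_with_w(["A1", "B2"], path)` the file holds only the B2 record, while `write_all_to_one_handle` leaves both records in order and gives the same result when run twice.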

sjackman commented 6 years ago

An option to write to /dev/stdout would be good enough for me.
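For readers wondering what such an option might look like, here is a minimal sketch. It is an assumption-laden illustration, not how ncbi-acc-download implements it: `open_output` is a hypothetical helper that hands back the process's standard output when asked for /dev/stdout and otherwise opens a regular file with exactly the name given, without adding an extension.

```python
import sys
from contextlib import contextmanager


@contextmanager
def open_output(path):
    """Yield a writable text handle for `path` (hypothetical helper).

    "/dev/stdout" maps to sys.stdout, which is flushed but left open
    so the rest of the program can keep printing. Any other path is
    opened as-is in "w" mode and closed on exit.
    """
    if path == "/dev/stdout":
        try:
            yield sys.stdout
        finally:
            sys.stdout.flush()
    else:
        with open(path, "w") as handle:
            yield handle
```

A caller would write records inside `with open_output(name) as handle:`, and the same code path then serves both a file such as `seqs.fasta` and a pipe into a tool such as gzip.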

kblin commented 6 years ago

This is now implemented in version 0.2.0.

sjackman commented 6 years ago

Thanks, Kai!