kblin / ncbi-acc-download

Download files from NCBI Entrez by accession
Apache License 2.0

running with `--format fasta` creates an empty fa file #21

Open orenmn opened 3 years ago

orenmn commented 3 years ago

(This was already mentioned in https://github.com/kblin/ncbi-acc-download/issues/13#issuecomment-531677362, but I think it is better to have a separate issue.)

`ncbi-acc-download --format fasta --recursive --verbose AAXATB000000000.1` creates an empty FASTA file, while `ncbi-acc-download --format genbank --recursive --verbose AAXATB000000000.1` creates a GenBank file as expected.

The same thing happened when I tried it with `ACIN00000000.3`.

orenmn commented 3 years ago

IIUC, an easy fix is to implement `--format fasta` by using `--format genbank` (the default) and then converting the result with `SeqIO.convert` (from Biopython), e.g.: `SeqIO.convert('AAXATB000000000.1.gbk', 'genbank', 'AAXATB000000000.1.gbk.fasta', 'fasta')`
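
For anyone who needs a workaround in the meantime, the conversion suggested above could be wrapped roughly like this. It is only a sketch: `fasta_path_for` and `genbank_to_fasta` are hypothetical helper names, not part of ncbi-acc-download, and it assumes Biopython is installed and the GenBank file has already been downloaded.

```python
def fasta_path_for(genbank_path):
    """Build the output name used in the example above: '<input>.fasta'."""
    return str(genbank_path) + ".fasta"


def genbank_to_fasta(genbank_path):
    """Convert a downloaded GenBank file to FASTA.

    Returns the output path and the number of records Biopython converted.
    """
    # Imported lazily so Biopython is only needed when a conversion is requested.
    from Bio import SeqIO

    out_path = fasta_path_for(genbank_path)
    record_count = SeqIO.convert(str(genbank_path), "genbank", out_path, "fasta")
    return out_path, record_count
```

Calling `genbank_to_fasta('AAXATB000000000.1.gbk')` after the GenBank download would write `AAXATB000000000.1.gbk.fasta` next to it.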

kblin commented 3 years ago

The NCBI Entrez API does deliver FASTA files, just not if you query for WGS master entries. I don't really want to depend on Biopython for all of ncbi-genome-download, but arguably we could go that path if `--recursive` is specified, as we depend on Biopython for `--recursive` anyway.
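
The compromise described here could look something like the following. This is a rough sketch of the decision only, under the assumption that the tool knows the requested format and whether `--recursive` was given; `plan_download` is a hypothetical name and does not reflect the actual ncbi-acc-download code.

```python
def plan_download(requested_format, recursive):
    """Decide which format to request from Entrez and whether to convert afterwards.

    Returns a tuple (format_to_request, convert_to_fasta).
    """
    # With --recursive, Biopython is already a dependency, so FASTA can be
    # produced by downloading GenBank and converting it locally.
    if requested_format == "fasta" and recursive:
        return "genbank", True
    # Otherwise request the format directly and avoid the Biopython dependency.
    return requested_format, False
```

With this shape, a plain `--format fasta` request still goes straight to Entrez, and only the `--recursive` case takes the Biopython route.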