kblin / ncbi-acc-download

Download files from NCBI Entrez by accession
Apache License 2.0

GFF3 files with sequence included #6

Open tseemann opened 6 years ago

tseemann commented 6 years ago

In bacterial genomics we normally want the annotations and the sequences/contigs. This tool currently only handles each separately (as that is how NCBI provides it) via -F fasta and -F gff3.

This -F fullgff proposal would download the fasta and gff and construct the following:

##gff-version 3
<insert GFF>
##FASTA
<insert FASTA>

See https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
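As a rough illustration of the proposed layout, the combination step could be sketched as below. This is only a sketch: `combine_gff_fasta` is a hypothetical helper, not part of ncbi-acc-download, and it assumes the GFF3 and FASTA text have already been downloaded separately.

```python
def combine_gff_fasta(gff_text: str, fasta_text: str) -> str:
    """Join GFF3 annotations and FASTA sequences into one GFF3 document.

    Hypothetical helper for illustration; not the tool's actual API.
    """
    header = "##gff-version 3"
    # Drop blank lines and any existing version pragma so the header
    # is written exactly once, at the top of the output.
    gff_lines = [
        line
        for line in gff_text.splitlines()
        if line.strip() and not line.startswith("##gff-version")
    ]
    fasta_lines = [line for line in fasta_text.splitlines() if line.strip()]
    # Everything after the ##FASTA directive is treated as sequence data.
    return "\n".join([header, *gff_lines, "##FASTA", *fasta_lines]) + "\n"


if __name__ == "__main__":
    gff = "##gff-version 3\nchr1\t.\tgene\t1\t4\t.\t+\t.\tID=g1\n"
    fasta = ">chr1\nACGT\n"
    print(combine_gff_fasta(gff, fasta), end="")
```

Note that this sketch normalizes the version pragma to `##gff-version 3` and does not check that the sequence IDs in the FASTA part match the seqid column of the annotations.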

kblin commented 6 years ago

That's where I'd use GenBank files, to be honest. Those have more data in them than the GFF files provide, as far as I can see from my ncbi-genome-download experience. I see the appeal of this feature, but I need to think about how to integrate it into the current structure of the code.