ncbi / datasets

NCBI Datasets is a new resource that lets you easily gather data from across NCBI databases.
https://www.ncbi.nlm.nih.gov/datasets

Add an `--include all` option to `datasets download genome` #375

Open dtdoering opened 2 weeks ago

dtdoering commented 2 weeks ago

Is your feature request related to a problem? Please describe.

In my workflow, I frequently want to get the latest genome/annotation files for a number of RefSeq (GCF_*) and GenBank (GCA_*) genomes to do some further analyses. However, it can be hard to remember the exact spelling of each `--include` term, particularly since every desired file type has to be listed explicitly.

Describe the solution you'd like

As a quality-of-life feature, I'd like to be able to save some keystrokes by typing, e.g.:

datasets download genome accession GCA_005981935.1 --include all

instead of:

datasets download genome accession GCA_005981935.1 --include genome,protein,cds,gff3,gbff,seq-report

So that the only thing I need to remember or copy/paste is the accession, instead of the accession and then the files listing.
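Until something like this exists, the long form can be approximated with a small shell wrapper. The sketch below is only a workaround under stated assumptions: `datasets_all`, `ALL_INCLUDES`, and `DATASETS_BIN` are hypothetical names and not part of the datasets CLI, and the file list is copied from the command above, so it may not match what every CLI version accepts.

```shell
# Hypothetical helper; not part of the datasets CLI.
# File types copied from the long-form command in this issue.
ALL_INCLUDES="genome,protein,cds,gff3,gbff,seq-report"

# Usage: datasets_all ACCESSION [extra datasets flags...]
datasets_all() {
    if [ "$#" -lt 1 ]; then
        echo "usage: datasets_all ACCESSION [extra flags...]" >&2
        return 2
    fi
    acc=$1
    shift
    # DATASETS_BIN is an assumed override: set it to "echo" to print the
    # arguments instead of running the real CLI.
    "${DATASETS_BIN:-datasets}" download genome accession "$acc" \
        --include "$ALL_INCLUDES" "$@"
}
```

With this in a shell profile, `datasets_all GCA_005981935.1` stands in for the long command, and any extra flags after the accession are passed through unchanged.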

Thanks!

ericcox1 commented 2 weeks ago

Hi @dtdoering,

Thanks for opening this issue. We will consider adding this feature in a future release.

Best, Eric

Eric Cox, PhD [Contractor] (he/him/his) NCBI Datasets NIH/NLM/NCBI eric.cox@nih.gov

dtdoering commented 2 weeks ago

Adding another reason: many GenBank bacterial genomes only have annotations in GenBank format (and no GFF), so the `--include all` option would be very useful in combination with the `--preview` option. One could then see which files are even available for a given genome before deciding whether to download it or choose a different one.
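As a rough illustration of that preview workflow with today's flags, the sketch below loops over candidate accessions and requests a preview of each with every file type listed explicitly. `preview_all` and `DATASETS_BIN` are hypothetical names, the file list is the one from the original request, and the format of the preview output is not assumed or parsed here.

```shell
# Hypothetical helper; not part of the datasets CLI.
# Usage: preview_all ACCESSION [ACCESSION...]
preview_all() {
    for acc in "$@"; do
        echo "== $acc =="
        # --preview is used here to inspect the package rather than download it.
        # DATASETS_BIN is an assumed override ("echo" gives a dry run).
        "${DATASETS_BIN:-datasets}" download genome accession "$acc" \
            --include genome,protein,cds,gff3,gbff,seq-report --preview
    done
}
```

For example, `preview_all GCA_005981935.1` previews a single genome; further accessions can be appended to compare candidates before downloading any of them.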

That said, thanks for the info! Would love to see this added in a future release (or take a stab at a PR for it myself, pending #229)!