ncbi / datasets

NCBI Datasets is a new resource that lets you easily gather data from across NCBI databases.
https://www.ncbi.nlm.nih.gov/datasets

Append current downloads #399

Closed: ETaSky closed this issue 2 months ago

ETaSky commented 2 months ago

It would be helpful if the tool could append to existing downloads, so that people could build up their own repository of genomes of interest.

My experience with this tool is limited to downloading genome packages, but I feel this feature would not be very difficult to implement, since most of the current package structure is already in place. Currently, the core data is stored under ncbi_dataset/data/[Accession]/, which is already separated by accession and could be continuously expanded. Also, the assembly_data_report is in JSONL format, which can be appended to.
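As a minimal sketch of the appending idea, assuming each line of assembly_data_report.jsonl is a JSON object carrying an `accession` field, new report lines could be appended while skipping accessions that are already present. The function name `append_reports` is hypothetical and is not part of the datasets CLI.

```python
import json
from pathlib import Path


def append_reports(existing: Path, incoming: Path) -> list:
    """Hypothetical helper: append JSONL records from `incoming` to `existing`,
    skipping any accession already present. Returns the appended accessions.

    Assumes each record is a JSON object with an "accession" key.
    """
    text = existing.read_text() if existing.exists() else ""
    seen = {json.loads(line)["accession"] for line in text.splitlines() if line.strip()}
    added = []
    with existing.open("a") as dst:
        if text and not text.endswith("\n"):
            dst.write("\n")  # keep exactly one JSON object per line
        for line in incoming.read_text().splitlines():
            if not line.strip():
                continue
            acc = json.loads(line)["accession"]
            if acc in seen:
                continue  # conflict: this genome is already in the repository
            dst.write(line + "\n")  # copy the original line unchanged
            seen.add(acc)
            added.append(acc)
    return added
```

Because JSONL has no enclosing array, appending never requires rewriting the existing file; only the accession lookup has to read it.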

Additional work is needed in three main areas: (1) a way to update the datasets_catalog.json file, which may not be too hard; (2) conflict management, in case some or all of the data already exists; and (3) unzipping, which the CLI tool currently does not handle. So a separate utility script may be the easiest way to implement this functionality, i.e., one that handles unzipping, updates the metadata, and resolves conflicts.
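A rough sketch of such a utility script is below. It is not an NCBI tool; `merge_package`, `merge_catalog` and the `CATALOG` constant are hypothetical. It assumes the package layout described above (per-accession folders under ncbi_dataset/data/) plus a catalog JSON in the same folder with an "assemblies" list whose per-genome entries carry an "accession" key. The catalog file name and structure are assumptions to check against a real download. The conflict policy shown is only one option: an accession that already exists is skipped rather than overwritten.

```python
import json
import shutil
import tempfile
import zipfile
from pathlib import Path

CATALOG = "dataset_catalog.json"  # assumed catalog file name; adjust if yours differs


def merge_catalog(existing: dict, incoming: dict, new_accessions: set) -> dict:
    """Append catalog entries for newly added accessions to `existing`.

    Assumes an "assemblies" list; entries without an "accession" key
    (e.g. a shared report entry) are left as they are in `existing`.
    """
    merged = dict(existing)
    entries = list(existing.get("assemblies", []))
    for entry in incoming.get("assemblies", []):
        if entry.get("accession") in new_accessions:
            entries.append(entry)
    merged["assemblies"] = entries
    return merged


def merge_package(zip_path: Path, repo: Path) -> dict:
    """Unzip a downloaded package and fold it into `repo`, the directory
    that contains (or will contain) ncbi_dataset/."""
    added, skipped = [], []
    dst_data = repo / "ncbi_dataset" / "data"
    dst_data.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)  # stage the package outside the repository
        src_data = Path(tmp) / "ncbi_dataset" / "data"
        for child in sorted(src_data.iterdir()):
            if not child.is_dir():
                continue  # top-level files (reports, catalog) are handled separately
            if (dst_data / child.name).exists():
                skipped.append(child.name)  # conflict: keep the existing copy
            else:
                shutil.copytree(child, dst_data / child.name)
                added.append(child.name)
        src_cat, dst_cat = src_data / CATALOG, dst_data / CATALOG
        if src_cat.exists():
            incoming = json.loads(src_cat.read_text())
            if dst_cat.exists():
                existing = json.loads(dst_cat.read_text())
            else:
                # First merge: keep shared entries; per-genome ones are re-added below
                existing = {k: v for k, v in incoming.items() if k != "assemblies"}
                existing["assemblies"] = [
                    e for e in incoming.get("assemblies", []) if "accession" not in e
                ]
            merged = merge_catalog(existing, incoming, set(added))
            dst_cat.write_text(json.dumps(merged, indent=2))
    return {"added": added, "skipped": skipped}
```

The assembly_data_report.jsonl would still need to be appended separately, for example by adding only the records whose accession ended up in `added`. A stricter conflict policy could compare file lists inside an existing accession folder to detect partially downloaded genomes instead of skipping the whole folder.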

Thank you very much!

Jincheng

olearyna commented 2 months ago

Hi ETaSky,

Thank you for the suggestions! We really appreciate your input and will keep it in mind as we work on future updates.

All the best,

Nuala