It would be helpful if the tool could append to existing downloads, so that people could gradually build up their own repository of genomes of interest.
My experience with this tool is limited to downloading genome packages, but I feel this feature would not be very difficult to implement, since most of the required package structure is already there. Currently, the core data is stored under ncbi_dataset/data/[Accession]/, so each accession already has its own directory and new ones can simply be added alongside. Also, the assembly_data_report is in JSONL format, which can be appended to.
Additional work would be needed primarily in three areas: (1) a way to update the datasets_catalog.json file, which may not be that hard; (2) conflict management (when some or all of the data already exists); (3) unzipping, which is currently not handled by the CLI tool. So an additional utility script may be the easiest way to implement this functionality, i.e., one that handles unzipping, updates the metadata, and deals with conflicts.
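To make the idea more concrete, here is a rough Python sketch of what such a utility script might look like. It is only an illustration under several assumptions: the function names (read_report, merge_package, merge_zip) are made up, the report file is assumed to be named assembly_data_report.jsonl and to sit directly under ncbi_dataset/data/, and each report record is assumed to carry an "accession" key. Updating the catalog file is left out because its schema is not described here.

```python
import json
import shutil
import tempfile
import zipfile
from pathlib import Path

DATA_SUBDIR = Path("ncbi_dataset") / "data"
REPORT_NAME = "assembly_data_report.jsonl"  # assumed file name


def read_report(path):
    """Return {accession: record} from a JSONL report; empty if the file is missing."""
    records = {}
    if path.exists():
        with path.open() as handle:
            for line in handle:
                if line.strip():
                    record = json.loads(line)
                    records[record["accession"]] = record  # assumed key
    return records


def merge_package(new_root, repo_root, on_conflict="skip"):
    """Merge an unzipped package into an existing one.

    on_conflict: "skip" keeps the existing accession directory,
    "overwrite" replaces it. Returns the accessions that were copied.
    """
    if on_conflict not in ("skip", "overwrite"):
        raise ValueError("on_conflict must be 'skip' or 'overwrite'")
    new_data = Path(new_root) / DATA_SUBDIR
    repo_data = Path(repo_root) / DATA_SUBDIR
    repo_data.mkdir(parents=True, exist_ok=True)

    copied = []
    for src in sorted(p for p in new_data.iterdir() if p.is_dir()):
        dest = repo_data / src.name
        if dest.exists():
            if on_conflict == "skip":
                continue
            shutil.rmtree(dest)
        shutil.copytree(src, dest)
        copied.append(src.name)

    # Rewrite the report so that each accession appears exactly once.
    merged = read_report(repo_data / REPORT_NAME)
    new_records = read_report(new_data / REPORT_NAME)
    for accession in copied:
        if accession in new_records:
            merged[accession] = new_records[accession]
    with (repo_data / REPORT_NAME).open("w") as handle:
        for record in merged.values():
            handle.write(json.dumps(record) + "\n")
    return copied


def merge_zip(zip_path, repo_root, on_conflict="skip"):
    """Unzip a downloaded package to a temporary directory, then merge it."""
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        return merge_package(tmp, repo_root, on_conflict)
```

With something like this, calling merge_zip("new_download.zip", "my_genomes") would add only the accessions that are not yet present. Note that "skip" treats any existing accession directory as complete, so partially downloaded data would need an extra check (for example, comparing file lists) before deciding whether to overwrite.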
Thank you very much!
Jincheng