ncbi / datasets

NCBI Datasets is a new resource that lets you easily gather data from across NCBI databases.
https://www.ncbi.nlm.nih.gov/datasets

Order of output datasets #323

Closed alvanuffelen closed 8 months ago

alvanuffelen commented 8 months ago

The genome summaries are not returned in the same order as the input accessions.

datasets summary genome accession --inputfile accessions.txt --as-json-lines --debug

From the debug:

"sort":[{"direction":"SORT_DIRECTION_ASCENDING","field":"organismName"},{"direction":"SORT_DIRECTION_DESCENDING","field":"isRefGenome"},{"direction":"SORT_DIRECTION_DESCENDING","field":"isRepGenome"},{"direction":"SORT_DIRECTION_DESCENDING","field":"isRefseq"},{"direction":"SORT_DIRECTION_ASCENDING","field":"accession"}]
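Read as a multi-key sort, the specification above orders records by organism name (ascending), then reference genomes first, representative genomes first, RefSeq records first, and finally accession (ascending). The following is only a rough sketch of an equivalent client-side sort. It assumes hypothetical flat records that carry the field names from the debug output, and that a descending boolean sort places true before false; the actual response schema and server behavior may differ.

```python
# Hypothetical flat records; only the field names come from the debug output.
records = [
    {"organismName": "Homo sapiens", "isRefGenome": False,
     "isRepGenome": False, "isRefseq": True, "accession": "ACC_2"},
    {"organismName": "Escherichia coli", "isRefGenome": False,
     "isRepGenome": False, "isRefseq": False, "accession": "ACC_1"},
    {"organismName": "Homo sapiens", "isRefGenome": True,
     "isRepGenome": False, "isRefseq": True, "accession": "ACC_3"},
]


def sort_key(rec):
    """Mirror the five sort fields, in the order the debug output lists them."""
    return (
        rec["organismName"],     # ascending
        not rec["isRefGenome"],  # descending: True sorts before False
        not rec["isRepGenome"],  # descending
        not rec["isRefseq"],     # descending
        rec["accession"],        # ascending
    )


ordered = sorted(records, key=sort_key)
print([rec["accession"] for rec in ordered])
```

Because organism name is the first key, the output groups records by species regardless of where their accessions appeared in the input file.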

To the best of my knowledge, this behavior is not documented, and it can produce unexpected results in downstream processing that assumes the output follows the input order.

Is it feasible to provide an option to enable or disable sorting?

Attachments: accessions.txt, output.json, debug.txt

ericcox1 commented 8 months ago

Hi @alvanuffelen,

Thanks for this feature request.

> Is it feasible to provide an option to enable or disable sorting?

Disabling sorting is feasible, but doing so would not cause the genome summaries to be returned in the same order as the input.

We do recognize the value of returning data in the same order as the input. However, this feature is not easy to implement in our current system, and we are unlikely to add it soon.

Best, Eric

Eric Cox, PhD [Contractor] (he/him/his)
NCBI Datasets
Sequence Enhancements, Tools and Delivery (SeqPlus)
NIH/NLM/NCBI
eric.cox@nih.gov
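Since input-order output is not planned, one possible client-side workaround is to reorder the JSON Lines output after the fact. The sketch below is not part of the datasets CLI. The helper name `reorder_by_input` is hypothetical, and it assumes each output line is a JSON object with a top-level `accession` field that matches the requested accession exactly.

```python
import json


def reorder_by_input(accessions, json_lines, key="accession"):
    """Return parsed records in the order given by `accessions`.

    Requested accessions with no matching record are skipped.
    """
    by_accession = {}
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        by_accession[record[key]] = record
    return [by_accession[acc] for acc in accessions if acc in by_accession]


# In-memory demonstration with placeholder accessions.
requested = ["ACC_3", "ACC_1", "ACC_2"]
returned = [
    '{"accession": "ACC_1"}',
    '{"accession": "ACC_2"}',
    '{"accession": "ACC_3"}',
]
reordered = reorder_by_input(requested, returned)
print([rec["accession"] for rec in reordered])
```

For real files, the same function could be called with the whitespace-split contents of accessions.txt and an open handle on the saved JSON Lines output. Records returned under a different accession than the one requested (for example, a different version) would not match and would need separate handling.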