ncbi / datasets

NCBI Datasets is a new resource that lets you easily gather data from across NCBI databases.
https://www.ncbi.nlm.nih.gov/datasets

Possible to query datasets by genome notes? #386

Closed michaelbarton closed 2 months ago

michaelbarton commented 2 months ago

Is your feature request related to a problem? Please describe.

Unable to query NCBI datasets by genome notes field. E.g. I would like a way to find all the genomes with the genome note derived from single cell data.

Describe the solution you'd like

A way to query by this field, which appears to be an enum?

ericcox1 commented 2 months ago

Hi @michaelbarton,

Thanks for opening this issue.

> I would like a way to find all the genomes with the genome note derived from single cell data.

Here's what I would recommend:

1) Generate a table that includes two columns, assembly accession and genome notes
2) Use grep to find rows containing "derived from single cell"

For example:

datasets summary genome taxon cyanobacteriota --as-json-lines | dataformat tsv genome --fields accession,assminfo-notes | grep -m5 "derived from single cell"
GCA_002017915.1 derived from single cell
GCA_002017955.1 derived from single cell
GCA_002018045.1 derived from single cell
GCA_003030805.1 contaminated,derived from single cell,fragmented assembly,genus undefined
GCA_028658425.1 derived from single cell

Please let me know if you have any questions.

Best, Eric

Eric Cox, PhD [Contractor] (he/him/his) NCBI Datasets NIH/NLM/NCBI eric.cox@nih.gov

michaelbarton commented 2 months ago

Thanks, awesome. Thanks for being so responsive Eric. I will give that a try.
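
The grep step suggested above matches the phrase anywhere on the line. If an exact match on one note is preferred, the notes column can be split on commas and compared entry by entry. The following is only a minimal sketch: `filter_by_note` is a hypothetical helper name, and it assumes the `dataformat tsv` output is tab-separated with the notes joined by commas, as the GCA_003030805.1 row in the example output suggests.

```shell
# Hypothetical helper (not part of the datasets CLI): read a two-column TSV
# (accession, notes) on stdin and print the rows whose comma-separated notes
# column contains the requested note as an exact entry.
# Assumption: columns are tab-separated and notes are joined with commas.
filter_by_note() {
  awk -F'\t' -v note="$1" '
    {
      n = split($2, notes, ",")
      for (i = 1; i <= n; i++) {
        entry = notes[i]
        # Trim stray whitespace around each entry before comparing.
        gsub(/^[ \t]+|[ \t]+$/, "", entry)
        if (entry == note) { print; break }
      }
    }'
}

# Possible usage with the pipeline from the thread:
# datasets summary genome taxon cyanobacteriota --as-json-lines \
#   | dataformat tsv genome --fields accession,assminfo-notes \
#   | filter_by_note "derived from single cell"
```

Because the header row never equals a note value, it is dropped from the output, just as it is with grep.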