ncbi / datasets

NCBI Datasets is a new resource that lets you easily gather data from across NCBI databases.
https://www.ncbi.nlm.nih.gov/datasets

How to download #379

Open gunjanpandey opened 1 week ago

gunjanpandey commented 1 week ago

Could you please explain how to download annotation reports in *.jsonl format for a single organism or multiple organisms of a particular taxid using datasets?

dataformat appears to have a command for parsing these files, but the datasets documentation does not mention how to download them.

dataformat excel genome-annotations --inputfile assembly_package/ncbi_dataset/data/annotation_report.jsonl --outputfile annotations.xlsx

The above command is provided here
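Since a .jsonl (JSON Lines) file holds one JSON object per line, a report like this can also be inspected without dataformat once it has been obtained. Below is a minimal Python sketch of that idea. The function name `summarize_jsonl` is hypothetical, and the code makes no assumptions about the field names inside annotation_report.jsonl: it only counts records and tallies whatever top-level keys it finds.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, Counter of top-level keys) for a JSON Lines file.

    Hypothetical helper for illustration; it is not part of datasets or dataformat.
    """
    key_counts = Counter()
    record_count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                # Skip blank lines, which some JSONL writers leave at the end.
                continue
            record = json.loads(line)
            record_count += 1
            if isinstance(record, dict):
                # Tally which top-level fields appear across records.
                key_counts.update(record.keys())
    return record_count, key_counts
```

Calling `summarize_jsonl("assembly_package/ncbi_dataset/data/annotation_report.jsonl")` (the path from the command above, assuming the file exists locally) would return the number of records and the fields present, which may help in deciding which columns to export.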

mtntsuchiya commented 1 week ago

Hi gunjanpandey,

Thanks for your message and for your interest in NCBI datasets and dataformat.

Currently, it's not possible to retrieve the genome annotation report by taxid. You can access this information on the web from the taxonomy or assembly pages by clicking on "View Annotated Genes" (I'm using sunflower as an example), or from the API using an accession. We plan to add the file to the genome data package in the future.

Please let us know if you have any other questions.

Best, Mirian

Mirian T. N. Tsuchiya, Ph.D. Bioinformatics Data Wrangler (contractor) NCBI Datasets (NCBI/NLM/NIH) (she/her/hers)