ncbi / datasets

NCBI Datasets is a new resource that lets you easily gather data from across NCBI databases.
https://www.ncbi.nlm.nih.gov/datasets

How to download genome annotation report #379

Open gunjanpandey opened 5 months ago

gunjanpandey commented 5 months ago

Could you please explain how to download annotation reports in *.jsonl format for a single organism or multiple organisms of a particular taxid using datasets?

The dataformat tool has a command to parse these files, but the datasets documentation does not mention how to download them.

dataformat excel genome-annotations --inputfile assembly_package/ncbi_dataset/data/annotation_report.jsonl --outputfile annotations.xlsx

The above command is provided here
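For anyone who already has an annotation_report.jsonl file and wants to inspect it without dataformat, here is a minimal Python sketch that reads JSON Lines and writes a tab-separated table. It does not assume any particular field names in the report: nested objects are flattened into dotted column names and lists are kept as JSON text. The function names (read_jsonl, flatten, jsonl_to_tsv) are hypothetical helpers written for this illustration, not part of datasets or dataformat.

```python
import csv
import io
import json


def read_jsonl(lines):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted keys; lists are kept as JSON text."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def jsonl_to_tsv(lines, out):
    """Write all records as TSV to `out` and return the column names used."""
    rows = [flatten(record) for record in read_jsonl(lines)]
    # Union of all keys, in first-seen order, so no record loses a field.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    writer = csv.DictWriter(
        out, fieldnames=columns, delimiter="\t", restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return columns
```

To use it on the file from the command above, open assembly_package/ncbi_dataset/data/annotation_report.jsonl for reading and pass the file handle together with an output handle (for example sys.stdout) to jsonl_to_tsv. All rows are held in memory before writing, which is fine for a quick look but may not suit very large reports.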

mtntsuchiya commented 5 months ago

Hi gunjanpandey, thanks for your message and for your interest in NCBI datasets and dataformat.

Currently, it's not possible to retrieve the genome annotation report by taxid. You can access this information on the web from the taxonomy or assembly pages by clicking on "View Annotated Genes" (I'm using sunflower as an example), or from the API using an accession. We plan to add the file to the genome data package in the future.
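As a rough illustration of the "API using an accession" route, the sketch below builds a request URL for one assembly accession and fetches the JSON response with the Python standard library. The API root, the /genome/accession/{accession}/annotation_report path, and the page_size parameter are assumptions based on the Datasets v2 REST API and should be checked against the current API documentation before use. The helper names are hypothetical, and the accession in the usage note is only a placeholder (it is not the sunflower assembly).

```python
import json
import urllib.parse
import urllib.request

# Assumption: root of the NCBI Datasets v2 REST API.
API_ROOT = "https://api.ncbi.nlm.nih.gov/datasets/v2"


def annotation_report_url(accession, page_size=20):
    """Build the (assumed) annotation report URL for one assembly accession."""
    quoted = urllib.parse.quote(accession, safe="")
    # Assumption: endpoint path and `page_size` query parameter.
    query = urllib.parse.urlencode({"page_size": page_size})
    return f"{API_ROOT}/genome/accession/{quoted}/annotation_report?{query}"


def fetch_annotation_report(accession, page_size=20, timeout=30):
    """Fetch one page of the report and return the parsed JSON response."""
    request = urllib.request.Request(
        annotation_report_url(accession, page_size),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, fetch_annotation_report("GCF_000001405.40", page_size=5) would request the first five records for that placeholder accession. Only one page is returned per call; retrieving the full report would require following whatever pagination mechanism the API documents, which is not sketched here.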

Please let us know if you have any other questions.

Best, Mirian

Mirian T. N. Tsuchiya, Ph.D. Bioinformatics Data Wrangler (contractor) NCBI Datasets (NCBI/NLM/NIH) (she/her/hers)