ncbi / datasets

NCBI Datasets is a new resource that lets you easily gather data from across NCBI databases.
https://www.ncbi.nlm.nih.gov/datasets
Other

annotation_info missing for GCF_000002945.1 in genome/accession/{accession}/dataset_report #380

Open manulera opened 6 days ago

manulera commented 6 days ago

Hi @olearyna,

I was using the field annotation_info from genome/accession/{accession}/dataset_report to tell users whether a given assembly has annotations. Since yesterday, it seems that annotation_info is missing from the response for GCF_000002945.1.
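A minimal Python sketch of the check described above: fetch the dataset_report JSON for an accession and test whether any report carries an `annotation_info` field. It assumes the response body is a JSON object with a `reports` list, and treats a missing key as an empty list. The helper names are hypothetical, and only the standard library is used.

```python
import json
import urllib.request

BASE = "https://api.ncbi.nlm.nih.gov/datasets/v2alpha"


def dataset_report_url(accession: str) -> str:
    # Path taken from the issue title: genome/accession/{accession}/dataset_report
    return f"{BASE}/genome/accession/{accession}/dataset_report"


def has_annotation_info(payload: dict) -> bool:
    # Assumption: reports live under "reports"; a missing key means no reports
    reports = payload.get("reports", [])
    return any("annotation_info" in report for report in reports)


def fetch_json(url: str) -> dict:
    # Plain GET; raises urllib.error.HTTPError on 4xx/5xx responses
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

Usage: `has_annotation_info(fetch_json(dataset_report_url("GCF_000002945.1")))`. Keeping the parsing separate from the network call makes the check easy to test against saved responses.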

Compare:

Is this intentional? And is there a better way to check whether a given assembly has annotations?

The annotations for GCF_000002945.1 can still be accessed, though; see https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000002945.1/annotation_report?search_text=ase1
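For reference, the annotation_report URL above can be built with a properly encoded `search_text` parameter. This is a small sketch; `annotation_report_url` is a hypothetical helper, not part of any NCBI client library.

```python
from typing import Optional
from urllib.parse import urlencode

BASE = "https://api.ncbi.nlm.nih.gov/datasets/v2alpha"


def annotation_report_url(accession: str, search_text: Optional[str] = None) -> str:
    # Endpoint path copied from the URL in the comment above
    url = f"{BASE}/genome/accession/{accession}/annotation_report"
    if search_text:
        # urlencode escapes characters such as spaces in the search term
        url += "?" + urlencode({"search_text": search_text})
    return url
```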

manulera commented 6 days ago

Similarly, this response is empty when setting has_annotation=true:

https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000002945.1/dataset_report?filters.has_annotation=true

I guess this might mean that the annotation comes from the paired assembly GCA_000002945.2?

manulera commented 6 days ago

I figured I could use this endpoint instead to check whether an annotation is present:

https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCA_006386175.1/annotation_report/download_summary

However, this endpoint returns the same error (404) both for an invalid accession and for an accession that does not exist. The nice thing about the dataset_report endpoint was that a single request told you both whether the accession exists and whether it has annotations.
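The single-request check described above could be sketched as a classifier over one dataset_report response. This assumes that an unknown accession yields an empty or missing `reports` list and that unannotated assemblies simply omit `annotation_info`; if the service instead answers with an HTTP error status, that case would have to be caught separately. `AssemblyStatus` and `classify` are hypothetical names.

```python
from enum import Enum


class AssemblyStatus(Enum):
    NOT_FOUND = "not found"
    UNANNOTATED = "exists, no annotation"
    ANNOTATED = "exists, annotated"


def classify(payload: dict) -> AssemblyStatus:
    # Assumption: no reports at all means the accession was not found
    reports = payload.get("reports", [])
    if not reports:
        return AssemblyStatus.NOT_FOUND
    # Assumption: annotation_info is present only for annotated assemblies
    if any("annotation_info" in report for report in reports):
        return AssemblyStatus.ANNOTATED
    return AssemblyStatus.UNANNOTATED
```

Unlike a bare 404, this distinguishes three outcomes from one response, which is the property the comment above is asking for.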

olearyna commented 5 days ago

Hi manulera,

Thanks for opening this issue. GCF_000002945.1 was recently updated to version 2 but there is an issue with the data release for the new version. We hope to get it resolved soon.

To check whether there is an annotation, this is the correct URL: https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000002945.1/dataset_report?filters.has_annotation=true. It should work once the bug with the version update is fixed. Additionally, we are looking into a better response when a genome is not annotated.
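A minimal sketch of the recommended check: build the dataset_report URL with `filters.has_annotation=true` and treat a non-empty `reports` list as "annotated". The response shape (a `reports` list) is an assumption, and both function names are hypothetical.

```python
from urllib.parse import urlencode

BASE = "https://api.ncbi.nlm.nih.gov/datasets/v2alpha"


def annotated_report_url(accession: str) -> str:
    # Filter name and value copied from the URL in the comment above
    query = urlencode({"filters.has_annotation": "true"})
    return f"{BASE}/genome/accession/{accession}/dataset_report?{query}"


def is_annotated(payload: dict) -> bool:
    # With the filter applied, any returned report implies an annotation exists
    return bool(payload.get("reports"))
```

Note that an empty result here cannot by itself tell "not annotated" apart from "accession does not exist".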

I'll ping the issue when the version release is fixed.

Nuala

olearyna commented 2 days ago

Hi manulera,

The issue with the release of GCF_000002945.2 has been fixed. You can view the data report for this latest version here

The previous version has also been fixed. To view a data report for an assembly version that is not the latest, append a filter for all assemblies to the URL: https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000002945.1/dataset_report?filters.assembly_version=all_assemblies
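A sketch of a URL builder that adds the `filters.assembly_version=all_assemblies` filter when older versions such as GCF_000002945.1 are needed. Combining it with `filters.has_annotation=true` in one query is an assumption; the thread only shows each filter on its own. `versioned_report_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "https://api.ncbi.nlm.nih.gov/datasets/v2alpha"


def versioned_report_url(
    accession: str,
    include_old_versions: bool = False,
    require_annotation: bool = False,
) -> str:
    url = f"{BASE}/genome/accession/{accession}/dataset_report"
    params = {}
    if include_old_versions:
        # Needed to get a report for a non-latest assembly version
        params["filters.assembly_version"] = "all_assemblies"
    if require_annotation:
        # Assumption: this filter can be combined with the one above
        params["filters.has_annotation"] = "true"
    return f"{url}?{urlencode(params)}" if params else url
```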

Let me know if you have any more issues.

Nuala