ncbi / datasets

NCBI Datasets is a new resource that lets you easily gather data from across NCBI databases.
https://www.ncbi.nlm.nih.gov/datasets

Extra info in 404 response if assembly exists but is not included in the datasets API #370

Closed manulera closed 5 months ago

manulera commented 5 months ago

Is your feature request related to a problem? Please describe.

Not really a problem, but an improvement that would make my use-case easier.

I noticed that previous revision assemblies are not included in the API. For instance, for GCF_004355105:

Current version GCF_004355105.2 works, but GCF_004355105.1 gives the following:

https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_004355105.1/annotation_report

{
  "error": "Not Found",
  "code": 404,
  "message": "Your request is invalid. (For more help, see the NCBI Datasets Documentation at https://www.ncbi.nlm.nih.gov/datasets/docs/)"
}

This is the same response that is returned when an invalid assembly ID is used.

In my website, I use a request to genome/accession/${assemblyId}/dataset_report to validate whether an assembly entered by the user exists. The issue is that I get the same response with:

Describe the solution you'd like

It would be nice if the 404 response message were different in each case and potentially included the identifier of the latest version.

olearyna commented 5 months ago

Hi manulera

Thanks for your suggestions on improving the error code messages. We appreciate your feedback and will look into improving our response system. Below are a few curl commands that might help with the issues you brought up.

To check if an accession is valid, you can use the following command:

curl -X GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000001405.40%2CGCF_000001635.27%2CGCF_9999.1/check" -H "Accept: application/json"

For the dataset_report, you need to add the following parameter to see if there is data for an accession:

curl -X GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_004355105.1/dataset_report?filters.assembly_version=all_assemblies&returned_content=ASSM_ACC" -H "Accept: application/json"

You can use the following command to get the latest version:

curl -X GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000001405/dataset_report?filters.assembly_version=current&returned_content=ASSM_ACC" -H "Accept: application/json"
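
The requests above differ only in their path suffix and query parameters, so a client could build them with a small helper. Below is a minimal Python sketch that assumes nothing beyond the URL shapes shown in the curl commands. The function names `check_url` and `dataset_report_url` are hypothetical and are not part of any NCBI client library.

```python
from urllib.parse import quote, urlencode

# Base path taken from the curl commands in this thread.
BASE = "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession"


def check_url(accessions):
    """Build the /check URL for one or more accessions.

    Accessions are joined with a comma, which is percent-encoded as %2C
    to match the curl example.
    """
    joined = quote(",".join(accessions), safe="")
    return f"{BASE}/{joined}/check"


def dataset_report_url(accession, assembly_version):
    """Build a dataset_report URL that returns accessions only.

    assembly_version is passed through as filters.assembly_version;
    the thread uses "all_assemblies" and "current".
    """
    query = urlencode({
        "filters.assembly_version": assembly_version,
        "returned_content": "ASSM_ACC",
    })
    return f"{BASE}/{quote(accession, safe='')}/dataset_report?{query}"
```

Calling `dataset_report_url("GCF_000001405", "current")` reproduces the URL used to look up the latest version. Send the result with any HTTP client along with the `Accept: application/json` header.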

Please let me know if you have any more suggestions or questions.

Nuala

manulera commented 5 months ago

Hi @olearyna thank you so much for the swift reply. That works for my use-case, and it's great that you get the identifier of the latest assembly, so I will be able to display that for users.
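
For anyone with a similar use-case, here is a rough Python sketch of the comparison step: once the latest accession has been retrieved, the accession the user entered can be split into its base and version to decide whether it is an older revision. The names `split_accession` and `is_previous_version` are hypothetical, and the pattern (a `GCA_` or `GCF_` prefix, nine digits, and an optional `.version` suffix) is an assumption about the accession format rather than something the API guarantees. It is a local pre-check only and does not replace the API request.

```python
import re

# Assumed format: GCA_ or GCF_ prefix, nine digits, optional ".version" suffix.
ACCESSION_RE = re.compile(r"(GC[AF]_\d{9})(?:\.(\d+))?")


def split_accession(accession):
    """Return (base, version) for a well-formed accession, else None.

    version is an int, or None when the accession has no version suffix.
    """
    match = ACCESSION_RE.fullmatch(accession)
    if match is None:
        return None
    base, version = match.groups()
    return base, (int(version) if version is not None else None)


def is_previous_version(requested, latest):
    """True if both accessions share a base and requested has a lower version."""
    req = split_accession(requested)
    cur = split_accession(latest)
    if req is None or cur is None or req[0] != cur[0]:
        return False
    if req[1] is None or cur[1] is None:
        return False
    return req[1] < cur[1]
```

With the accessions from this issue, `is_previous_version("GCF_004355105.1", "GCF_004355105.2")` is true, so the site can point the user to the newer identifier.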

Just to double-check that I got that right, all assemblies with status previous are not accessible through the annotation report, right? It's nothing special about GCF_004355105.1.

olearyna commented 5 months ago

Hi manulera,

Apologies, I didn't address that issue in my response. We should have annotation tables for all previous assembly versions that are annotated. For example: https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_014058445.1/annotation_report. There seems to be a bug with the example you provided. We're looking into it.

Nuala

manulera commented 5 months ago

Hi @olearyna, thanks for the clarification! If possible, let me know when GCF_004355105.1 is fixed, since a user is working with annotations based on that assembly.

ericcox1 commented 4 months ago

Hi @manulera, this has been fixed. Please check: https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_004355105.1/annotation_report

Best, Eric

manulera commented 4 months ago

Thanks @ericcox1 !