ncbi / datasets

NCBI Datasets is a new resource that lets you easily gather data from across NCBI databases.
https://www.ncbi.nlm.nih.gov/datasets

taxon_suggest endpoint does not return E. coli species (taxid:562) #317

Open manulera opened 5 months ago

manulera commented 5 months ago

I noticed that the taxon_suggest endpoint does not return the E. coli species (taxid:562), only some strains of the species.

https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon_suggest/Escherichia%20coli

Is there a reason for this? I noticed that E. coli has 2 reference genomes, is it related to that?

olearyna commented 5 months ago

Hi Manu,

Thanks for reporting the issue with our taxon_suggest endpoint. We'll investigate and make improvements. I think it's unrelated to the two E. coli reference genomes. In a few cases, we maintain multiple reference genomes for a species when it meets community needs.

Nuala

manulera commented 1 month ago

Hello! Following up on this ticket. Are there plans to fix this in the short term? When users come to my site, they will often try E. coli as the first dummy example, so it would be quite helpful for me.

olearyna commented 1 month ago

Hi Manu,

Thanks for pinging this issue. We're currently reviewing the API parameters to improve clarity. In the meantime, could you please test the following URL and let us know if it resolves your issue? https://www.ncbi.nlm.nih.gov/datasets/api/datasets/v2alpha/taxonomy/taxon_suggest/eschirichia%20coli?tax_rank_filter=higher_taxon&taxon_resource_filter=TAXON_RESOURCE_FILTER_ALL.
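For anyone scripting this, the URL above can be assembled programmatically. This is only a minimal sketch: the helper name `build_taxon_suggest_url` is hypothetical, and the base URL and the two query parameters are simply copied from the link in this comment, not taken from the API documentation.

```python
from urllib.parse import quote, urlencode

# Base URL copied from the link in this comment (assumption: it stays stable
# while the API is still labelled v2alpha).
BASE_URL = (
    "https://www.ncbi.nlm.nih.gov/datasets/api/datasets/"
    "v2alpha/taxonomy/taxon_suggest/"
)


def build_taxon_suggest_url(query, tax_rank_filter=None, taxon_resource_filter=None):
    """Return a taxon_suggest URL for `query`, adding only the filters that are set."""
    # Percent-encode the search text so spaces become %20 in the path.
    url = BASE_URL + quote(query, safe="")
    params = {}
    if tax_rank_filter is not None:
        params["tax_rank_filter"] = tax_rank_filter
    if taxon_resource_filter is not None:
        params["taxon_resource_filter"] = taxon_resource_filter
    if params:
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    print(build_taxon_suggest_url("Escherichia coli"))
    print(
        build_taxon_suggest_url(
            "Escherichia coli",
            tax_rank_filter="higher_taxon",
            taxon_resource_filter="TAXON_RESOURCE_FILTER_ALL",
        )
    )
```

Leaving both filters unset yields the plain URL, which makes it easy to compare the two responses side by side.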

Looking forward to hearing from you.

Nuala

manulera commented 1 month ago

Hi @olearyna thank you so much for the quick reply, and sorry for the delay in coming back to you. The notification got buried in a huge list of failed CI runs.

I tried with those parameters, and I get E. coli as one of the retrieved species. However, I don't really understand the difference between the two API responses. Below are the results printed for the request https://www.ncbi.nlm.nih.gov/datasets/api/datasets/v2alpha/taxonomy/taxon_suggest/coli with or without those query parameters, filtered for taxon=SPECIES. Some species are returned in both cases, but some only in one.

Screenshot 2024-06-17 at 14 21 04

But if I use those parameters for "pombe", this is what I get

Screenshot 2024-06-17 at 14 22 58

I was thinking I could merge both return values, but I would not want those "expression vector" results either.

Any advice?
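To make the merge idea concrete, here is a rough sketch of what I have in mind. It assumes each suggestion has already been parsed into a dict with `tax_id` and `sci_name` keys; those field names and the helper `merge_suggestions` are my own placeholders, not something confirmed by the API docs, and the sample entries use taxids that appear elsewhere in this thread.

```python
def merge_suggestions(*result_lists, exclude_substrings=("expression vector",)):
    """Merge suggestion lists, dropping duplicates and unwanted names.

    Each suggestion is assumed to be a dict with "tax_id" and "sci_name"
    keys (hypothetical field names). Order of first appearance is kept.
    """
    seen = set()
    merged = []
    for results in result_lists:
        for item in results:
            name = item["sci_name"].lower()
            # Skip entries such as "... expression vector pDUAL2-HFG1c".
            if any(sub.lower() in name for sub in exclude_substrings):
                continue
            # Skip taxids already collected from an earlier list.
            if item["tax_id"] in seen:
                continue
            seen.add(item["tax_id"])
            merged.append(item)
    return merged


if __name__ == "__main__":
    default_results = [
        {"tax_id": 4896, "sci_name": "Schizosaccharomyces pombe"},
    ]
    filtered_results = [
        {"tax_id": 4896, "sci_name": "Schizosaccharomyces pombe"},
        {"tax_id": 284812, "sci_name": "Schizosaccharomyces pombe 972h-"},
        {
            "tax_id": 477364,
            "sci_name": "Schizosaccharomyces pombe expression vector pDUAL2-HFG1c",
        },
    ]
    for item in merge_suggestions(default_results, filtered_results):
        print(item["tax_id"], item["sci_name"])
```

Matching on the substring "expression vector" is a blunt heuristic, so it would need revisiting if other unwanted categories show up.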

olearyna commented 1 month ago

Hi manulera,

The filters are designed to provide the most relevant suggestions for different NCBI Datasets services, such as gene, genome, and taxonomy. Here is an example of the logic behind the 'pombe' search to help explain the differences. Sorry for any confusion; we will look into updating the documentation.

Standard Suggestion

URL: Taxon Suggest for Pombe

This URL returns suggestions for taxids at species rank and below that have data in our gene service.

datasets summary gene taxon pombe
Error: The taxonomy name 'pombe' is not exact. Try using the taxid: 
Schizosaccharomyces pombe (species, taxid: 4896, fission yeast).

Higher Taxon Filter

URL: Taxon Suggest for Higher Taxon

This filter expands the suggestions to include ranks above species; we use it for our genome service.

datasets summary genome taxon pombe
Error: The taxonomy name 'pombe' is not exact. Try using the taxid: Schizosaccharomyces pombe (species, taxid: 4896, fission yeast).

All Taxa and Ranks Filter

URL: Taxon Suggest for All Taxa and Ranks

This filter returns suggestions for all taxa and ranks, whether or not they have gene or genome data.

datasets summary taxonomy taxon pombe
Error: The taxonomy name 'pombe' is not exact. Try using one of the suggested taxids:
Schizosaccharomyces pombe (species, taxid: 4896, fission yeast)
Schizosaccharomyces pombe 972h- (strain, taxid: 284812)
Schizosaccharomyces pombe expression vector pDUAL2-HFG1c (species, taxid: 477364)
Schizosaccharomyces pombe expression vector pDUAL2-HFF41 (species, taxid: 477338)
Schizosaccharomyces pombe expression vector pDUAL2-HFF1 (species, taxid: 477337)
Schizosaccharomyces pombe expression vector pDUAL2-FFH1c (species, taxid: 477367)
Schizosaccharomyces pombe expression vector pDUAL-YFH1c (species, taxid: 477382)
Schizosaccharomyces pombe expression vector pDUAL-HFG81 (species, taxid: 477329)
Schizosaccharomyces pombe expression vector pDUAL-HFG61 (species, taxid: 477404)
Schizosaccharomyces pombe expression vector pDUAL-HFG51c (species, taxid: 477419)
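
If it helps to work with these suggestions in a script, the lines above can be split into name, rank, taxid, and optional common name. This is only a sketch that assumes the `name (rank, taxid: N[, common name])` layout shown in the output above; the helper `parse_suggestion_line` is hypothetical and not part of the datasets CLI.

```python
import re

# Pattern for one suggestion line as printed above, e.g.
# "Schizosaccharomyces pombe (species, taxid: 4896, fission yeast)".
# A trailing period is tolerated; the common name is optional.
SUGGESTION_RE = re.compile(
    r"^(?P<name>.+) \((?P<rank>[^,]+), taxid: (?P<taxid>\d+)"
    r"(?:, (?P<common>[^)]+))?\)\.?$"
)


def parse_suggestion_line(line):
    """Return a dict for one suggestion line, or None if it does not match."""
    match = SUGGESTION_RE.match(line.strip())
    if match is None:
        return None
    return {
        "name": match.group("name"),
        "rank": match.group("rank"),
        "taxid": int(match.group("taxid")),
        "common_name": match.group("common"),  # None when absent
    }


if __name__ == "__main__":
    sample = [
        "Schizosaccharomyces pombe (species, taxid: 4896, fission yeast)",
        "Schizosaccharomyces pombe 972h- (strain, taxid: 284812)",
        "Error: The taxonomy name 'pombe' is not exact.",
    ]
    for line in sample:
        print(parse_suggestion_line(line))
```

Lines that are not suggestions, such as the error message itself, return `None` and can be skipped.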

I hope this helps. Let me know if you have more questions.

All the best,

Nuala

manulera commented 1 month ago

Hi @olearyna that makes sense. Thank you for the follow-up!