Closed arlin closed 5 years ago
For the above taxa, this is what I get from the portal:
The errors may be coming from NCBI or from our service (see next comment).
@ducvan0212 can you comment on why there appears to be a limit of 10?
For the above taxa, this is what I get by invoking the web service directly:
After trying these a few times, I find that the above results are repeatable. Animalia and Enterobacteriaceae give errors repeatedly. @abusalehmdtayeen any comments?
@arlin The portal had a default bound on the number of species queried from a taxon. To get more, you had to choose the number in the select box. That behavior probably dates from the very first commits, when I developed the portal. I have removed the bound, so you can now see every species the portal receives from the service. However, the default number of queried species described in https://github.com/phylotastic/phylo_services_docs/tree/master/ServiceDescription#taxonspecies is 20.
Can @abusalehmdtayeen also remove the bound on the service side?
OK, the portal is now fixed. And the web service is using taxon-specific searches. Our only problem now is to figure out the error messages.
@ducvan0212 there is no bound on the number of species returned from the service. Even in the documentation I did not write anything about bounding the result species.
@arlin, I found the problem with the three taxa: Enterobacteriaceae, Animalia, and Eukaryota. Each of these has either a large number of genomes or a large number of species. For example, Enterobacteriaceae has 274 genomes, but when it is cross-referenced with the taxonomy database, ~42,000 species are found. The NCBI API cannot handle more than 300 species at a time, so it was throwing an error. I have now fixed this by making multiple calls to the API when that happens. As a result, response time will be slower for these inputs.
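For anyone following along, the "multiple calls" approach can be sketched roughly as below. This is only an illustration, not the service's actual code: `chunked`, `fetch_all`, and the `fetch_batch` callback are hypothetical names, and the batch size of 300 is taken from the limit reported above.

```python
from typing import Callable, Iterator, List, Sequence

# Limit reported in this thread; treat it as an assumption, not a documented NCBI constant.
MAX_IDS_PER_CALL = 300


def chunked(ids: Sequence[str], size: int = MAX_IDS_PER_CALL) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def fetch_all(ids: Sequence[str],
              fetch_batch: Callable[[List[str]], List[str]]) -> List[str]:
    """Call `fetch_batch` once per chunk and concatenate the results in order.

    `fetch_batch` is a hypothetical stand-in for whatever function performs
    one request against the remote API.
    """
    results: List[str] = []
    for batch in chunked(ids):
        results.extend(fetch_batch(batch))
    return results
```

With about 42,000 species this means on the order of 140 requests instead of one, which would explain the slower response time for these taxa.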
@abusalehmdtayeen, I don't understand this. Searching on "Insecta" gives 395 and this was working, but "Enterobacteriaceae" has fewer genomes and it was not working. Why? This must have something to do with a second step "when it is cross-referenced with taxonomy" but I don't understand that part of it.
And thanks for the fixes!
Taxon search at NCBI for genomes should use the "[ORGN]" or "[organism]" field to access NCBI's taxonomy. We should get the following numbers:
Some of these are too long to make a tree, but all of them should make a list.
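As a rough illustration of the field-restricted search described above, the query could be built against NCBI's E-utilities `esearch` endpoint as sketched here. The helper name `build_genome_search_url`, the choice of `db=genome`, and the `retmax` default are assumptions made for the example; they are not taken from the service's code.

```python
from urllib.parse import urlencode

# Public E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_genome_search_url(taxon: str, retmax: int = 20) -> str:
    """Build an esearch URL that restricts the term to the organism field.

    Appending "[ORGN]" makes the search go through NCBI's taxonomy, so a
    name such as "Insecta" matches records for organisms in that taxon.
    The database name "genome" is an assumption for this sketch.
    """
    params = {
        "db": "genome",
        "term": f"{taxon}[ORGN]",
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the query URLs for the taxa discussed in this thread.
    for name in ("Insecta", "Enterobacteriaceae", "Animalia", "Eukaryota"):
        print(build_genome_search_url(name))
```

The JSON returned by `esearch` includes a total hit count, which could be compared against the expected numbers for each taxon.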