sckott / pytaxize

python port of taxize (taxonomy toolbelt) for R
https://sckott.github.io/pytaxize/
MIT License

added Id class function for EoL #66

Closed: Daniel-Davies closed this 4 years ago

Daniel-Davies commented 4 years ago

Added the Encyclopedia of Life (EoL) functions to obtain IDs for given species. Also refactored the tests.

Related Issue

#62

Example

    from pytaxize import Ids

    c = Ids('panthera tigris')
    c.eol()
    c.extract_ids()

sckott commented 4 years ago

So some of the code needs to be reworked. The EOL API URL you used returns "pages", which do have IDs, but those are page IDs, not taxon IDs. To get taxon IDs you need to call, e.g., https://eol.org/api/pages/1.0/328674.json
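Here is a minimal sketch of the page lookup described above. It assumes Python 3 and uses only the standard library. `eol_page_url` and `fetch_eol_page` are hypothetical helper names and are not part of pytaxize. The URL pattern is the one quoted in the comment, and no extra query parameters are added.

```python
import json
import urllib.request

# URL pattern quoted in the comment; {page_id} is an EOL *page* ID, not a taxon ID.
EOL_PAGES_URL = "https://eol.org/api/pages/1.0/{page_id}.json"


def eol_page_url(page_id: int) -> str:
    """Build the pages URL for a single EOL page ID (hypothetical helper)."""
    return EOL_PAGES_URL.format(page_id=page_id)


def fetch_eol_page(page_id: int, timeout: float = 30.0) -> dict:
    """Download one page and decode its JSON body (performs a network request)."""
    with urllib.request.urlopen(eol_page_url(page_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`eol_page_url(328674)` yields the example URL from the comment. The taxon IDs would then be read out of the decoded page JSON, not taken from the page ID itself.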

In the R get_eolid function, I get pages here https://github.com/ropensci/taxize/blob/master/R/get_eolid.R#L143 and then get taxon IDs for all of those pages here https://github.com/ropensci/taxize/blob/master/R/get_eolid.R#L161. For example, the first page ID in the result for your example above is https://eol.org/api/pages/1.0/328674.json, which has the taxon name, rank name, etc. That also sorts out the other issue I commented on: the _make_id call was missing the taxon rank (and the name too).
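The second step turns one page response into taxon-level records that carry the name and rank `_make_id` was missing. It could look roughly like the sketch below. The JSON layout it reads (`taxonConcept` → `taxonConcepts`, each entry with `identifier`, `scientificName` and `taxonRank`) is an assumption about the EOL response. `taxon_records` is a hypothetical helper. Check both against a live response before relying on them.

```python
from typing import Any, Dict, List


def taxon_records(page_json: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one EOL page response into taxon ID records (hypothetical helper).

    ASSUMPTION: taxon concepts are listed under
    page_json["taxonConcept"]["taxonConcepts"], each with "identifier",
    "scientificName" and "taxonRank" keys.
    """
    page = page_json.get("taxonConcept", {})
    records = []
    for concept in page.get("taxonConcepts", []):
        records.append(
            {
                "page_id": page.get("identifier"),    # the page we started from
                "taxon_id": concept.get("identifier"),  # the ID the user actually wants
                "name": concept.get("scientificName"),
                "rank": concept.get("taxonRank"),
            }
        )
    return records
```

Missing keys produce an empty list or `None` fields rather than an exception. That makes it easier to spot schema mismatches while testing against real responses.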

Daniel-Davies commented 4 years ago

I see what you mean. Do you think it's appropriate to limit the queries in some way here? It seems that searches for many names return hundreds of page IDs. Should we take, e.g., the first 10 results, or is it better to pass everything back to the user?
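The cap asked about here could be offered without forcing it on every caller by using an optional limit that defaults to returning everything. Below is a sketch of a hypothetical `take_results` helper built on `itertools.islice`. It is not an existing pytaxize parameter.

```python
from itertools import islice
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def take_results(results: Iterable[T], limit: Optional[int] = None) -> List[T]:
    """Return every result when limit is None, otherwise at most `limit` of them."""
    if limit is None:
        return list(results)
    if limit < 0:
        raise ValueError("limit must be non-negative")
    # islice stops early, so a lazy iterable is only consumed up to `limit` items.
    return list(islice(results, limit))
```

Defaulting to `None` keeps the full result set unless the caller opts in to truncation. A plain cap keeps whichever hits the API lists first, and those are not necessarily the ones that match the query.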

sckott commented 4 years ago

In R taxize we limit the results of the pages search to names that match the user's query (via regex), and then fetch the taxon data for each taxon at the /pages/1.0/xxxxx.json route. Does that make sense to do here?
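In Python, the name filter described above might be approximated with a case-insensitive regex search over the search-result titles. The result shape (dicts with `id` and `title` keys) is an assumption about the EOL search response, and `filter_pages` is a hypothetical name. The query goes through `re.escape`, so it is matched literally rather than as a pattern. That choice is deliberate.

```python
import re
from typing import Any, Dict, List


def filter_pages(results: List[Dict[str, Any]], query: str) -> List[int]:
    """Keep page IDs whose title contains the query, ignoring case.

    ASSUMPTION: each search hit is a dict with "id" (page ID) and "title" keys.
    """
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    # Hits without a title never match, because "" cannot contain the query.
    return [hit["id"] for hit in results if pattern.search(hit.get("title", ""))]
```

Because this is a substring match, a query like `panthera tigris` also keeps subspecies titles such as `Panthera tigris altaica`. It drops unrelated hits like `Panthera leo`.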

Daniel-Davies commented 4 years ago

I hope this most recent commit fixes it. It filters the page IDs in a similar way to your use of grep in R taxize, and then translates the page IDs to taxon IDs.
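The sketch below puts the two steps in the order this commit describes: filter first, then resolve only the surviving page IDs. `eol_taxon_ids` is hypothetical. The fetch function is passed in, so the flow can be exercised without network access. The same assumed search-hit and page JSON shapes apply (`id`/`title` on hits, `taxonConcept` → `taxonConcepts` → `identifier` on pages).

```python
import re
from typing import Any, Callable, Dict, List


def eol_taxon_ids(
    query: str,
    search_results: List[Dict[str, Any]],
    fetch_page: Callable[[int], Dict[str, Any]],
) -> Dict[int, List[Any]]:
    """Map each name-matching page ID to the taxon IDs listed on that page.

    ASSUMPTIONS: hits carry "id" and "title"; page JSON lists concepts under
    ["taxonConcept"]["taxonConcepts"], each with an "identifier".
    """
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    page_ids = [h["id"] for h in search_results if pattern.search(h.get("title", ""))]
    out: Dict[int, List[Any]] = {}
    for page_id in page_ids:
        # Only pages that survived the name filter cost a request.
        concepts = fetch_page(page_id).get("taxonConcept", {}).get("taxonConcepts", [])
        out[page_id] = [c.get("identifier") for c in concepts]
    return out
```

In real use, `fetch_page` would wrap an HTTP GET of the `/pages/1.0/{id}.json` URL. In tests, it can be a stub that returns canned JSON.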