ropensci / fishbaseapi

Fishbase API
https://fishbaseapi.readme.io/
MIT License

Which tables need api routes? #2

Closed sckott closed 2 months ago

sckott commented 9 years ago

@cboettig Curious which tables you think would be good to have routes for? Or are you mostly interested in replicating what the package already does, with the routes being secondary?

cboettig commented 9 years ago

The ones in the package that use HTML scraping are all ones that have been added on request, so they would be good things to support (will have to check which tables they correspond to). Looks like those functions are:

sckott commented 9 years ago

added faoareas in ddce1c2d47a8611c6f6dfac8392ad694ecf83d07

sckott commented 9 years ago

@cboettig Did you want API endpoints for each of those in the list above? Or did you actually mean access to those tables, with you taking care of combining the data on the R side?

cboettig commented 9 years ago

Right, I was just thinking access to the relevant tables and we would combine on the R side. Less efficient that way but also lighter on the server & just more familiar. We can then revisit doing more preprocessing later to reduce the number of api calls needed & avoid as much joining in R.
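To make the "fetch the tables, combine on the client" idea concrete, here is a minimal sketch. It is written in Python purely for illustration (the actual client discussed here is the R package), and the rows, column names such as SpecCode and AreaCode, and all values are assumptions made up for the example, not actual FishBase records or real API responses.

```python
from collections import defaultdict

# Hypothetical rows, shaped like the JSON records two table endpoints might return.
# All values are illustrative only.
species = [
    {"SpecCode": 1, "Genus": "Gadus", "Species": "morhua"},
    {"SpecCode": 2, "Genus": "Salmo", "Species": "salar"},
    {"SpecCode": 3, "Genus": "Esox", "Species": "lucius"},
]
faoareas = [
    {"SpecCode": 1, "AreaCode": 21},
    {"SpecCode": 1, "AreaCode": 27},
    {"SpecCode": 2, "AreaCode": 27},
]


def left_join(left, right, key):
    """Left-join two lists of dicts on `key`, keeping unmatched left rows."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    joined = []
    for row in left:
        # Fall back to one empty match so unmatched left rows are kept.
        matches = index.get(row[key]) or [{}]
        for match in matches:
            joined.append({**row, **match})
    return joined


combined = left_join(species, faoareas, "SpecCode")
```

The server only has to hand out plain tables; the cost is one request per table plus the join on the client.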


sckott commented 9 years ago

@cboettig Okay, I'll comment out those methods that do joins, etc. and just give access to the tables needed

cboettig commented 9 years ago

Compare this list to the list of tables shown on each species summary page: https://github.com/ropensci/rfishbase/issues/36

As usual, the mapping between the manual, the website, and the SQL tables is hardly 1:1:1, but anyway...

Here's a quick overview of the main tables as extracted from the FishBase Manual.

Key:

** means prioritized endpoint

NOMENCLATURE

DISTRIBUTION

FAO STATISTICS

POPULATION DYNAMICS

TROPHIC ECOLOGY

REPRODUCTION

ICHTHYOPLANKTON

MORPHOLOGY AND PHYSIOLOGY

GENETICS AND AQUACULTURE

Other Tables

cboettig commented 9 years ago

Not suggesting that we need to implement all of them, but just putting this down here as a placeholder to help group and prioritize, as well as track what we already have. I'll try and update and annotate the above list to highlight things we want to prioritize or ignore, etc.

The data is pretty messy, so the R implementation that returns clean, tidy data.frames is always going to lag substantially behind the API anyway. I hope to have a suitable set of backbone functions in place, and then we can encourage contributions of data cleaning routines associated with particular calls.

sckott commented 9 years ago

@cboettig nice, those table urls don't seem to resolve for me :(

And do the check marks mean that we should have api routes for each table?

cboettig commented 9 years ago

@sckott heh, welcome to FishBase. Just keep refreshing and I think most of the links should work. Good illustration of why scraping the website was a terrible idea.

I've started to check off ones I think we have, and add **'s to the ones I think we should prioritize. (Clearly I should be making better use of Github emoji things here -- edits welcome!)

sckott commented 9 years ago

okay, thx, makes sense

cboettig commented 9 years ago

@sckott You raise a good question about naming conventions for endpoints on https://github.com/ropensci/rfishbase/issues/31#issuecomment-73368519

I see your point about having good REST names, but it's a bit tricky here, since we don't have control over the naming conventions of the SQL schema and tables but will want the API to be consistent with them and familiar anyway. The FishBase manual lists six tables in the population dynamics group above, so it might not be obvious that populations refers to popGrowth. We might be best off making the correspondence between API endpoints and SQL tables 1:1, at least for endpoints that are essentially access to those particular tables. It's not super nice, because the way the tables are organized is something of a mess, but I worry that things will just be more opaque if we rename them. Does that make sense?
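A rough sketch of what a 1:1 correspondence could look like, in Python for illustration only. The table list, the choice to lowercase names in the path, and the helper names route_for and table_for are all assumptions for the example, not the API's actual routing code.

```python
# Hypothetical table names; only faoareas and popGrowth come up in this thread.
TABLES = ["species", "faoareas", "popGrowth"]


def route_for(table):
    """Return the URL path for a table, reusing the SQL table name (lowercased)."""
    return "/" + table.lower()


# Path -> SQL table. Because each path is derived from exactly one table name,
# nobody has to remember a separate, renamed endpoint.
ROUTES = {route_for(t): t for t in TABLES}


def table_for(path):
    """Resolve a request path back to its SQL table, or None if unknown."""
    return ROUTES.get(path.rstrip("/").lower())
```

With this scheme a request for /popgrowth maps straight back to popGrowth, while a "nicer" name such as /populations simply does not resolve.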

I'm trying to come up with good names for the higher-level functions implemented in the R package, most of which will need more than one SQL call to return something meaningful instead of just a bunch of reference codes to other tables anyway. It would be great to get your input on those names.

It may make sense to handle some of those on the server side with their own endpoints, which could reduce the number of API queries. Not sure whether the increased computation on the server side instead of the client side would outweigh the benefit, though, particularly since whatever server this ends up on will probably be relatively underpowered, while doing the manipulation client-side in R with dplyr is pretty efficient. (If the API were powering a website where client-side computational power was way less than what it is in R, doing these computations on the server might make more sense.)
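To illustrate the trade-off in request counts, here is a small in-memory sketch in Python. FakeApi, get_table, and get_joined are hypothetical names invented for the example, the data is made up, and the joined endpoint assumes at most one matching row per key; nothing here reflects how the real API is implemented.

```python
class FakeApi:
    """In-memory stand-in for the API that counts how many requests it serves."""

    def __init__(self, tables):
        self.tables = tables
        self.requests = 0

    def get_table(self, name):
        # Plain table access: one request per table.
        self.requests += 1
        return self.tables[name]

    def get_joined(self, left, right, key):
        # Hypothetical combined endpoint: one request, join done on the server.
        self.requests += 1
        lookup = {row[key]: row for row in self.tables[right]}
        return [{**row, **lookup.get(row[key], {})} for row in self.tables[left]]


# Illustrative data only.
tables = {
    "species": [{"SpecCode": 1, "Genus": "Gadus"}, {"SpecCode": 2, "Genus": "Salmo"}],
    "popGrowth": [{"SpecCode": 1, "K": 0.2}],
}

# Client-side join: two requests, work done by the client.
client_api = FakeApi(tables)
species = client_api.get_table("species")
growth = {row["SpecCode"]: row for row in client_api.get_table("popGrowth")}
client_result = [{**row, **growth.get(row["SpecCode"], {})} for row in species]

# Server-side join: one request, work done by the server.
server_api = FakeApi(tables)
server_result = server_api.get_joined("species", "popGrowth", "SpecCode")
```

Both paths produce the same rows; the difference is two requests with the join on the client versus one request with the join on the server.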

sckott commented 9 years ago

@cboettig is this updated with all endpoints available in the API?

cboettig commented 9 years ago

I think I've kept it up-to-date so far


sckott commented 9 years ago

@cboettig k, just curious