Closed by sckott 2 months ago
The ones in the package that use HTML scraping are all ones that have been added on request, so they would be good things to support (will have to check which tables they correspond to). Looks like those functions are:
added faoareas in ddce1c2d47a8611c6f6dfac8392ad694ecf83d07
@cboettig Did you want API endpts for each of those above in the list? Or did you actually mean access to those tables, then you'd take care of combining data on the R side?
Right, I was just thinking access to the relevant tables and we would combine on the R side. Less efficient that way but also lighter on the server & just more familiar. We can then revisit doing more preprocessing later to reduce the number of api calls needed & avoid as much joining in R.
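As a rough illustration of the "fetch the tables, combine on the client" approach described above, here is a minimal sketch. It is written in Python rather than R purely to keep the example self-contained. The rows, the field names `SpecCode`, `Species`, and `K`, and the helper `left_join` are all hypothetical stand-ins, not the actual FishBase schema or API responses.

```python
from collections import defaultdict

# Hypothetical rows, standing in for the JSON returned by two table endpoints.
species_rows = [
    {"SpecCode": 1, "Species": "species_a"},
    {"SpecCode": 2, "Species": "species_b"},
]
growth_rows = [
    {"SpecCode": 1, "K": 0.2},
    {"SpecCode": 1, "K": 0.3},
]


def left_join(left, right, key):
    """Join two lists of dicts on `key`, keeping every row of `left`."""
    # Index the right-hand table by key so each lookup is cheap.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    joined = []
    for row in left:
        matches = index.get(row[key])
        if matches:
            # One output row per matching right-hand row.
            for match in matches:
                joined.append({**row, **match})
        else:
            # No match: keep the left-hand row as it is.
            joined.append(dict(row))
    return joined


combined = left_join(species_rows, growth_rows, "SpecCode")
```

Each table costs one API call and the join happens locally, which is the trade-off described above: more requests, but less work for the server.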
@cboettig Okay, I'll comment out those methods that do joins, etc. and just give access to the tables needed
Compare this list to the list of tables shown on each species summary page: https://github.com/ropensci/rfishbase/issues/36
As usual, the mapping between the manual, the website, and the SQL tables is hardly 1:1:1, but anyway...
Here's a quick overview of the main tables as extracted from the FishBase Manual.
Key:
** means prioritized endpoint
Not suggesting that we need to implement all of them, but just putting this down here as a placeholder to help group and prioritize, as well as track what we already have. I'll try and update and annotate the above list to highlight things we want to prioritize or ignore, etc.
The data is pretty messy, so the R implementation that returns clean, tidy data.frames is always going to lag substantially behind the API anyway. I hope to have a suitable set of backbone functions, and then hopefully we can encourage contributions of data cleaning routines associated with a particular call.
@cboettig nice, those table urls don't seem to resolve for me :(
And do the check marks mean that we should have api routes for each table?
@sckott heh, welcome to FishBase. Just keep trying to refresh and I think most of the links should work. good illustration of why scraping the website was a terrible idea.
I've started to check off ones I think we have, and add **'s to the ones I think we should prioritize. (Clearly I should be making better use of Github emoji things here -- edits welcome!)
okay, thx, makes sense
@sckott You raise a good question about naming conventions for endpoints on https://github.com/ropensci/rfishbase/issues/31#issuecomment-73368519
I see your point about having good REST names, but it's a bit tricky here since we don't have control over the naming conventions of the SQL schema & tables but will want the API to be consistent / familiar with them anyway. The FishBase Manual lists six tables in the population dynamics group above, so it might not be obvious that `populations` refers to `popGrowth`. We might be best off making the correspondence between the API endpoint and the SQL table 1:1, at least for endpoints which are essentially access to those particular tables. It's not super nice because the way the tables are organized is something of a mess, but I worry that things will just be more opaque if we rename them. Does that make sense?
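One way to picture the 1:1 correspondence suggested above is a route table derived directly from the SQL table names, so there is no separate naming layer to learn. This is only a sketch, again in Python for self-containment. The names `popGrowth` and `faoareas` come from this thread, but treating both as SQL table names, the lowercase-path convention, and the helpers `route_for` and `table_for` are assumptions for illustration, not the API's actual routes.

```python
# Illustrative table list; assumed for this sketch, not the real schema.
SQL_TABLES = ["popGrowth", "faoareas"]


def route_for(table):
    """Derive the endpoint path directly from the SQL table name (assumed convention)."""
    return "/" + table.lower()


# Build the lookup once: endpoint path -> SQL table name.
ROUTES = {route_for(table): table for table in SQL_TABLES}


def table_for(path):
    """Return the SQL table behind an endpoint path, or None if no such route exists."""
    return ROUTES.get(path)
```

Under this convention `table_for("/popgrowth")` resolves to `popGrowth`, while a friendlier alias such as `/populations` has no route unless someone maintains a separate mapping for it.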
I'm trying to come up with good names for the higher-level functions implemented in the R package, most of which will need more than one SQL call to return something meaningful instead of just a bunch of reference codes to other tables anyway. It would be great to get your input on those names.
It may make sense to make some of those handled on the server end with their own endpoints, which could reduce the number of API queries. Not sure if the increased computation on the server side instead of the client side would outweigh the benefit though, particularly since whatever server this may end up on will probably be relatively underpowered, while doing the manipulation client-side in R with dplyr is pretty efficient. (If the API were powering a website where the computational power client-side was way less than what it is in R, doing these computations on the server may make more sense).
@cboettig is this updated with all endpoints avail. in the api?
I think I've kept it up-to-date so far
@cboettig k, just curious
@cboettig curious what tables you think would be good to have routes on? or are you mostly interested in replicating what the package already does, and the routes are secondary?