ropensci / fishbaseapi

Fishbase API
https://fishbaseapi.readme.io/
MIT License
42 stars 12 forks source link

Inconsistent keys in fishbase tables #83

Closed cboettig closed 8 years ago

cboettig commented 8 years ago

Looks like some tables, such as maturity table, have the Species Code as Speccode instead of SpecCode.

rfishbase uses a generic function to call endpoints by Species Code, since most tables are keyed by Species Code and this avoids a lot of repetitive code. I could try and handle these cases explicitly, but a general server-side solution might be better. @EvilScott any ideas on how best to handle this?

also, interesting that we don't get an error when requesting a non-existent field to query (e.g. asking for SpecCode when the field is called Speccode):

http://fishbase.ropensci.org/maturity?SpecCode=4

http://fishbase.ropensci.org/maturity?Speccode=4

MollsReis commented 8 years ago

Yuck, this is why its best to always use all lowercase in DB design :smile:

I'll look into it. :+1:

MollsReis commented 8 years ago

I was just converted to full-time at Vulcan :tada: so I have some new-hire stuff to go through this second, but if you want to take a whack at it yourself I'd check out the Base#endpoint method here. The result hash keys need to be examined and the Speccode key changed to SpecCode.

sckott commented 8 years ago

interesting that we don't get an error when requesting a non-existent field to query (e.g. asking for SpecCode when the field is called Speccode

Weird, in mysql console locally, I can use Speccode or SpecCode and I get the same result - Maybe it's something in activerecord? just guessing

MollsReis commented 8 years ago

It may also be a setting that is specific to your MySQL instance. I know our DevOps guy had to change a bunch of settings to get the FB site to run correctly due to a number of casing issues.

I inspected the schema and FB indeed does use Speccode in some places and SpecCode in others. MySQL may allow for case insensitivity when querying, but processes that ingest that data may not be case insensitive. #lowercaseisthebestcase :wink:

cboettig commented 8 years ago

All for standardizing on lowercase if possible. And all the better if this can get standardized / sanitized further upstream in the database.

cboettig commented 8 years ago

and @EvilScott , :clap: :fireworks: for converting to full time!

sckott commented 8 years ago

missed that, congrats @EvilScott on full time!

MollsReis commented 8 years ago

Thanks guys :smile:

I'd love to push up for FB to use lowercase in the DB, but looking at the web code they are pretty heavily using camel case in the PHP code when hitting MySQL. So I don't know if we can count on a good standardization happening anytime soon :grimacing:

cboettig commented 8 years ago

@sckott Do you have a list handy of tables that use Speccode (or perhaps other forms of capitalization) instead of SpecCode? May as well fix that on the rfishbase client meanwhile.

sckott commented 8 years ago

just a sec

MollsReis commented 8 years ago

http://fishbase.ropensci.org/listfields?fields=Speccode&exact=true http://fishbase.ropensci.org/listfields?fields=SpecCode&exact=true

Note that not all of these tables has a corresponding model in ActiveRecord.

sckott commented 8 years ago

here's a jq version using the API

http 'http://fishbase.ropensci.org/listfields' | jq '.data[] | select(.column_name | match("speccode"; "i"))' | jq -s

[
  {
    "table_name": "abnorm",
    "column_name": "SpecCode"
  },
  {
    "table_name": "abundance",
    "column_name": "SpecCode"
  },

...
cboettig commented 8 years ago

ah, of course, listfields is handy for this, thanks! We'll get these changes into rfishbase at least.

cboettig commented 8 years ago

@sckott @EvilScott Do you think it's worth trying to address this on the API end or shall we just close this? Not really a bug since the keys are different in the database, but certainly a gotcha for anyone trying to navigate the API.

sckott commented 8 years ago

I guess we could lowercase all keys. Do you think that's a bad idea @EvilScott

MollsReis commented 8 years ago

@cboettig thats true. someone may actually expect things to be the "proper" cases and if we change it we may be screwing up someone else's code :grimacing:

@sckott lowercasing all keys on the API or the R library?

sckott commented 8 years ago

@EvilScott lowercase the keys in the JSON returned from the API

MollsReis commented 8 years ago

@sckott i think its correct, but its a judgement call that you guys need to make.

sckott commented 8 years ago

@cboettig thoughts on lower-casing keys in all data output by the API? Don't know yet what performance hit we'll take, but hopefully minimal

cboettig commented 8 years ago

Probably better to just stick with what the database gives us. Changing the output won't avoid the original problem here (we'd have to do something about mapping input queries), and might be inconsistent if user somehow expect the capitalization used in the underlying data..

I do wonder if we should do something to the https://github.com/ropensci/fishbaseapi/blob/master/models/base.rb#L7 method though so that either SpecCode or Speccode do the same thing. It's the main key for most of the database, and it's kind of silly to expect users / developers to keep track of whether it is Speccode or SpecCode for each and every table. Not sure how best to handle that though. If the SQL query is case-insensitive, maybe we can just re-map both to lowercase there?

sckott commented 8 years ago

@EvilScott been playing with this. Seems that with activerecord, we're fine with queries passed to fields as case doesn't matter there, but on the where statement, it does matter. Can we implement a LIKE statement with activerecord::where? From my googling, that seems like it would be case insensitive.

MollsReis commented 8 years ago

@sckott I'm confused. I thought the issue was the inconsistencies in casing returned from the DB, not querying the DB. Where do you need to implement a LIKE statement?

sckott commented 8 years ago

I think we have problems in querying, not just data returned

In the where statement, it is case sensitive, so the call

http://fishbase.ropensci.org/maturity?SpecCode=4

doesn't actually work as the where statement returns data, but as you can see the results don't match 4

Actually it does appear that select statement to return certain fields is case sensitive too, e.g.,

http://fishbase.ropensci.org/maturity?fields=SpecCode

doesn't return just Speccode - whereas the right case works:

http://fishbase.ropensci.org/maturity?fields=Speccode

MollsReis commented 8 years ago

Hrm, interesting. That adds another wrinkle

MollsReis commented 8 years ago

OK so first part is the issue with the WHERE clause. We are using params that are included in the column list. Here is the code. The kicker is Array#include? is case sensitive:

1.9.3-p551 :002 > %w(foo BAR baz).include? "bar"
 => false 

So we may still be able to use case insensitive in our querying, but with that select block it may be excluding valid params.

MollsReis commented 8 years ago

For the SELECT clause I'd guess a similar issue is here in the place where we are making a list of fields. Off chance: it may also have to do with wrapping the field name in backticks.

sckott commented 8 years ago

Thanks, that's helpful.

it may also have to do with wrapping the field name in backticks.

Right, we did that b/c of https://github.com/ropensci/fishbaseapi/issues/79#issuecomment-157890188

I guess we wouldn't need backticks if there weren't reserved words in the tables, but we can't change that

MollsReis commented 8 years ago

I just tested backticks with my local MySQL DB and it doesn't seem to make a difference.

select `classnum` from classes limit 10;

returns the same rows as:

select `ClassNum` from classes limit 10;

The only difference is the column name of the result, classnum vs ClassNum. Hopefully this means if we make the #include? check case insensitive (easiest to accomplish by downcasing the needle and the haystack both) then this should solve both problems.

sckott commented 8 years ago

is this most efficient x.collect { |x| x.downcase }

MollsReis commented 8 years ago

Ruby protip incoming:

%w(FOO BAR BAZ).map(&:downcase) # => ["foo", "bar", "baz]

http://stackoverflow.com/questions/1217088/what-does-mapname-mean-in-ruby

sckott commented 8 years ago

nice

sckott commented 8 years ago

@EvilScott I got something to work so that case is ignored

where(params.select { |param| fields.any? { |s| s.casecmp(param)==0 } })

However, other params passed in the query e.g., limit make the query fail with no implicit conversion of Symbol into String - Is there something obvious I can do to fix

MollsReis commented 8 years ago

If you just #to_s on the things you are comparing you should be fine:

s.to_s.casecmp(param.to_s) == 0
sckott commented 8 years ago

ah, thanks, will try

MollsReis commented 8 years ago

excellent :grinning:

sckott commented 8 years ago

:bullettrain_side: