iDigBio / ridigbio

ridigbio -- an R interface to iDigBio's API (see http://www.idigbio.org/)
http://idigbio.github.io/ridigbio/

Include flags as request? #26

Closed seltmann closed 7 years ago

seltmann commented 9 years ago

Hi, I've been working with the iDigBio recordset data, using the corrected occurrences file in comparison to the raw file to clean up our georeferencing. I now have a new use case, where I am getting loads of plant data via ridigbio for modeling. It would be very helpful to me if I could get the recordset flags associated with each record. One of the first tasks when getting data together to model is to check that the lat/longs are in the states and countries they should be. You guys are already doing that!

Thanks again, Katja

mjcollin commented 9 years ago

I think the last release of ridigbio was before the flags were in the API, but regardless, they're multivalued and there's code that skips multivalued fields when building the returned data frame.

What kind of data structure would you want to see? A column in the df that contained character vectors? Is the indexing syntax for that intuitive for people? For instance I decided that the dot syntax R uses is confusing but I still made two columns for "geopoint.lat" and "geopoint.lon" from the nested JSON structure because that is the delimiter we use in our API.

seltmann commented 9 years ago

I think the dot syntax is understandable, although it is not apparent why the names do not follow dwc (decimalLatitude, decimalLongitude) like the other fields. I understand that it has to do with the iDigBio data structure, but that's as far as my understanding goes (yet perhaps that is far enough?). I also think that having them as separate columns is better than nested JSON, although those are very important and commonly used fields.

For error information, I am not certain what would be best for the return. Here is what I was thinking:

It would be important to know clearly 1) whether a flag exists, and 2) the corrected value for that record. It would also be important to be able to include flags in the result set or exclude them, based on passed parameters.

mjcollin commented 8 years ago

It looks like I already fixed up the returned data.frame to support multivalued fields. If I understand your use case, you want the indexed lat/lon, whether there is a flag for a fix to the lat/lon, and then the original lat/lon.

In discussing this with Alex, he said that the flag rev_geocode_mismatch would tell you whether we decided there was a problem with the geocoding. The criterion is whether reverse geocoding the coordinates matches the given country; a mismatch means we decided there is a problem. So this would give you what you are looking for:

df <- idig_search_records(rq=list("genus"="acer", "flags"="rev_geocode_mismatch"), fields=c("uuid", "flags", "geopoint", "data.dwc:decimalLongitude", "data.dwc:decimalLatitude"), limit=10)

You can then look at flags with some syntax like df$flags[[1]][[1]]. The flags field contains a list of character vectors.
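
To make the indexing concrete without calling the API, here is a minimal sketch on a mock data frame shaped as described above, with flags as a list column of character vectors. The uuid values and the flag name "some_other_flag" are placeholders invented for illustration, and has_mismatch is a hypothetical helper column, not something ridigbio returns.

```r
# Mock of the described shape: `flags` is a list column of character vectors.
# All values are invented for illustration, not real iDigBio records.
df <- data.frame(
  uuid = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
df$flags <- list(
  c("rev_geocode_mismatch", "some_other_flag"),
  c("some_other_flag"),
  character(0)
)

# First flag of the first record, using the syntax mentioned above.
first_flag <- df$flags[[1]][[1]]

# Hypothetical helper column: TRUE where a record carries the mismatch flag.
df$has_mismatch <- vapply(
  df$flags,
  function(f) "rev_geocode_mismatch" %in% f,
  logical(1)
)

# Keep only the records flagged as mismatched.
flagged <- df[df$has_mismatch, ]
```

The same vapply pattern should work for any other flag name, and records with no flags at all simply come out as FALSE.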

Also, beware that we are working on https://github.com/iDigBio/idigbio-search-api/issues/13

Please let me know if this doesn't meet your needs.

mjcollin commented 8 years ago

Alex has pushed a new flag (either a boolean or a meta flag) to indicate that the georeference has been "fixed" by us, which is more intuitive than "rev_geocode_mismatch". It will be in the beta API for a few days and then in production maybe by Thanksgiving.