iDigBio / idigbio-search-api

Server-side code driving iDigBio's search functionality.
GNU General Public License v3.0

routes for searching for datasets/recordsets #14

Open sckott opened 8 years ago

sckott commented 8 years ago

Am I missing something? Seems there's no route for searching for datasets, sort of like those in GBIF http://www.gbif.org/developer/registry

godfoder commented 8 years ago

There is, but it is currently undocumented. One of the reasons we haven't worked on it at all is that the metadata we collect is very bare, and often not what you really expect.

The endpoint is at http://search.idigbio.org/v2/search/recordsets if you want to poke around. The search parameter is rsq (equivalent to rq and mq on the record and media endpoints). What's in there is a minimum set of identifying information scraped from the RSS feeds and EML files. Deeper information, like the links to hand-curated institution and collection metadata that GBIF has, is spotty since we don't have a registration process that we make all of our providers go through.

There is also a publishers endpoint: http://search.idigbio.org/v2/search/publishers but that data is basically so bare it's useless to anyone outside iDigBio. The search parameter on that is pq.

The Meta fields endpoints work for both types as well:

http://search.idigbio.org/v2/meta/fields/recordsets
http://search.idigbio.org/v2/meta/fields/publishers
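For anyone who wants to poke at these from a script, here is a minimal sketch that only builds the request URLs for the endpoints listed above. It assumes that rsq and pq take a JSON-encoded object, the same way rq does on the record endpoint. The helper names and the `name` field in the usage lines are hypothetical, so check the meta fields endpoints for the fields that actually exist.

```python
import json
from urllib.parse import urlencode

BASE = "http://search.idigbio.org/v2"

# Search parameter name per type, as given in this thread.
QUERY_PARAM = {"recordsets": "rsq", "publishers": "pq"}


def search_url(kind, query):
    """Build a search URL for 'recordsets' or 'publishers'.

    Assumption: the query is sent as a JSON-encoded object,
    as with rq on the record endpoint.
    """
    param = QUERY_PARAM[kind]
    return f"{BASE}/search/{kind}?{urlencode({param: json.dumps(query)})}"


def fields_url(kind):
    """Build the meta fields URL for 'recordsets' or 'publishers'."""
    return f"{BASE}/meta/fields/{kind}"


# Usage ("name" is a hypothetical field, used only for illustration):
recordsets_query = search_url("recordsets", {"name": "fish"})
publishers_fields = fields_url("publishers")
```

The URLs can then be fetched with any HTTP client. Since the endpoints are undocumented, the response shape may change.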

If you have a specific use case you're working towards, we can try to augment the available data, if possible, to move those types closer to production usability.

sckott commented 8 years ago

Thanks for the quick response.

Use case: searching for datasets by a specific institution, collection within an institution, etc.

Working on a collaboration with CalAcademy right now https://github.com/ropenscilabs/spplit - and it's not a hugely important part, but it would be nice to programmatically let users search for datasets - then we use that UUID to search for specimen records, etc. Right now, since this is just a CalAcademy thing, I'm just caching the UUIDs associated with CalAcademy collections in the library itself.
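A rough sketch of that two-step flow, in case it helps make the use case concrete. Several things here are assumptions rather than documented behaviour: that the recordsets response is JSON with an `items` list whose entries carry a `uuid`, that the record endpoint lives at `/v2/search/records`, and that records can be filtered with a `recordset` field in rq. All function names are hypothetical.

```python
import json
from urllib.parse import urlencode

SEARCH = "http://search.idigbio.org/v2/search"


def recordset_search_url(rsq):
    """Step 1: URL that searches recordsets with a JSON-encoded rsq query."""
    params = urlencode({"rsq": json.dumps(rsq)})
    return f"{SEARCH}/recordsets?{params}"


def uuids_from_response(body):
    """Pull recordset UUIDs out of a parsed JSON response.

    Assumption: the response has an "items" list and each item has a "uuid".
    """
    return [item["uuid"] for item in body.get("items", [])]


def records_search_url(recordset_uuid):
    """Step 2: URL that searches specimen records within one recordset.

    Assumption: the record index exposes a "recordset" field for rq.
    """
    params = urlencode({"rq": json.dumps({"recordset": recordset_uuid})})
    return f"{SEARCH}/records?{params}"


# Usage with a made-up response body (placeholder UUID, not real data):
fake_body = {"items": [{"uuid": "00000000-0000-0000-0000-000000000000"}]}
record_urls = [records_search_url(u) for u in uuids_from_response(fake_body)]
```

With something like this, the cached CalAcademy UUIDs could eventually be replaced by a lookup, once the recordset metadata is rich enough to search by institution or collection.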