atiro opened 4 years ago
The NGA shares this (and other) use cases.
I think this would be pretty useful in general. There are a lot of different back-ends that might be used to generate an API response dynamically, whether MySQL, Elasticsearch, Solr, a proprietary graph db, or a SPARQL endpoint directly. Lucene-based APIs allow you to specify the fields you want returned, so in theory a Linked Art API could specify the fields that are available for filtering/simplifying the response.

If you take the IIIF info.json response as an example, the JSON file returns a list of functionalities that the client can read in order to make choices. The OpenRefine reconciliation API does the same (see http://nomisma.org/apis/reconcile). If Linked Art had a default response for its endpoint that listed the field names available for filtering, then the client could construct an HTTP request based on those fields, which would ideally conform to properties that have been expressed in the LA context file.
It would be up to back-end developers (like myself) to build the middleware that takes HTTP request parameters or parameters expressed via JSON POSTs from the client, and then makes the necessary queries in the back-end to supply the JSON response the client expects.
edit: the example of the Nomisma.org OpenRefine reconciliation API is exactly this. I followed the OpenRefine documentation to build a layer on top of Nomisma's Solr index. The API takes JSON posted from a client (OpenRefine), converts that JSON into a Lucene query that is submitted to Solr, and then serializes the XML results from Solr back into the required JSON model.
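The translation step described above can be sketched in a few lines. This is a simplified illustration, not Nomisma's actual code: the field names (`prefLabel`, `type`) and the reconciliation payload shape are assumptions based on the OpenRefine reconciliation query format.

```python
import json

def reconcile_to_lucene(payload: str) -> str:
    """Convert an OpenRefine-style reconciliation query (posted as JSON)
    into a fielded Lucene query string. Field names are hypothetical."""
    query = json.loads(payload)
    clauses = [f'prefLabel:"{query["query"]}"']
    # OpenRefine may send an optional type restriction
    if "type" in query:
        clauses.append(f'type:"{query["type"]}"')
    # Additional property constraints become further fielded clauses
    for prop in query.get("properties", []):
        clauses.append(f'{prop["pid"]}:"{prop["v"]}"')
    return " AND ".join(clauses)

print(reconcile_to_lucene('{"query": "denarius", "type": "nmo:TypeSeriesItem"}'))
# prefLabel:"denarius" AND type:"nmo:TypeSeriesItem"
```

The same shape works for any back-end: only the final query-string builder changes.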
There is (of course) a lot of water under this particular bridge in the information system world.
Server defined: OAI-PMH has a "simple dc" record and then the full record. Z39.50 has a "brief" record and a full record. SRU has an arbitrary list of recordSchemas, and 1.1 even allowed dynamic XPath expressions to be evaluated before returning results.
Client defined: SPARQL construct queries let you pick which bits of data you want. Similarly, GraphQL lets you send a template to fill out. Solr result export lets you define a field list of fields to put into the results.
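To make the client-defined option concrete, a SPARQL CONSTRUCT query lets the client ask for exactly the triples it wants. The class and property choices below are illustrative CIDOC CRM-style examples only (prefixes omitted):

```sparql
# Ask only for each object's label and, if present, an image
CONSTRUCT {
  ?obj rdfs:label ?label ;
       crm:P138i_has_representation ?img .
}
WHERE {
  ?obj a crm:E22_Human-Made_Object ;
       rdfs:label ?label .
  OPTIONAL { ?obj crm:P138i_has_representation ?img }
}
```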
The trade-off is obvious -- server defined is easy for the server to implement and cache, but if the reduced record doesn't meet the client's need, then they have to retrieve the full record anyway. Conversely, the client defined syntax is very computationally expensive, especially with graph data.
The easiest, most web-friendly way to do it, in my opinion, is simply to have another URL that gives a server-defined subset of the data. Then when you're computing your full record, you strip it down and cache to the "brief" URL. Then have two indexes (using whatever technology), one for the full records and one for the brief. You can't do a search on the full dataset fields and then retrieve the brief form ... but you save a lot of implementation time.
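A minimal sketch of that strip-and-cache step, with an invented field list for the brief subset and an in-memory dict standing in for whatever cache is actually used:

```python
# Server-defined "brief record": when the full record is computed,
# project it down to a fixed field list and cache that under a
# parallel URL. BRIEF_FIELDS and the "/brief" URL are illustrative.
BRIEF_FIELDS = {"id", "type", "_label", "identified_by", "representation"}

brief_cache = {}

def publish(full_url: str, full_record: dict) -> dict:
    """Strip a full record down to the brief subset and cache it."""
    brief = {k: v for k, v in full_record.items() if k in BRIEF_FIELDS}
    brief_cache[full_url.rstrip("/") + "/brief"] = brief
    return brief

full = {
    "id": "https://example.org/object/1",
    "type": "HumanMadeObject",
    "_label": "Example Painting",
    "produced_by": {"type": "Production"},  # dropped from the brief form
}
brief = publish(full["id"], full)
assert "produced_by" not in brief
```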
We could try to define a subset of fields that are important... but we might find different fields important based on our UI expectations. I would think we could agree on:
That might be it, and might be enough?
I think there will be a lot of challenges with adoption of linked art in the developer community if it doesn't provide a standard web friendly mechanism for querying an institution's graph while fulfilling the vast majority of use cases. Not having a query standard would also prevent open source software from being developed that might otherwise be born as linked.art standards start to take off.
It might be possible to use a simple syntax for querying that can be chained to provide complexity / flexibility... or we could adopt an existing web-friendly standard with some recipes for implementation, e.g. Solr, GraphQL, etc., but I do think we need a solid solution for this.
If we go the custom API route, there might be two JSON structures in play - one being the core model and the other being a small wrapper that provides pagination controls.
I agree with a caveat: in the medium term we need something like that, but I don't think we have the baseline format finished yet in order to have enough understanding about what the best interaction patterns for search are. Getting implementations of just the baseline is going to be tricky enough, without expecting a search implementation as well.
Pagination - I think we can use the IIIF / Annotation / ActivityStreams pattern if we're going more custom than just "drop in
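For reference, the ActivityStreams paging pattern mentioned above looks roughly like the following (URLs and counts are invented):

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://example.org/search?q=rembrandt&page=2",
  "type": "OrderedCollectionPage",
  "partOf": {
    "id": "https://example.org/search?q=rembrandt",
    "type": "OrderedCollection",
    "totalItems": 1234
  },
  "prev": { "id": "https://example.org/search?q=rembrandt&page=1", "type": "OrderedCollectionPage" },
  "next": { "id": "https://example.org/search?q=rembrandt&page=3", "type": "OrderedCollectionPage" },
  "orderedItems": []
}
```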
That's fair.
For our new search endpoint we will be returning, per object record (not finalised):

- `uniqueID`
- `museumNumber`
- `objectType`
- `current_location`
  - `id`
  - `displayName`
  - `type`
  - `site`
  - `onDisplay`
- `_primary_title`
- `_primary_maker`
  - `name`
  - `association`
- `_primary_image_id`
- `_primary_date`
- `_primary_place`
- `_cultural_sensitivity_notice`
- `_images`
- `_primary_thumbnail`
- `_iiif_image_base_uri`
- `_iiif_presentation_uri`
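A single result in that shape might look like the following JSON. All values, and the assumed nesting of `current_location` and `_primary_maker`, are invented for illustration:

```json
{
  "uniqueID": "O12345",
  "museumNumber": "1856,0623.5",
  "objectType": "painting",
  "current_location": {
    "id": "G42",
    "displayName": "Gallery 42",
    "type": "gallery",
    "site": "Main Building",
    "onDisplay": true
  },
  "_primary_title": "Example Title",
  "_primary_maker": { "name": "Example Maker", "association": "artist" },
  "_primary_image_id": "img-001",
  "_primary_thumbnail": "https://example.org/iiif/img-001/full/!200,200/0/default.jpg",
  "_iiif_image_base_uri": "https://example.org/iiif/img-001",
  "_iiif_presentation_uri": "https://example.org/iiif/presentation/O12345"
}
```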
Edited: Can SPARQL be provided? I see that this was mentioned as an option, but I still do not understand what is missing.
Before committing to a special unique-to-LA search endpoint, or (all the more so) requiring full SPARQL support exposed to the open web (!), perhaps we could consider and record some very concrete use cases for this facility...
@beaudet, you remarked "I think there will be a lot of challenges with adoption of linked art in the developer community if it doesn't provide a standard web friendly mechanism for querying an institution's graph while fulfilling the vast majority of use cases." Could you describe some of those specific use cases? Are you thinking of real applications under consideration for development at NGA? Elsewhere? Can you give a little detail about why harvesting data sets and indexing them locally wouldn't be feasible or wouldn't be fit for purpose?
@ewg118 Did you have plans to use a search API for your purposes or to support federation harvesting or something else?
@akuckartz Did you have particular plans for using SPARQL over distributed Linked.Art datasets? Can you say something about the kinds of queries you would want to use? Can you write a bit about why harvesting data sets and putting them in a local SPARQL implementation wouldn't be a viable choice?
@ajs6f Primarily harvesting of external datasets. But I'm also pro-dogfooding, so I would also build a search API on our own collection that might allow third parties to harvest our data for other projects (eg AA Collab).
Here's a use case: Harvard Art Museums has a search API that enables a Lucene query parameter to be passed to the API and responds with JSON from Elasticsearch. The fields returned in this response are particular to the indexing scheme Harvard Art Museums implemented. But let's say they are able to create a transformation layer so that their ES response is structured as Linked Art JSON-LD.
I can build a similar transformation on our Solr index that returns LA JSON-LD.
However, my fields are different from Harvard's fields, and the Lucene queries are going to differ. So ideally we need to come up with a standard query syntax for LA that is then transformed into a Lucene (or MySQL, SPARQL, or GraphQL, depending on the underlying data store) query.
Currently, we can all probably output a standardized Linked Art compliant JSON response, but each of us is going to have a different query syntax to get that response.
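A sketch of what that shared syntax could look like: one common query vocabulary, with a per-institution mapping onto local index fields. Both field maps here are invented for illustration:

```python
# Shared Linked Art query fields, mapped per institution onto the
# local index's field names. Hypothetical mappings.
HARVARD_FIELDS = {"maker": "people", "title": "title"}
NGA_FIELDS = {"maker": "ARTIST_ALLNAMES", "title": "TITLE"}

def to_lucene(la_query: dict, field_map: dict) -> str:
    """Translate {shared_field: value} into a fielded Lucene query."""
    clauses = [f'{field_map[f]}:"{v}"' for f, v in sorted(la_query.items())]
    return " AND ".join(clauses)

q = {"maker": "Vermeer", "title": "girl"}
print(to_lucene(q, HARVARD_FIELDS))  # people:"Vermeer" AND title:"girl"
print(to_lucene(q, NGA_FIELDS))      # ARTIST_ALLNAMES:"Vermeer" AND TITLE:"girl"
```

The client only ever sees the shared field names; the translation is the middleware's job.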
The intent is not to require anything heavy-weight like SPARQL, GraphQL or similar, but to enable those to be built across aggregated data. This issue is less about search (which we have deferred working on until the representations are able to be clearly published) and more about whether two different representations should be published *without* search.
Maybe Triple Pattern Fragments are a solution. See https://linkeddatafragments.org/
@ajs6f
Example: A decision is made to re-platform our web site using Drupal. Rather than building 100% bespoke components for our collection search, artist and art object pages, we decide instead to build those using the Linked Art data model and search APIs. As a result, we create a community around standards based museum-centric Drupal components and benefit from the work of others when they sustain and extend the components. In other words, I think the advantage of a search standard is to help us access our own data more easily by building and sharing community based software centered around standards.
Proposed Baseline:

- Primary Term
- Accession Number (or other primary ID)* / `classified_as` Type of Object
- Thumbnail / `access_point`

With any other properties possible as useful to the local institution.

Document as "Summary Format" under the "Representations" bullet in the API documentation.
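As a sketch, a Summary Format record along those lines might look like the following Linked Art JSON-LD. URIs and values are invented; the property choices are one plausible reading of the baseline above:

```json
{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "https://example.org/object/1",
  "type": "HumanMadeObject",
  "_label": "Example Painting",
  "identified_by": [
    { "type": "Name", "content": "Example Painting" },
    {
      "type": "Identifier",
      "content": "1900.1.1",
      "classified_as": [
        { "id": "http://vocab.getty.edu/aat/300312355", "type": "Type", "_label": "Accession Number" }
      ]
    }
  ],
  "classified_as": [
    { "id": "http://vocab.getty.edu/aat/300033618", "type": "Type", "_label": "Painting" }
  ],
  "representation": [
    {
      "type": "VisualItem",
      "digitally_shown_by": [
        {
          "type": "DigitalObject",
          "access_point": [
            { "id": "https://example.org/thumb/1.jpg", "type": "DigitalObject" }
          ]
        }
      ]
    }
  ]
}
```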
Search / fetch and related services we're using for web sites that need art data
Locations / Places: find by various identifiers including ability to return all
Media: find by relationships with other art entities including ability to return all
Art Objects: find all, find by various identifiers, find by relationships with other entities, find by matching one or more values with one or more specific fields, scoped free text search, sorting of search results

Art object fields that can be searched for value matches: ARTIST_DISPLAYNAME, ARTIST_ALLNAMES, OWNER_ALLNAMES, HASLARGERIMAGERY, HASTHUMBNAIL, ONVIEW, TITLE, YEARS_SPAN, YEARS_BEGIN, MEDIUM, OBJECTID, ACCESSIONNUM, ATTRIBUTION_INV, PROVENANCE, OVERVIEW, HASOVERVIEWTEXT, CREDITLINE, LOCATION_ID, LOCATION_SITE, LOCATION_ROOM, LOCATION_UNITPOSITION, LOCATION_DESCRIPTION, DONORCONSTITUENTID, OWNERCONSTITUENTID, SUBCLASSIFICATION, TIMESPAN, THEME, SCHOOL, STYLE, NATIONALITY, CLASSIFICATION, APPLIEDTERMS, LASTMODIFICATION

Art object free text scoping: ALLDATAFIELDS, CREDITLINE, MEDIUM, PROVENANCETEXT, DIMENSIONS, INSCRIPTION, MARKINGS, CATALOGRAISONNEREF, IMAGECOPYRIGHT, ARTISTS, ARTISTNATIONALITIES, OWNERS, DONORS, TERMS, BIBLIOGRAPHYTEXT, OVERVIEWTEXT, CONSERVATIONNOTES, SYSCATTEXT, EXHIBITIONHISTORY
Art object sorting options - some of these are no longer used as they were created for requirements that have been dropped
```
YEAR_ASC,                 // sorts by year
CLASSIFICATION_ASC,       // sorts by classification alphabetically
ATTRIBUTIONINV_ASC,       // sorts by inverted attribution alphabetically
TITLE_ASC,                // sorts by title alphabetically (ascending)
TITLE_DESC,               // sorts by title alphabetically (descending)
OBJECTID_ASC,             // sorts by object id numerically (ascending)
OBJECTID_DESC,            // sorts by object id numerically (descending)
HASLARGERIMAGERY_DESC,    // sorts based on whether object has image restrictions
HASTHUMBNAIL_DESC,        // sorts based on whether a thumbnail image is available
ONVIEW_DESC,              // sorts based on whether object is on view or not
NUMARTISTS_ASC,           // sorts based on number of artist associations in ascending order

// sorts based on whether the inverted attribution exactly matches the
// preferred display name of an associated artist
ATTRIBUTIONINV_ARTISTNAME_MATCH_ASC,

ACCESSIONNUM_ASC,         // added for searches using an accession number
ACCESSIONNUM_DESC,        // added for searches using an accession number
LASTDETECTEDMODIFICATION_ASC,
LASTDETECTEDMODIFICATION_DESC,
FIRST_ARTIST_ASC,
FIRST_ARTIST_DESC,

YEAR_MATCH,               // sorts a vs. b based on matching each to the year of object c
CLASSIFICATION_MATCH,     // "" but for classification
ATTRIBUTIONINV_MATCH,     // "" but for inverted attribution
TITLE_MATCH,              // "" but for title
ONVIEW_MATCH,             // "" but for on view status
OBJECTID_MATCH,           // "" but for object id match (probably no use cases for this)
HASLARGERIMAGERY_MATCH,   // "" but for same image restriction status
HASTHUMBNAIL_MATCH,       // "" but for same thumbnail status
ARTISTS_MATCH,            // sorts a vs. b based on whether artists match that of object c
NUMARTISTSINCOMMON_MATCH_DESC,        // sorts a vs. b based on number of artists in common with object c
NUMTHEMESINCOMMON_MATCH_DESC,         // "" but for themes
NUMSTYLESINCOMMON_MATCH_DESC,         // "" but for styles
NUMDONORSINCOMMON_MATCH_DESC,         // "" but for donors
NUMARTISTNATIONALITIESINCOMMON_MATCH_DESC  // "" but for nationalities of artists
```
Constituents: find all, find by various identifiers, find by relationships with other entities, find by matching one or more values with one or more specific fields, scoped free text search, sorting of search results
Constituent fields that can be searched for value matches: PREFERRED_DISPLAY_NAME, ALL_NAMES, ISINDIVIDUALARTIST (as opposed to a workshop or similar), ISARTISTOFNGAOBJECT (as opposed to a non-artist or an artist of a non-collection work), NATIONALITY, TIMESPAN, INDEXOFARTISTS_FIRST_TWO_LETTERS_LAST_NAME, INDEXOFARTISTS_LETTER_RANGE, HASBIOGRAPHY, YEARS_SPAN, CONSTITUENT_ID, ULAN_ID
Constituent free text scoping: ALLDATAFIELDS
Constituent sorting options: PREFERRED_DISPLAY_NAME_MATCH (for comparing against a target constituent for closeness), HASBIOGRAPHY_MATCH, PREFERRED_DISPLAY_NAME_ASC (more typical sorting), HASBIOGRAPHY_ASC, CONSTITUENTID_ASC
Faceting:
Type-Ahead lists: return type-ahead matches for specific fields (smaller data set as opposed to all data for records)
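The type-ahead case reduces to prefix matching over a single small field rather than full records. A sketch, with invented data:

```python
# Type-ahead over one small field (here, artist names) rather than
# whole records. The data and limit are illustrative only.
ARTIST_NAMES = ["Rembrandt van Rijn", "Renoir, Pierre-Auguste",
                "Raphael", "Rubens, Peter Paul"]

def type_ahead(prefix: str, names=ARTIST_NAMES, limit=10):
    """Return up to `limit` names starting with `prefix`, case-insensitively."""
    p = prefix.casefold()
    return sorted(n for n in names if n.casefold().startswith(p))[:limit]

print(type_ahead("re"))  # ['Rembrandt van Rijn', 'Renoir, Pierre-Auguste']
```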
Is this needed as a core feature for 1.0? LUX experience has been that we need two representations -- a Very Small one that lets you construct a link, and the full representation. Anything in the middle is both very UI specific and class specific (we wanted different features for people compared to objects compared to events... obviously).
--> 1.1
We are starting to build features that need to be able to query our collections data to deliver some experiences. One of the most common use cases is to be able to query an endpoint for a term in a controlled vocabulary (materials, techniques, place, etc) and return n objects, but only a very small subset of the object data is needed (object id, title, image (thumbnail)). Possibly more fields might be required for some experiences on top of this base group.
An option would be to knock together the usual custom JSON API with just the information needed for our experience, and that would certainly be easiest for devs, but if we are asking other people to use the Linked Art API, it feels like we should be using it ourselves. Although perhaps trying to make it work for all endpoints isn't worthwhile.
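The use case above (filter by a controlled-vocabulary term, return only id, title, and thumbnail) can be sketched like this. All data, field names, and term identifiers are invented:

```python
# Filter objects by a controlled-vocabulary term and project the
# result down to the small subset a UI needs. Invented data.
OBJECTS = [
    {"id": "obj/1", "title": "Bronze Bowl", "thumbnail": "t1.jpg",
     "classified_as": ["aat:bronze", "aat:vessels"]},
    {"id": "obj/2", "title": "Oil Sketch", "thumbnail": "t2.jpg",
     "classified_as": ["aat:oil-paint"]},
]

def by_term(term: str, n: int = 20):
    """Return up to n small records whose classifications include `term`."""
    hits = (o for o in OBJECTS if term in o["classified_as"])
    return [{"id": o["id"], "title": o["title"], "thumbnail": o["thumbnail"]}
            for o in hits][:n]

print(by_term("aat:bronze"))
# [{'id': 'obj/1', 'title': 'Bronze Bowl', 'thumbnail': 't1.jpg'}]
```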