linked-art / linked.art

Development of a specification for linked data in museums, using existing ontologies and frameworks to build usable, understandable APIs
https://linked.art/

Function API endpoint for limited set of fields #362

Open atiro opened 4 years ago

atiro commented 4 years ago

We are starting to build features that need to be able to query our collections data to deliver some experiences. One of the most common use cases is to be able to query an endpoint for a term in a controlled vocabulary (materials, techniques, place, etc) and return n objects, but only a very small subset of the object data is needed (object id, title, image (thumbnail)). Possibly more fields might be required for some experiences on top of this base group.

One option would be to knock together the usual custom JSON API with just the information needed for our experience, which would certainly be easiest for devs. But if we are asking other people to use the Linked Art API, it feels like we should be using it ourselves. Perhaps trying to make it work for all endpoints isn't worthwhile, though.

beaudet commented 4 years ago

The NGA shares this (and other) use cases.

ewg118 commented 4 years ago

I think this would be pretty useful in general. There are a lot of different back-ends that might be used to generate an API response dynamically, whether MySQL, ElasticSearch, Solr, a proprietary graph db, or a SPARQL endpoint directly. Lucene-based APIs allow you to specify the fields you want returned. In theory, a Linked Art API could specify the fields that are available for filtering/simplifying the response. If you take the IIIF info.json response as an example, the JSON file returns a list of functionalities that the client can read in order to make choices. The OpenRefine API does the same (see http://nomisma.org/apis/reconcile). If Linked Art had a default response for its endpoint that listed field names that are available for filtering, then the client can construct an HTTP request based on those fields, which ideally conform to properties that have been expressed in the LA context file.
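A minimal sketch of that idea, assuming a hypothetical capabilities document: the `CAPABILITIES` structure, its key names, the `fields` parameter and the example.org URL are all invented for illustration and are not part of Linked Art, IIIF or OpenRefine. The client reads the advertised field names and only builds requests the server says it supports.

```python
from urllib.parse import urlencode

# Hypothetical capabilities document a server might publish at its endpoint.
# The structure and key names are assumptions, not part of any specification.
CAPABILITIES = {
    "endpoint": "https://example.org/api/objects",
    "filter_fields": ["material", "technique", "place"],
    "return_fields": ["id", "title", "thumbnail"],
}


def build_request(capabilities, filters, fields):
    """Build a query URL, rejecting anything the server did not advertise."""
    unknown_filters = set(filters) - set(capabilities["filter_fields"])
    unknown_fields = set(fields) - set(capabilities["return_fields"])
    if unknown_filters or unknown_fields:
        raise ValueError(f"unsupported: {sorted(unknown_filters | unknown_fields)}")
    params = dict(filters)
    # "fields" as the name of the field-selection parameter is an assumption.
    params["fields"] = ",".join(fields)
    return capabilities["endpoint"] + "?" + urlencode(params)


url = build_request(CAPABILITIES, {"material": "bronze"}, ["id", "title"])
print(url)
```

Validating on the client side keeps bad requests off the wire, but the server would still need to check the same things itself.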

It would be up to back-end developers (like myself) to build the middleware that takes HTTP request parameters or parameters expressed via JSON POSTs from the client, and then makes the necessary queries in the back-end to supply the JSON response the client expects.

edit: the example of the Nomisma.org OpenRefine reconciliation API is exactly this. I followed the OpenRefine documentation to build a layer on top of Nomisma's Solr index. The API takes JSON posted from a client (OpenRefine), converts that JSON into a Lucene query that is submitted to Solr, and then serializes the XML results from Solr back into the required JSON model.
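The JSON-to-Lucene step can be sketched roughly as follows. This is not Nomisma's actual code: the query shape is loosely modelled on an OpenRefine reconciliation query (`query`, `type`, `properties` with `pid` and `v`), and the index field names `label`, `type` and `region` are assumptions.

```python
import json


def lucene_phrase(value):
    """Quote a value as a Lucene phrase, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_lucene(query, label_field="label"):
    """Turn one posted query object into a Lucene query string.

    `label_field` and the use of each property id as an index field name
    are assumptions about the index, made only for this sketch.
    """
    clauses = [f"{label_field}:{lucene_phrase(query['query'])}"]
    if "type" in query:
        clauses.append(f"type:{lucene_phrase(query['type'])}")
    for prop in query.get("properties", []):
        clauses.append(f"{prop['pid']}:{lucene_phrase(prop['v'])}")
    return " AND ".join(clauses)


posted = json.loads(
    '{"query": "Augustus", "type": "person",'
    ' "properties": [{"pid": "region", "v": "Rome"}]}'
)
lucene = to_lucene(posted)
print(lucene)
```

The resulting string would then be sent to the index as the query, and the results serialized back into whatever JSON model the client expects.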

azaroth42 commented 4 years ago

There is (of course) a lot of water under this particular bridge in the information system world.

Server defined: OAI-PMH has a "simple dc" record and then the full record. Z39.50 has a "brief" record and a full record. SRU has an arbitrary list of recordSchemas, and 1.1 even allowed dynamic XPath expressions to be evaluated before returning results.

Client defined: SPARQL CONSTRUCT queries let you pick which bits of data you want. Similarly, GraphQL lets you send a template to fill out. Solr result export lets you define a list of fields to put into the results.

The trade-off is obvious -- server defined is easy for the server to implement and cache but if the reduced syntax doesn't meet the client need, then they have to retrieve the full record anyway. Conversely the client defined syntax is very computationally expensive, especially with graph data.

The easiest, most web-friendly way to do it, in my opinion, is simply to have another URL that gives a server-defined subset of the data. Then when you're computing your full record, you strip it down and cache to the "brief" URL. Then have two indexes (using whatever technology), one for the full records and one for the brief. You can't do a search on the full dataset fields and then retrieve the brief form ... but you save a lot of implementation time.
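As a rough illustration of the strip-and-cache step, assuming an in-memory dictionary as the cache and a `/brief` URL suffix (both invented for this sketch). The keys in `SUMMARY_KEYS` are placeholders only, not a list anyone in this thread has agreed on.

```python
import copy

# Placeholder set of keys for the brief form; an assumption for illustration,
# not an agreed Linked Art summary format.
SUMMARY_KEYS = ("@context", "id", "type", "_label")

# Stand-in for whatever cache or index a real implementation would use.
cache = {}


def to_brief(full_record, keys=SUMMARY_KEYS):
    """Return a copy of the record containing only the server-defined subset."""
    return {k: copy.deepcopy(full_record[k]) for k in keys if k in full_record}


def brief_url(full_url):
    """Derive the URL of the brief form (the '/brief' suffix is hypothetical)."""
    return full_url.rstrip("/") + "/brief"


def publish(full_url, full_record):
    """Store the full record and its stripped-down form under separate URLs."""
    cache[full_url] = full_record
    cache[brief_url(full_url)] = to_brief(full_record)


publish(
    "https://example.org/object/1",
    {
        "id": "https://example.org/object/1",
        "type": "HumanMadeObject",
        "_label": "Example object",
        "produced_by": {"type": "Production"},
    },
)
print(cache["https://example.org/object/1/brief"])
```

Because the brief form is computed once at publish time, serving it costs the same as serving any other static document.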

We could try to define a subset of fields that are important... but we might find different fields important based on our UI expectations. I would think we could agree on:

That might be it, and might be enough?

beaudet commented 4 years ago

I think there will be a lot of challenges with adoption of linked art in the developer community if it doesn't provide a standard web friendly mechanism for querying an institution's graph while fulfilling the vast majority of use cases. Not having a query standard would also prevent open source software from being developed that might otherwise be born as linked.art standards start to take off.

It might be possible to use a simple syntax for querying that can be chained to provide complexity / flexibility... or we could adopt an existing web-friendly standard with some recipes for implementation, e.g. SOLR, GraphQL, etc. but I do think we need a solid solution for this.

beaudet commented 4 years ago

If we go the custom API route, there might be two JSON structures in play - one being the core model and the other being a small wrapper that provides pagination controls.

azaroth42 commented 4 years ago

I agree with a caveat: in the medium term we need something like that, but I don't think we have the baseline format finished yet in order to have enough understanding about what the best interaction patterns for search are. Getting implementations of just the baseline is going to be tricky enough, without expecting a search implementation as well.

Pagination - I think we can use the IIIF / Annotation / ActivityStreams pattern if we're going more custom than just "drop in here".
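A sketch of what such a wrapper could look like, borrowing the ActivityStreams 2.0 `OrderedCollectionPage` terms (`partOf`, `orderedItems`, `prev`, `next`, `totalItems`). The `?page=` URL pattern, the zero-based page numbering and the example.org URLs are assumptions made for illustration.

```python
def make_page(collection_url, items, page, page_size, total):
    """Wrap one page of results in an ActivityStreams-style page document."""
    last_page = max(0, (total - 1) // page_size)
    doc = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": f"{collection_url}?page={page}",
        "type": "OrderedCollectionPage",
        "partOf": {
            "id": collection_url,
            "type": "OrderedCollection",
            "totalItems": total,
        },
        "orderedItems": items,
    }
    # Only link to neighbouring pages that actually exist.
    if page > 0:
        doc["prev"] = f"{collection_url}?page={page - 1}"
    if page < last_page:
        doc["next"] = f"{collection_url}?page={page + 1}"
    return doc


records = [f"https://example.org/object/{n}" for n in range(5)]
first = make_page("https://example.org/search", records[0:2], 0, 2, len(records))
last = make_page("https://example.org/search", records[4:5], 2, 2, len(records))
```

The items here are bare URIs; embedding a brief record for each item instead would be a separate design decision.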

beaudet commented 4 years ago

That's fair.

atiro commented 3 years ago

For our new search endpoint we will be returning the following per object record (not finalised):

uniqueID 
museumNumber 
objectType 
current_location 
   id 
   displayName
   type 
   site 
   onDisplay 
_primary_title 
_primary_maker
   name
   association 
_primary_image_id
_primary_date 
_primary_place
_cultural_sensitivity_notice
_images
   _primary_thumbnail 
   _iiif_image_base_uri 
   _iiif_presentation_uri

akuckartz commented 3 years ago

Edited: Can SPARQL be provided? I see that this was mentioned as an option, but I still do not understand what is missing.

ajs6f commented 3 years ago

Before committing to a special unique-to-LA search endpoint, or (so much the more so) requiring full SPARQL support exposed to the open web (!) perhaps we could consider and record some very concrete use cases for this facility...

@beaudet, you remarked "I think there will be a lot of challenges with adoption of linked art in the developer community if it doesn't provide a standard web friendly mechanism for querying an institution's graph while fulfilling the vast majority of use cases." Could you describe some of those specific use cases? Are you thinking of real applications under consideration for development at NGA? Elsewhere? Can you give a little detail about why harvesting data sets and indexing them locally wouldn't be feasible or wouldn't be fit for purpose?

@ewg118 Did you have plans to use a search API for your purposes or to support federation harvesting or something else?

@akuckartz Did you have particular plans for using SPARQL over distributed Linked.Art datasets? Can you say something about the kinds of queries you would want to use? Can you write a bit about why harvesting data sets and putting them in a local SPARQL implementation wouldn't be a viable choice?

ewg118 commented 3 years ago

@ajs6f Primarily harvesting of external datasets. But I'm also pro-dogfooding, so I would also build a search API on our own collection that might allow third parties to harvest our data for other projects (eg AA Collab).

Here's a use case: Harvard Art Museums has a search API that enables a Lucene query parameter to be passed to the API and responds with JSON from ElasticSearch. The fields returned in this response are particular to the indexing scheme Harvard Art Museums implemented. But let's say they are able to create a transformation layer so that their ES response is structured into Linked Art JSON-LD.

I can build a similar transformation on our Solr index that returns LA JSON-LD.

However, my fields are different than Harvard's fields, and the Lucene queries are going to differ. So ideally, we need to come up with a standard query syntax for LA that is also transformed into a Lucene (or MySQL or SPARQL or GraphQL, depending on the underlying data store) query.

Currently, we can all probably output a standardized Linked Art compliant JSON response, but each of us is going to have a different query syntax to get that response.
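To make the mismatch concrete, here is a sketch in which one shared query is translated into two different Lucene queries through per-institution field maps. Everything here is hypothetical: the shared field names (`title`, `material`), the institution keys and the local index field names are invented, and none of them reflect Harvard's or anyone else's real index.

```python
# Hypothetical mappings from shared query fields to each institution's
# local index fields. All names are invented for illustration.
FIELD_MAPS = {
    "institution_a": {"title": "title_text", "material": "medium"},
    "institution_b": {"title": "dc_title", "material": "material_facet"},
}


def quote(value):
    """Quote a value as a Lucene phrase, escaping backslashes and quotes."""
    return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'


def translate(query, field_map):
    """Translate a query in the shared vocabulary into a local Lucene query."""
    clauses = []
    for field, value in query.items():
        if field not in field_map:
            raise KeyError(f"no local mapping for shared field {field!r}")
        clauses.append(f"{field_map[field]}:{quote(value)}")
    return " AND ".join(clauses)


shared_query = {"title": "self-portrait", "material": "oil"}
for name, field_map in FIELD_MAPS.items():
    print(name, "->", translate(shared_query, field_map))
```

The client only ever sees the shared vocabulary; each institution keeps its own mapping, and the same approach could target SQL or SPARQL instead of Lucene.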

azaroth42 commented 3 years ago

The intent is to not require anything heavy-weight like SPARQL, GraphQL or similar but to enable those to be built across aggregated data. This issue is less about search (which we have deferred working on until we have the representations able to be clearly published) but about whether two different representations should be published -without- search.

akuckartz commented 3 years ago

Maybe Triple Pattern Fragments are a solution. See https://linkeddatafragments.org/

beaudet commented 3 years ago

@ajs6f

Example: A decision is made to re-platform our web site using Drupal. Rather than building 100% bespoke components for our collection search, artist and art object pages, we decide instead to build those using the Linked Art data model and search APIs. As a result, we create a community around standards based museum-centric Drupal components and benefit from the work of others when they sustain and extend the components. In other words, I think the advantage of a search standard is to help us access our own data more easily by building and sharing community based software centered around standards.

azaroth42 commented 3 years ago

Proposed Baseline:

With any other properties possible as useful to local institution

azaroth42 commented 3 years ago

Document as "Summary Format" under "Representations" bullet in API documentation.

beaudet commented 3 years ago

Search / fetch and related services we're using for web sites that need art data

Locations / Places: find by various identifiers including ability to return all

Media: find by relationships with other art entities including ability to return all

Art Objects: find all, find by various identifiers, find by relationships with other entities, find by matching one or more values with one or more specific fields, scoped free text search, sorting of search results

Art object fields that can be searched for value matches: ARTIST_DISPLAYNAME, ARTIST_ALLNAMES, OWNER_ALLNAMES, HASLARGERIMAGERY, HASTHUMBNAIL, ONVIEW, TITLE, YEARS_SPAN, YEARS_BEGIN, MEDIUM, OBJECTID, ACCESSIONNUM, ATTRIBUTION_INV, PROVENANCE, OVERVIEW, HASOVERVIEWTEXT, CREDITLINE, LOCATION_ID, LOCATION_SITE, LOCATION_ROOM, LOCATION_UNITPOSITION, LOCATION_DESCRIPTION, DONORCONSTITUENTID, OWNERCONSTITUENTID, SUBCLASSIFICATION, TIMESPAN, THEME, SCHOOL, STYLE, NATIONALITY, CLASSIFICATION, APPLIEDTERMS, LASTMODIFICATION

Art object free text scoping: ALLDATAFIELDS, CREDITLINE, MEDIUM, PROVENANCETEXT, DIMENSIONS, INSCRIPTION, MARKINGS, CATALOGRAISONNEREF, IMAGECOPYRIGHT, ARTISTS, ARTISTNATIONALITIES, OWNERS, DONORS, TERMS, BIBLIOGRAPHYTEXT, OVERVIEWTEXT, CONSERVATIONNOTES, SYSCATTEXT, EXHIBITIONHISTORY

Art object sorting options - some of these are no longer used as they were created for requirements that have been dropped

   YEAR_ASC, // sorts by year
   CLASSIFICATION_ASC, // sorts by classification alphabetically
   ATTRIBUTIONINV_ASC, // sorts by inverted attribution alphabetically
   TITLE_ASC, // sorts by title alphabetically
   TITLE_DESC, // sorts by title alphabetically
   OBJECTID_ASC, // sorts by object id numerically
   OBJECTID_DESC, // sorts by object id numerically
   HASLARGERIMAGERY_DESC, // sorts based on whether object has image restrictions
   HASTHUMBNAIL_DESC, // sorts based on whether a thumbnail image is available
   ONVIEW_DESC, // sorts based on whether object is on view or not
   NUMARTISTS_ASC, // sorts based on number of artist associations in ascending order
   // sorts based on whether the inverted attribution exactly matches the preferred display name of an associated artist
   ATTRIBUTIONINV_ARTISTNAME_MATCH_ASC,
   ACCESSIONNUM_ASC, // added for searches using an accession number
   ACCESSIONNUM_DESC, // added for searches using an accession number
   LASTDETECTEDMODIFICATION_ASC,
   LASTDETECTEDMODIFICATION_DESC,
   FIRST_ARTIST_ASC,
   FIRST_ARTIST_DESC,
   YEAR_MATCH, // sorts a vs. b based on matching each to the year of object c
   CLASSIFICATION_MATCH, // "" but for classification
   ATTRIBUTIONINV_MATCH, // "" but for inverted attribution
   TITLE_MATCH, // "" but for title
   ONVIEW_MATCH, // "" but for on view status
   OBJECTID_MATCH, // "" but for object id match (probably no use cases for this)
   HASLARGERIMAGERY_MATCH, // "" but for same image restriction status
   HASTHUMBNAIL_MATCH, // "" but for same thumbnail status
   ARTISTS_MATCH, // sorts a vs. b based on whether artists match that of object c
   NUMARTISTSINCOMMON_MATCH_DESC, // sorts a vs. b based on number of artists in common with object c
   NUMTHEMESINCOMMON_MATCH_DESC, // "" but for themes
   NUMSTYLESINCOMMON_MATCH_DESC, // "" but for styles
   NUMDONORSINCOMMON_MATCH_DESC, // "" but for donors
   NUMARTISTNATIONALITIESINCOMMON_MATCH_DESC // "" but for nationalities of artists

Constituents: find all, find by various identifiers, find by relationships with other entities, find by matching one or more values with one or more specific fields, scoped free text search, sorting of search results

Constituent fields that can be searched for value matches

   PREFERRED_DISPLAY_NAME,
   ALL_NAMES,
   ISINDIVIDUALARTIST, -- as opposed to a workshop or similar
   ISARTISTOFNGAOBJECT, -- as opposed to a non-artist or artist of a non-collection work
   NATIONALITY,
   TIMESPAN,
   INDEXOFARTISTS_FIRST_TWO_LETTERS_LAST_NAME,
   INDEXOFARTISTS_LETTER_RANGE,
   HASBIOGRAHPY,
   YEARS_SPAN,
   CONSTITUENT_ID,
   ULAN_ID

Constituent free text scoping: ALLDATAFIELDS

Constituent sorting options

   PREFERRED_DISPLAY_NAME_MATCH, // for comparing against a target constituent for closeness
   HASBIOGRAPHY_MATCH,
   PREFERRED_DISPLAY_NAME_ASC, // more typical sorting
   HASBIOGRAPHY_ASC,
   CONSTITUENTID_ASC

Faceting:

Type-Ahead lists: return type-ahead matches for specific fields (smaller data set as opposed to all data for records)

azaroth42 commented 9 months ago

Is this needed as a core feature for 1.0? LUX experience has been that we need two representations -- a Very Small one that lets you construct a link, and the full representation. Anything in the middle is both very UI specific and class specific (we wanted different features for people compared to objects compared to events... obviously).

azaroth42 commented 5 months ago

--> 1.1