Open richardofsussex opened 1 year ago
Also note that the Linked Art API mandates a pattern for URLs, whereby they include the entity type:
https://linked.art/example/object/1
We need to ensure that our/my script includes this information within the cross-referencing URLs it generates.
Conversation with Rob S confirms that there is no 'built-in' mechanism for rendering Linked Art as HTML. The API is designed to fit in with Linked Data practices, but the recommendations are all about returning JSON. Other formats are 'to be implemented'. If my HTML-rendering code was re-cast to operate server-side (e.g. as a PHP page) it could potentially meet this requirement (and would be of value to other Linked Art users who haven't yet addressed HTML delivery of their content).
Unless I'm missing something major, the API offers no support for resource discovery. That simplifies our task, since there is nothing to conform to, but it raises the question of how we might ever do distributed searching across the Linked Art world.
We do not really need to worry about searching yet. For us the easiest approach might be an ES search over all of the JSON documents we produce, though that also opens up the potential for more semantic, SPARQL-based queries.
I suppose my comment is more for the Linked Art group than for NG. If they have ambitions to support cross-collection searching their API will need a search element. Obviously they could just say "everyone have a SPARQL end point", but that doesn't sit well with their saying that you should just be able to put a set of static files on disc and that that can be a valid Linked Art implementation.
As regards rendering to HTML, the Getty JS library: https://linkedartjs.org/ will be orders of magnitude better than what I have done, and could usefully be investigated.
Worth a look once the mappings are all happy :-)
Just implemented a first pass at including 'endpoint paths' within the NG cross-referencing URLs. The template which does this job can take a specific path as a parameter; otherwise it will include whatever precedes '-' in the @admin.id field. Our entity types don't all match Linked Art's - object, concept, event and place are four which do. They have 'group' and 'person' where we have 'agent', 'set' where we have 'package' and 'text' where we have 'publication'.
I'm minded to use their terminology where possible, because in the API this path element is used to infer the entity type. We can always do a URL rewrite to help resolve the URL - although I would have thought UUIDs (and UIDs when we get them) would be unique anyway. However, I don't see how this can be achieved for group/person vs. agent, since 'agent-6672' could be either a person or a group.
What's your advice?
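To make the mapping described above concrete, here is a minimal Python sketch (not the actual template, whose implementation isn't shown in this thread). It assumes the id has the form 'prefix-number', uses the https://data.ng.ac.uk base that appears in examples elsewhere in this thread, and the function names `endpoint_path` and `cross_reference_url` are hypothetical. The 'agent' prefix is deliberately left unmapped, because it cannot be resolved to 'person' or 'group' from the id alone.

```python
from typing import Optional

# Base URL as used in the examples in this thread; treat it as an assumption.
BASE_URL = "https://data.ng.ac.uk"

# NG id prefix -> Linked Art endpoint path, per the comparison above.
# 'agent' is absent on purpose: it could be either 'person' or 'group'.
ENDPOINT_FOR_PREFIX = {
    "object": "object",
    "concept": "concept",
    "event": "event",
    "place": "place",
    "package": "set",
    "publication": "text",
}


def endpoint_path(admin_id: str, explicit_path: Optional[str] = None) -> str:
    """Return the endpoint path: the explicit one if given, else inferred."""
    if explicit_path:
        return explicit_path
    # Whatever precedes the first '-' in the id is taken as the NG entity type.
    prefix = admin_id.split("-", 1)[0]
    if prefix not in ENDPOINT_FOR_PREFIX:
        raise ValueError(f"cannot infer Linked Art endpoint from {admin_id!r}")
    return ENDPOINT_FOR_PREFIX[prefix]


def cross_reference_url(admin_id: str, identifier: str,
                        explicit_path: Optional[str] = None) -> str:
    """Build a cross-referencing URL of the form BASE/endpoint/identifier."""
    return f"{BASE_URL}/{endpoint_path(admin_id, explicit_path)}/{identifier}"
```

Calling `cross_reference_url("agent-6672", "XXX")` raises a `ValueError` unless an explicit path such as "person" is supplied, which is exactly the group/person problem raised above.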
If we are mapping to Linked.Art we should use the terminology they use and only deviate from this if what we want to publish goes beyond their existing templates.
In the long run terms like "agent-6672" should disappear as they will not be required - an entity will be resolved via its PID and will have a type of "group" or "person" etc.
For this type of mapping we do not need any "internal" IDs as they could all be discovered, if really needed, by connecting to the raw CIIM output via the PID. They only need to be included here if there are some good use-cases, if not then they are just noise really.
But this needs to be an agreed position ....
The problem that remains is that when an agent is mentioned in an object record, all we have is their id/uid/uuid. The only clue as to whether this is an individual or an organisation is the first part of the id. Since this is 'agent-' for both, this means that we need to dereference their URL to find out whether they are an Individual or an Organisation. And the reason we want to do this is so we can follow the Linked Art guidelines on the form that URLs should take. On the face of it, this is a Catch 22 situation - how can we know whether it should be https://data.ng.ac.uk/person/XXX or https://data.ng.ac.uk/group/XXX that we should dereference? One answer would be to have an alternative form of URL which doesn't include the entity type.
Anyway, this issue neatly exemplifies two of the reasons why I think Jolt isn't up to the job: the need to dereference linked entities, and the need to test on data values and then output different/computed data values.
Yes, we should only be using UIDs = PIDs as our identifiers. But we deliberately made sure that these were agnostic as to entity type - there is no entity type encoded within them, neither is it envisaged for our PID resolution framework - including it just adds another level of complexity.
Agreed that we have to include entity types in LA records, and they should match the agreed LA entity types. Persons and groups can be distinguished by the content of (IIRC) @datatype.actual.
Unlike you, the Linked Art API guidelines for URIs (https://linked.art/api/1.0/protocol/ - Preferred URI structure) mandate the inclusion of an indication of the entity type as the "endpoint" within the URI path. They give a set of preferred values for this endpoint, including "person" and "group". We can ignore this guidance and have URIs which follow your approach, or we have to address the issues I outline above. Your choice.
Mapping to the approved entity types within a Linked Art agent record is indeed straightforward (unless you use Jolt), but that's not the issue here.
I think this again is an issue of data preparation. Ideally the Jolt process should just be about rearranging data to form the required LA structures and relationships. If more complex data transformations, and potentially calls back to raw data, are required, then this work should be done at source.
We have discussed presenting our data in various forms, LA being the more complex. Each current and future data presentation will require its own Jolt process, and we do not want to repeat complex data transformations in each one, as this will increase future maintenance and sustainability issues.
So I would suggest all complex data transformation, format shifts or augmentations (adding data from other entities - labels, types, etc) should happen within the CIIM and augment the "raw NG record" - which is then pushed towards any Jolt process.
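As a rough illustration of what such an augmentation step might look like for the person/group question, here is a Python sketch. It is not how the CIIM does it. It assumes @datatype.actual is a nested field holding 'Individual' or 'Organisation' (the field name is only recalled "IIRC" in this thread), and the keys `agents`, `uid` and `la_endpoint`, along with the `fetch_agent` lookup, are hypothetical names introduced for the example.

```python
import copy
from typing import Any, Callable, Dict, Optional

Record = Dict[str, Any]

# Assumed values of @datatype.actual -> Linked Art endpoint path.
AGENT_ENDPOINTS = {"Individual": "person", "Organisation": "group"}


def agent_endpoint(agent_record: Record) -> Optional[str]:
    """Return 'person' or 'group' for an agent record, or None if unknown."""
    actual = agent_record.get("@datatype", {}).get("actual")
    return AGENT_ENDPOINTS.get(actual)


def augment_agent_refs(object_record: Record,
                       fetch_agent: Callable[[str], Record]) -> Record:
    """Copy the object record and add the endpoint type to each agent reference.

    fetch_agent is a hypothetical lookup that dereferences an agent by its UID.
    """
    augmented = copy.deepcopy(object_record)  # leave the raw record untouched
    for ref in augmented.get("agents", []):   # 'agents' key is an assumption
        ref["la_endpoint"] = agent_endpoint(fetch_agent(ref["uid"]))
    return augmented
```

With the endpoint already present on each reference, a downstream transform only has to rearrange values and never needs to dereference a linked entity itself.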
As above, the question of entity types in PID paths is now marked down as for discussion with K-Int: https://github.com/national-gallery/NG-CIIM/issues/22#issuecomment-1665898540
I am finding the Linked Art API descriptions very useful as a means of visualizing the target structure we should be aiming for. While we are not ourselves delivering data through this API, I think that we should be aiming to produce an output which, as far as possible, has the same structure as data which is delivered in this way. Here, for example, is the page about physical objects:
https://linked.art/api/1.0/endpoint/physical_object/
I think I'll do a top-down exercise, and produce a mapping which shows where I think each of the objects/arrays immediately within _source in the CIIM output should end up within this framework. That will immediately show where the gaps and question marks are.