azaroth42 closed this issue 5 days ago
Discussion during early Linked Art calls favored keeping the model and the vocabulary separate. However, as the intent is to move towards a stable 1.0 version, we must deal with the issues of backwards compatibility.
Notably, if the scope of a vocabulary entry is clarified and its usage no longer works as expected, the best practice would be to update to the latest revision. From a software stability perspective, however, this is a breaking change: the entry no longer identifies what it did when the software was written. Given that the API is directly predicated on the model, such a change to best practice would make two systems incompatible even though they otherwise implement the same model.
Thus, I feel strongly that the basic vocabulary entries should be listed within the model, allowing them to be versioned appropriately. Then "1.0 compliant" software will use the term that was specified for 1.0. In a later release (say, 1.1) the vocabulary entry can be updated to use the more appropriate term. By "basic" I mean the following:
has_preferred_identifier
). This is not a hypothetical concern. Recently, through discussion with me, our local provenance experts, and experts from MoMA, the AAT editors clarified that the concept for auction refers to the overall auction event, not a single sale of a lot. A new entry was added for the sale of a lot by auction, replacing the old term, which was ambiguous between the two. This is even worse than a simple change, as the term now has a _different_ meaning. This would cause chaos between systems that interpret the term one way and those that interpret it the other. You might say "don't do that then" ... but that is not under our control. All we can do is specify the core that we commit to keeping stable during a particular version of the model and API.
[That was my TED talk, thanks for listening]
I am struggling a bit to understand the ramifications. Maybe a first useful clarification would be to identify which part of the "vocabulary" we are talking about when we discuss all these risks. Are we talking about classes and properties (the CIDOC-CRM level) or about concepts (the SKOS/AAT level)? I am starting to feel it may be useful to make that distinction, if only for managing changes that could come from very different sources and concerns.
Vocabulary as in the concepts, rather than the ontological classes from CIDOC-CRM. The SKOS/AAT level :)
OK, this clarifies things, and indeed it would make sense to "cache" the semantics that we need for a given version of Linked Art. That said: isn't there already a versioning mechanism for AAT concepts that would alleviate this? Something like version numbers, or a policy to never significantly change the semantics of a concept but rather deprecate it in favour of a new one when a drastic change is made. Hmm...
Would we want these to be within a different namespace? I feel like we should either commit to AAT, warts and all, or define our own vocabulary (with all the maintenance headaches of that).
I was thinking about this the other day in the context of our own custom vocabularies. I'm wondering if, for linked art, it would make sense to create linked-art specific collections of AAT (and other) terms whose membership is curated by the linked.art community. Thus, for capturing terms for commonly recurring patterns such as "ethnicity", "nationality", "types of texts", etc. if we find that the AAT makes it difficult to find these terms or they are split up all over the place or another vocabulary does a better job of describing them, we can officially sanction their use. This would be of great assistance, I think, to an organization like ours that has its own custom vocabulary that still needs to be bridged to a broadly accepted one as it would save a great deal of time searching for the appropriate terms in some cases. Having curated collections of terms as part of linked.art would possibly open up the door to bridging higher level terms across vocabularies.
What @beaudet says. In particular:
an organization like ours that has its own custom vocabulary that still needs to be bridged to a broadly accepted one
points to a migration problem that we can make easier or harder by being less or more restrictive with the ranges of properties. Goodness knows that SI has more different vocabularies (and different eras of usage for the same vocabulary) in use than you could shake a stick at, some shared, many custom, and if moving entirely to AAT is a prerequisite to create Linked.Art data, that's an instant high barrier for some.
I also agree! We should document the commonly accepted terms from the community, and leave it open for local extensions. We do not want to take on vocabulary management, just community consensus around existing vocabularies where that consensus provides significant value. We can fine tune which concepts are in and which are local as we go :)
Consensus at London F2F:
Will come up with lists in a subgroup (Rob, Sami, David B, Bree, Nicola, John) to propose back to larger group.
In order to try and move this discussion forward, I think the best way is to discuss how to make decisions about the ontology terms, and then we can simply apply that process.
So, here, a proposed rubric for making decisions about the categorization:
Required terms would be necessary to use for basic compatibility. So if you were to choose another concept to use in place of one of these, you would be outside of Linked Art. You would expect a system to be in an error state when these terms are not used.
- **Metatypes** – These have to be required in order to serve as flags for the type that they are classifying. If you don’t know what “type of object” means, then you won’t be able to do anything with the unenumerable data below it.
- **Replacements for Ontology** – Where we defer to the vocabulary layer instead of standardizing an ontology feature, these terms need to be required. For example, “Primary Name” isn’t up for debate, as it replaces a critical notion of … well … primary name.
- **Trivially enumerable sets** – We should be clear about sets that we can enumerate completely (or so completely as to not raise eyebrows that we missed something). For example, the types of Dimension might not cover absolutely everything that anyone might ever need, but it’s doubtful that, in the scope of the work, there are dimensions we don’t know about. There might not be any of these.
- **API distinguishing terms** – Terms that distinguish between responses from the API that would otherwise be indistinguishable. For example, if there is an API call to return “Exhibition Activity” and an API call to return “Provenance Activity”, then at the ontology layer they’re both just E7 Activity. We should require a flag to make them trivial to distinguish.
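To make the "API distinguishing terms" point concrete, here is a minimal sketch: two records that are both plain E7 Activities at the ontology layer are told apart only by a required flag term in `classified_as`. The two AAT URIs are assumptions for illustration; check the published term lists before relying on them.

```python
# Assumed AAT flag terms -- illustrative, not verified against the spec.
EXHIBITION = "http://vocab.getty.edu/aat/300054766"  # assumed: exhibitions
PROVENANCE = "http://vocab.getty.edu/aat/300055863"  # assumed: provenance

def activity_kind(activity):
    """Classify an Activity record by its required flag term."""
    flags = {c.get("id") for c in activity.get("classified_as", [])}
    if EXHIBITION in flags:
        return "exhibition"
    if PROVENANCE in flags:
        return "provenance"
    return "unknown"

sale = {
    "type": "Activity",
    "_label": "Sale of Lot 12",
    "classified_as": [{"id": PROVENANCE, "type": "Type"}],
}
print(activity_kind(sale))  # prints "provenance"
```

Without the required flag, a consuming system has no principled way to route the record, which is exactly the error state described above.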
These terms don't have to be used, but there should be a good reason for not using them. Systems might be built that recognize these terms and do something special, but they should not fail when they are not encountered.
- **Enumerable Sets** – Sets of terms that we can enumerate, but that might be extended in some circumstances. E.g. parts of personal names: having everyone dig around for “first name” is a waste of everyone’s time when we can just recommend the AAT term.
- **Ultra-Common Terms** – Instead of having people dig around for very common terms, just put together a list of recommended ones. They’re free not to use them if they don’t want to … but we should make consistency easy, and inconsistency possible but at a high price.
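A small sketch of what a recommended-terms table buys implementers: one community-vetted URI per common concept instead of a search through AAT. The two AAT IDs below are illustrative placeholders, not verified values.

```python
# Placeholder AAT IDs for name parts -- verify against the published
# vocabulary page before use.
NAME_PART_TERMS = {
    "first name": "http://vocab.getty.edu/aat/300404651",  # placeholder
    "last name": "http://vocab.getty.edu/aat/300404652",   # placeholder
}

def name_part(content, kind):
    """Build a Name part classified with the recommended term for `kind`."""
    return {
        "type": "Name",
        "content": content,
        "classified_as": [{"id": NAME_PART_TERMS[kind], "type": "Type"}],
    }

part = name_part("Vincent", "first name")
print(part["classified_as"][0]["id"])
```

Systems that ignore the recommendation still produce valid data; they just give up the easy cross-system consistency.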
These terms are presented (separately) to reduce the work of others to find and select them, but there should be no ill effects if implementations use different terms than the ones listed.
Other - Further terms that come up frequently, but are not so common as to be recommended.
(Reopening as there's work to do, even if agreement on the rubric)
@azaroth42 sorry about that!
Related to #237 as well.
link to existing vocabularies that we have collected: https://docs.google.com/spreadsheets/d/1gyxb9Q31jvF0Zd_BZscqijA20u99FRgCJgDGmRMbPe0/edit#gid=1337687428
Perhaps another way to distinguish the three levels:
Suggestions welcome for how to deal with examples in the model documentation that rely on classifications that might fall into the Recommended layer, rather than required?
For example: https://linked.art/model/actor/#parts-of-names
If we bake these examples into the model text, then we would be (essentially) requiring them or at the very least agreeing implicitly not to change them until the next major version of the model.
Perhaps the table can live in the vocabulary section, and we can make the examples be non-normative when it comes to the particular classifications used?
Could you please remind us what the homework was for this issue?
I looked at the notes from the last session and didn't see anything ... so I second Dave on the need for a reminder about the homework.
Sorry! If we're happy with the required page in the spreadsheet, we should come to some agreement about the recommended page. E.g. are there entries for which systems would fail to behave as expected if they weren't consistent (and which should therefore be in required)? Are there entries that are not very useful at all (and should be in the "listed" set)? Are there missing entries that systems could usefully process to improve the UX (and should be in the recommended list)?
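The three-tier policy can be sketched as a simple check: a missing required term is an error, a missing recommended term is only noted, and listed terms carry no expectation at all. The term URIs here are invented for illustration, not taken from the spreadsheet.

```python
# Hypothetical term sets -- the real ones would come from the spreadsheet.
REQUIRED = {"aat:primary-name-flag"}
RECOMMENDED = {"aat:first-name-flag"}

def check_terms(used):
    """Return (errors, notes) for the set of term URIs a record uses."""
    errors = sorted(REQUIRED - used)  # hard failures
    notes = sorted(RECOMMENDED - used)  # soft advisories only
    return errors, notes

errors, notes = check_terms({"aat:primary-name-flag"})
print(errors, notes)  # prints [] ['aat:first-name-flag']
```

The asymmetry is the point: systems may do something special with recommended terms, but must not fail when they are absent.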
Linked Art WG Yale Face-to-face Meeting Decision:
Use `equivalent` to link to other URIs in the Reference pattern, e.g.:

```json
{
  "id": "http://vocab.getty.edu/aat/123423423",
  "type": "Type",
  "_label": "thing",
  "equivalent": [
    {
      "id": "local-uri",
      "type": "Type",
      "_label": "local thing"
    }
  ]
}
```
Docs should record best practices around bindings.
Coming back to the example in the model docs about paintings and artworks, it could use an update now that things like "collection items" are in the vocabulary, maybe along with "archives"?
Versioning of the lists:
First step of extracting vocabulary from python to json: https://github.com/linked-art/crom/pull/52
Not going to try to build the documentation from the JSON, as the categorization and ordering are hard; plus there's utility in having different layouts for required (new, important to get right, should have examples) vs. recommended (a lot, but worth describing) vs. optional (potentially many, many, where just a name and link is sufficient).
https://linked.art/model/vocab/
I think this is done enough. Proposals for more listed and recommended terms welcome in new issues.
Currently the vocabulary terms are mostly extracted from the model and managed in a separate page completely outside of the model work. The original intent was to enable the same patterns to be implemented using different choices of vocabulary.
This has proven to be an impediment to implementation, as the abstraction is more complicated to describe and to understand than a definition as to what to use in which circumstances. For interoperability, the more specific we can be, the more likely software will be successful at consuming data that was produced in a different environment. This is especially true with CIDOC-CRM, which relies heavily on external vocabularies via P2 to describe specifics which would normally be part of the core ontology.
Thus, the vocabulary pages (https://linked.art/community/best-practices/vocabularies/) should be rolled into the relevant /model/ pages as normative requirements, rather than as external best practices.