azaroth42 closed this issue 5 days ago
Discussion during early Linked Art calls favored keeping the model and the vocabulary separate. However, as the intent is to move towards a stable 1.0 version, we must deal with the issues of backwards compatibility.
Notably, if the scope of a vocabulary entry is clarified and its usage no longer works as expected, the best practice would be to update to the latest revision. From a software stability perspective, however, this is a breaking change: the entry no longer identifies what it did when the software was written. Given that the API is directly predicated on the model, such a change to best practice would make two systems incompatible even though they otherwise implement the same model.
Thus, I feel strongly that the basic vocabulary entries should be listed within the model, allowing them to be versioned appropriately. Then "1.0 compliant" software will use the term that was specified for 1.0. In a later release (say, 1.1) the vocabulary entry can be updated to use the more appropriate term. By "basic" I mean the following:
has_preferred_identifier
). This is not a hypothetical concern. Recently, through discussion with me, our local provenance experts, and experts from MoMA, the AAT editors clarified that the concept for auction refers to the overall auction event, not a single sale of a lot. A new entry was added for the sale of a lot by auction, replacing the old term, which was ambiguous between the two. This is even worse than a simple change, as the term now has a _different_ meaning. This would cause chaos between systems that interpret the term one way and those that interpret it the other. You might say "don't do that then" ... but that is not under our control. All we can do is specify the core that we commit to keeping stable during a particular version of the model and API.
[That was my TED talk, thanks for listening]
I am struggling a bit to understand the ramifications. Maybe a first useful clarification would be to identify which part of the "vocabulary" we are talking about when we discuss all these risks. Are we talking about classes and properties (the CIDOC-CRM level) or about concepts (the SKOS/AAT level)? I am starting to feel it may be useful to make that distinction, if only for managing changes that could come from very different sources and concerns.
Vocabulary as in the concepts, rather than the ontological classes from CIDOC-CRM. The SKOS/AAT level :)
OK, this clarifies things, and indeed it would make sense to "cache" the semantics that we need for a given version of Linked Art. That said: isn't there already a versioning mechanism for AAT concepts that would alleviate this? Something like version numbers, or a policy to never significantly change the semantics of a concept but rather deprecate it in favour of a new one when a drastic change is made. Hmm...
Would we want these to be within a different namespace? I feel like we should either commit to AAT, warts and all, or define our own vocabulary (with all the maintenance headaches of that).
I was thinking about this the other day in the context of our own custom vocabularies. I'm wondering if, for linked art, it would make sense to create linked-art specific collections of AAT (and other) terms whose membership is curated by the linked.art community. Thus, for capturing terms for commonly recurring patterns such as "ethnicity", "nationality", "types of texts", etc. if we find that the AAT makes it difficult to find these terms or they are split up all over the place or another vocabulary does a better job of describing them, we can officially sanction their use. This would be of great assistance, I think, to an organization like ours that has its own custom vocabulary that still needs to be bridged to a broadly accepted one as it would save a great deal of time searching for the appropriate terms in some cases. Having curated collections of terms as part of linked.art would possibly open up the door to bridging higher level terms across vocabularies.
What @beaudet says. In particular:
an organization like ours that has its own custom vocabulary that still needs to be bridged to a broadly accepted one
points to a migration problem that we can make easier or harder by being less or more restrictive with the ranges of properties. Goodness knows that SI has more different vocabularies (and different eras of usage for the same vocabulary) in use than you could shake a stick at, some shared, many custom, and if moving entirely to AAT is a prerequisite to create Linked.Art data, that's an instant high barrier for some.
I also agree! We should document the commonly accepted terms from the community, and leave it open for local extensions. We do not want to take on vocabulary management, just community consensus around existing vocabularies where that consensus provides significant value. We can fine tune which concepts are in and which are local as we go :)
Consensus at London F2F:
Will come up with lists in a subgroup (Rob, Sami, David B, Bree, Nicola, John) to propose back to larger group.
In order to try and move this discussion forward, I think the best way is to discuss how to make decisions about the ontology terms, and then we can simply apply that process.
So, here, a proposed rubric for making decisions about the categorization:
Required terms would be necessary to use for basic compatibility. So if you were to choose another concept to use in place of one of these, you would be outside of Linked Art. You would expect a system to be in an error state when these terms are not used.
- **Metatypes** – These have to be required in order to serve as flags for the type that they are classifying. If you don’t know what “type of object” means, then you won’t be able to do anything with the unenumerable data below it.
- **Replacements for Ontology** – Where we defer to the vocabulary layer instead of standardizing an ontology feature, these terms need to be required. For example, “Primary Name” isn’t up for debate, as it replaces a critical notion of … well … primary name.
- **Trivially enumerable sets** – We should be clear about sets that we can enumerate completely (or so completely as to not raise eyebrows that we missed something). For example, the types of Dimension might not cover absolutely everything that anyone might ever need, but it’s doubtful that, in the scope of the work, there are dimensions we don’t know about. There might not be any of these.
- **API distinguishing terms** – Terms that distinguish between responses from the API that would otherwise be indistinguishable. For example, if there is an API call to return “Exhibition Activity” and an API call to return “Provenance Activity”, then at the ontology layer they’re both just E7 Activity. We should require a flag to make them trivial to distinguish.
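To make the "API distinguishing terms" point concrete, here is a minimal sketch: two records that are both plain E7 Activities at the ontology layer are told apart only by a required flag term in `classified_as`. The two AAT URIs are assumptions for illustration; check the published term lists before relying on them.

```python
# Assumed AAT flag terms -- illustrative, not verified against the spec.
EXHIBITION = "http://vocab.getty.edu/aat/300054766"  # assumed: exhibitions
PROVENANCE = "http://vocab.getty.edu/aat/300055863"  # assumed: provenance

def activity_kind(activity):
    """Classify an Activity record by its required flag term."""
    flags = {c.get("id") for c in activity.get("classified_as", [])}
    if EXHIBITION in flags:
        return "exhibition"
    if PROVENANCE in flags:
        return "provenance"
    return "unknown"

sale = {
    "type": "Activity",
    "_label": "Sale of Lot 12",
    "classified_as": [{"id": PROVENANCE, "type": "Type"}],
}
print(activity_kind(sale))  # prints "provenance"
```

Without the required flag, a consuming system has no principled way to route the record, which is exactly the error state described above.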
These terms don't have to be used, but there should be a good reason for not using them. Systems might be built that recognize these terms and do something special, but they should not fail when they are not encountered.
- **Enumerable Sets** – Sets of terms that we can enumerate, but that might be extended in some circumstances. E.g. parts of personal names: having everyone dig around for “first name” is a waste of everyone’s time when we can just recommend the AAT term.
- **Ultra-Common Terms** – Instead of having people dig around for very common terms, just put together a list of recommended ones. They’re free not to use them if they don’t want to … but we should make consistency easy, and inconsistency possible but at a high price.
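A small sketch of what a recommended-terms table buys implementers: one community-vetted URI per common concept instead of a search through AAT. The two AAT IDs below are illustrative placeholders, not verified values.

```python
# Placeholder AAT IDs for name parts -- verify against the published
# vocabulary page before use.
NAME_PART_TERMS = {
    "first name": "http://vocab.getty.edu/aat/300404651",  # placeholder
    "last name": "http://vocab.getty.edu/aat/300404652",   # placeholder
}

def name_part(content, kind):
    """Build a Name part classified with the recommended term for `kind`."""
    return {
        "type": "Name",
        "content": content,
        "classified_as": [{"id": NAME_PART_TERMS[kind], "type": "Type"}],
    }

part = name_part("Vincent", "first name")
print(part["classified_as"][0]["id"])
```

Systems that ignore the recommendation still produce valid data; they just give up the easy cross-system consistency.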
These terms are presented (separately) to reduce the work of others to find and select them, but there should be no ill effects if implementations use different terms than the ones listed.
Other - Further terms that come up frequently, but are not so common as to be recommended.
(Reopening as there's work to do, even if agreement on the rubric)
@azaroth42 sorry about that!
Related to #237 as well.
link to existing vocabularies that we have collected: https://docs.google.com/spreadsheets/d/1gyxb9Q31jvF0Zd_BZscqijA20u99FRgCJgDGmRMbPe0/edit#gid=1337687428
Perhaps another way to distinguish the three levels:
Suggestions welcome for how to deal with examples in the model documentation that rely on classifications that might fall into the Recommended layer, rather than required?
For example: https://linked.art/model/actor/#parts-of-names
If we bake these examples into the model text, then we would be (essentially) requiring them or at the very least agreeing implicitly not to change them until the next major version of the model.
Perhaps the table can live in the vocabulary section, and we can make the examples be non-normative when it comes to the particular classifications used?
Could you please remind us what the homework was for this issue?
I looked at the notes from the last session and didn't see anything ... so I second Dave on the need for a reminder about the homework.
Sorry! If we're happy with the required page in the spreadsheet, we should come to some agreement about the recommended page. E.g. are there entries for which systems would fail to behave as expected if they weren't consistent (and which should therefore be in required)? Are there entries that are not very useful at all (and should be in the "listed" set)? Are there missing entries that systems could usefully process to improve the UX (and should be in the recommended list)?
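The three-tier policy can be sketched as a simple check: a missing required term is an error, a missing recommended term is only noted, and listed terms carry no expectation at all. The term URIs here are invented for illustration, not taken from the spreadsheet.

```python
# Hypothetical term sets -- the real ones would come from the spreadsheet.
REQUIRED = {"aat:primary-name-flag"}
RECOMMENDED = {"aat:first-name-flag"}

def check_terms(used):
    """Return (errors, notes) for the set of term URIs a record uses."""
    errors = sorted(REQUIRED - used)  # hard failures
    notes = sorted(RECOMMENDED - used)  # soft advisories only
    return errors, notes

errors, notes = check_terms({"aat:primary-name-flag"})
print(errors, notes)  # prints [] ['aat:first-name-flag']
```

The asymmetry is the point: systems may do something special with recommended terms, but must not fail when they are absent.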
Linked Art WG Yale Face-to-face Meeting Decision:
Use `equivalent` to link to other URIs in the Reference pattern, e.g.:

```json
{
  "id": "http://vocab.getty.edu/aat/123423423",
  "type": "Type",
  "_label": "thing",
  "equivalent": [
    {
      "id": "local-uri",
      "type": "Type",
      "_label": "local thing"
    }
  ]
}
```
Docs should record best practices around bindings.
Coming back to the example in the model docs about paintings and artworks, it could use an update now that things like "collection items" are in the vocabulary, maybe along with "archives"?
Versioning of the lists:
First step of extracting vocabulary from python to json: https://github.com/linked-art/crom/pull/52
Not going to try to build the documentation from the JSON, as the categorization and ordering are hard; plus there's utility in having different layouts for required (new, important to get right, should have examples) vs. recommended (a lot, but worth describing) vs. optional (potentially many, many, where just a name and link is sufficient).
https://linked.art/model/vocab/
I think this is done enough. Proposals for more listed and recommended terms welcome in new issues.
Currently the vocabulary terms are mostly extracted from the model and managed in a separate page completely outside of the model work. The original intent was to enable the same patterns to be implemented using different choices of vocabulary.
This has proven to be an impediment to implementation, as the abstraction is more complicated to describe and to understand than a definition as to what to use in which circumstances. For interoperability, the more specific we can be, the more likely software will be successful at consuming data that was produced in a different environment. This is especially true with CIDOC-CRM, which relies heavily on external vocabularies via P2 to describe specifics which would normally be part of the core ontology.
Thus, the vocabulary pages (https://linked.art/community/best-practices/vocabularies/) should be rolled into the relevant /model/ pages as normative requirements, rather than as external best practices.