Generic types vs specific types in the RDM

ncarboni commented 4 years ago

From @VladimirAlexiev on Twitter:

Name parts are not classified (e.g. given name, surname...). Should we use AAT for classification?

ncarboni commented 4 years ago

IMHO we should not prescribe specific types. The purpose is to let users select their own, respecting diverse cultural and linguistic interpretations. We can suggest possible types to use (where and how still needs to be decided). I already have a list of them prepared (used in SARI).

However, I do think we should avoid presenting this as the "way to go", so as not to restrict its uses.

VladimirAlexiev commented 4 years ago

(This is not limited to name parts but applies to all kinds of vocabulary values.) It will benefit everyone to standardize not just structure (e.g. your CRM profile) but also values. The Getty has been very receptive to additions to AAT, or you can use Wikidata items.

E.g. the AAC model initially had museum-specific values for gender. This was completely reversed in linked.art, which places strong emphasis on standardizing values.

ncarboni commented 4 years ago

I totally agree that this should be treated as a global issue.

IMHO, prescription would be too much for the RDM and its aim, but maybe we can release a list of types as suggestions and examples. Or should we do as LinkedArt does: define at least one type and leave room for additional types? It would definitely be useful at the query and aggregation level.

Maybe we could even do this for the major sections?

@Habennin what's your take?

Habennin commented 4 years ago

Yes, depending on the values to be standardised, it is good/necessary to specify a mandatory or recommended vocabulary. Obviously a semantic data structure is made more usable if the data values are also standardised.

That being said, the SARI models have so far been offered largely as a helping hand: recipes for modelling common data patterns about frequently documented entities, specifying a standardised way to represent different propositions semantically using CIDOC CRM. Advising which vocabularies to use has therefore not been a focus.

Linked.Art aims to build a specific community that creates data according to its data profile. The SARI models, for now, play a more limited role. On the other hand, this leaves them open to a wider audience, since they are not prescriptive where they cannot be.

The example of names is highly relevant. It is not possible to impose a name-part structure on the data if you want to be maximally inclusive, because only a small part of modern society uses the canonical Western way of partitioning names.

CHIN, for example, aims to represent Indigenous heritage properly through respectful dialogue with the communities concerned. Therefore, it cannot and should not specify that names must take the form 'first', 'second' and 'last'.

Within a narrower scope (like Linked.Art), or any other specific data integration where the interested partners and the data scope are known, I definitely agree that specifying the particular vocabulary would be good. In the context of SARI, we may want not to indicate what ought to be used, but to provide a place to document what is used. One could then adopt vocabulary-matching tools to create crosswalks between the different vocabularies that are legitimately adopted by different communities. Gender is another example where it would be incorrect for SARI to dictate which terms may be used.

swiss-art-research-net / reference-data-models #2