tdwg / dwc

Darwin Core standard for sharing of information about biological diversity.
https://dwc.tdwg.org
Creative Commons Attribution 4.0 International

Implement TDWG-managed version of Dublin Core borrowed terms #371

Open baskaufs opened 3 years ago

baskaufs commented 3 years ago

This will eliminate the manual editing of Dublin Core data that currently must be done in the rs.tdwg.org repo. See steps 10 and 11 in the build workflow.

From a 2021-08-04 offline email:

Well, here's what I'm thinking. We already mint IRIs for "term lists". In the case of TDWG-issued terms, the term list IRI is the same as the namespace IRI. See https://github.com/tdwg/rs.tdwg.org/blob/master/term-lists/term-lists.csv for the full list.

For borrowed terms, there's an IRI for the term list in this form:

http://rs.tdwg.org/dwc/dc/          dc: terms borrowed by DwC
http://rs.tdwg.org/dwc/dcterms/     dcterms: terms borrowed by DwC
http://rs.tdwg.org/ac/dcterms/      dcterms: terms borrowed by AC
http://rs.tdwg.org/ac/xmpRights/    XMP terms borrowed by AC
http://rs.tdwg.org/ac/dwc/          DwC terms borrowed by AC

The first level subpath is the vocabulary that the term list is part of. The second level subpath represents the borrowed namespace and is usually the standard abbreviation used for it. So if one knows the pattern, one can know both the containing vocabulary and the borrowed namespace.
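To illustrate the pattern, here is a minimal Python sketch that recovers both pieces from a borrowed-term list IRI. The function name `parse_term_list_iri` is hypothetical and not part of any TDWG script; it assumes the input is one of the borrowed-term list IRIs of the form shown above, with exactly two path segments.

```python
from urllib.parse import urlparse


def parse_term_list_iri(iri):
    """Return (containing_vocabulary, borrowed_namespace) for a borrowed-term list IRI.

    Hypothetical helper: assumes the path has exactly two segments,
    e.g. http://rs.tdwg.org/dwc/dcterms/ -> ("dwc", "dcterms").
    """
    # Drop empty segments produced by the leading and trailing slashes.
    segments = [s for s in urlparse(iri).path.split("/") if s]
    if len(segments) != 2:
        raise ValueError(f"expected two path segments, got {segments!r}")
    vocabulary, borrowed_namespace = segments
    return vocabulary, borrowed_namespace


if __name__ == "__main__":
    for iri in ("http://rs.tdwg.org/dwc/dcterms/", "http://rs.tdwg.org/ac/xmpRights/"):
        print(iri, parse_term_list_iri(iri))
```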

What I would do is just follow the same pattern that we use for TDWG-minted terms: https://github.com/tdwg/rs.tdwg.org/blob/master/README.md#patterns-versions

Here is how I'd denote the July 15, 2021 version of dcterms:references:

http://rs.tdwg.org/dwc/dcterms/version/references-2021-07-15
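As a rough sketch of how such a version IRI could be assembled from its parts, assuming the `version/<local name>-<ISO date>` pattern shown in the example above (the function name `version_iri` is hypothetical and is not taken from the rs.tdwg.org build scripts):

```python
from datetime import date


def version_iri(term_list_iri, local_name, issued):
    """Build a version IRI as <term list IRI>version/<local name>-<YYYY-MM-DD>.

    Hypothetical helper that mirrors the example IRI in this thread.
    """
    # Make sure the term list IRI ends with exactly one slash before appending.
    base = term_list_iri.rstrip("/") + "/"
    return f"{base}version/{local_name}-{issued.isoformat()}"


if __name__ == "__main__":
    print(version_iri("http://rs.tdwg.org/dwc/dcterms/", "references", date(2021, 7, 15)))
```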

The assertion we'd make that is questionable is:

<http://rs.tdwg.org/dwc/dcterms/version/references-2021-07-15> dcterms:isVersionOf <http://purl.org/dc/terms/references>.

while a legitimate statement would be:

<http://dublincore.org/usage/terms/history/#references-003> dcterms:isVersionOf <http://purl.org/dc/terms/references>.

with the subject being the DCMI-minted version.
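For comparison, the two statements can be written out with the `dcterms:` prefix expanded to `http://purl.org/dc/terms/`. The sketch below uses plain string formatting to emit them as N-Triples; the helper name `is_version_of` is hypothetical, and nothing here reflects how the TDWG scripts actually serialize metadata.

```python
DCTERMS = "http://purl.org/dc/terms/"


def is_version_of(version_iri, term_iri):
    """Return one N-Triples line asserting version_iri dcterms:isVersionOf term_iri."""
    return f"<{version_iri}> <{DCTERMS}isVersionOf> <{term_iri}> ."


if __name__ == "__main__":
    term = DCTERMS + "references"
    # The TDWG-managed version (the assertion described as questionable above).
    print(is_version_of("http://rs.tdwg.org/dwc/dcterms/version/references-2021-07-15", term))
    # The DCMI-minted version (the assertion described as legitimate above).
    print(is_version_of("http://dublincore.org/usage/terms/history/#references-003", term))
```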

I'm not sure how illegitimate the questionable assertion actually is. I guess it depends on what you believe a "version" is. TDWG defines its own version model, so I guess we can just say that a version is whatever we want it to be. In any case, because this follows the same IRI patterns as for TDWG-minted terms, I think that all of the dereferencing would just work and the scripts would need only a minor tweak. There may be other ways to handle this, but it's the first idea I've had that I think would actually work.

dr-shorthair commented 3 years ago

The semantics of 'version' vary widely. It is certainly used for a lot more than 'revisions', which was my earlier mistaken assumption. RDA had a whole Data Versioning working group (final report here: https://doi.org/10.15497/RDA00042), which could/should have developed a taxonomy of version types ... or maybe not, since it would shade into a taxonomy of relationships (for which we already have https://www.iana.org/assignments/link-relations and https://standards.iso.org/iso/19115/resources/Codelists/gml/DS_AssociationTypeCode.xml and https://id.loc.gov/vocabulary/relators.html).

I guess 'version' is a subset of relations, being those that relate two endurant entities which have the same scope and similar intention and content.

baskaufs commented 3 years ago

Thanks for the thoughts and references here, @dr-shorthair