tdwg / dwc

Darwin Core standard for sharing of information about biological diversity.
https://dwc.tdwg.org
Creative Commons Attribution 4.0 International

Create controlled vocabulary for proposed revised term dwc:establishmentMeans #258

Closed by baskaufs 3 years ago

baskaufs commented 4 years ago

Submitter

Steve Baskauf

Summary:

In Issue 235, changes to the definition of dwc:establishmentMeans were proposed. This proposed vocabulary is intended to be used as the values for that revised term and it should be considered together with that proposal.

Proponents and justification:

The proponents are the same as in the dwc:establishmentMeans proposal. The terms in this vocabulary were generated from Table 1 in Groom et al. (2019), "Improving Darwin Core for research and management of alien species", https://doi.org/10.3897/biss.3.38084, which was based on the vocabularies used by GBIF and the International Union for Conservation of Nature (IUCN) to express whether a species is native or alien. Please refer to the BISS article for a complete justification.

Details of the proposal

The vocabulary terms and their metadata are provided in this proposed list of terms document. Each term has a full term IRI, which must be used as the value for dwciri:establishmentMeans, and a controlled value string, which must be used as the value for dwc:establishmentMeans.

This proposal creates a new vocabulary within the Darwin Core standard. According to the IRI patterns used in the http://rs.tdwg.org/ subdomain, the first level IRI path segment must differ from that of the main Darwin Core vocabulary (http://rs.tdwg.org/dwc/). The proposed first level IRI path segment for this vocabulary is dwcem, making the IRI for the vocabulary http://rs.tdwg.org/dwcem/. The proposed namespace IRI for terms in the vocabulary is http://rs.tdwg.org/dwcem/values/. The proposed preferred namespace abbreviation matches the first level IRI path segment: dwcem:.
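As a rough illustration of how the namespace IRI and the abbreviation relate, the following Python sketch expands an abbreviated name such as dwcem:e001 into a full term IRI and compacts it again. The function names `expand_curie` and `compact_iri` and the `PREFIXES` table are hypothetical helpers for this example only; only the namespace IRI and the dwcem abbreviation come from the proposal.

```python
# Namespace IRI and abbreviation as stated in the proposal.
# The helper functions below are hypothetical, for illustration only.
PREFIXES = {"dwcem": "http://rs.tdwg.org/dwcem/values/"}


def expand_curie(curie: str) -> str:
    """Turn an abbreviated name like 'dwcem:e001' into a full term IRI."""
    prefix, _, local_name = curie.partition(":")
    if prefix not in PREFIXES or not local_name:
        raise ValueError(f"cannot expand {curie!r}")
    return PREFIXES[prefix] + local_name


def compact_iri(iri: str) -> str:
    """Turn a full term IRI back into its abbreviated form."""
    for prefix, namespace in PREFIXES.items():
        if iri.startswith(namespace) and len(iri) > len(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    raise ValueError(f"no known namespace for {iri!r}")


full_iri = expand_curie("dwcem:e001")
short_form = compact_iri(full_iri)
```

Because the local names are opaque, nothing in this sketch inspects them beyond requiring that they be non-empty.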

In accordance with Section 4.1.2 of the Standards Documentation Specification, the class of all controlled value terms is skos:Concept. Following typical SKOS practice, the concepts are also grouped into a concept scheme.

Term details

To review the complete list of terms, view the list of terms document. To view the metadata in tabular form, view or download this CSV file.

peterdesmet commented 4 years ago

Question: why do the values start with e, like e001?

Note: the csv link refers to the md document.

baskaufs commented 4 years ago

Sorry about the bad link - it's fixed now.

There are two practical reasons why the local name starts with a letter.

First, in the interest of simplifying maintenance of vocabularies, the source for generating all other forms of metadata is a CSV file. Without the starting letter, it would be a continual struggle to keep spreadsheet editors from dropping the leading zeros in any columns that contain local names (e.g. skos_broader). Of course, one could just have the single digit "1" instead of "001", but then sorting would create orders like 1, 11, 12, 13, ..., 19, 2, 20, 21, etc., which is annoying.
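Both effects are easy to reproduce. The Python sketch below is only an illustration: `make_local_name` is a hypothetical helper, and the numeric round trip through `int` merely stands in for what a spreadsheet editor does when it interprets a cell as a number.

```python
def make_local_name(n: int, prefix: str = "e", width: int = 3) -> str:
    """Build a zero-padded local name with a leading letter, e.g. 1 -> 'e001'."""
    return f"{prefix}{n:0{width}d}"


# A purely numeric local name loses its leading zeros once it is read as a number.
coerced = str(int("001"))

# Unpadded digit strings sort lexicographically, not numerically.
unpadded = sorted(str(n) for n in range(1, 22))

# Zero-padded names with a leading letter keep their numeric order as plain text.
padded = sorted(make_local_name(n) for n in range(1, 22))
in_numeric_order = padded == [make_local_name(n) for n in range(1, 22)]
```

The leading letter keeps the value a text string in the CSV, and the fixed width keeps plain text sorting consistent with numeric order.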

The other reason is that some tools for dealing with IRI namespace/local name combinations follow XML rules, which do not allow element names to begin with numeric characters. The rules for CURIEs are more lax, but I don't see any reason to create potential implementation issues when they could be avoided by better design of the identifiers.
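The XML naming rule can be checked with Python's standard library parser. This is a minimal sketch that only shows the XML rule itself, not the behavior of any particular RDF tool; `is_valid_element_name` is a hypothetical helper and is meant for simple alphanumeric local names.

```python
import xml.etree.ElementTree as ET

NAMESPACE = "http://rs.tdwg.org/dwcem/values/"


def is_valid_element_name(local_name: str) -> bool:
    """Return True if an XML parser accepts local_name as an element name.

    Hypothetical helper: it wraps the name in a one-element document that
    uses the vocabulary namespace as the default namespace and tries to parse it.
    """
    document = f'<{local_name} xmlns="{NAMESPACE}"/>'
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


letter_first = is_valid_element_name("e001")
digit_first = is_valid_element_name("001")
```

A name that starts with a letter parses, while the purely numeric one is rejected as not well-formed.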

In the end, the identifiers are intended to be opaque, so their form isn't really that important other than having a pattern that guarantees that they are unique. It's similar to the IRI pattern followed by the OBO Foundry ontologies, but simpler.

ansell commented 3 years ago

@baskaufs Is the "Controlled value" as permanent as the IRI? It will be very difficult to encourage data providers to put e001 (or the IRI version) into their raw datasets, but if we had a good, permanent, non-IRI label then we have a better chance of convincing them to use this.

baskaufs commented 3 years ago

Yes, the controlled value should be unchangeable. Well, as unchangeable as anything else that's normative - we've changed IRIs before and that is an equally bad thing.

I avoid using the term "label" though. I try to use the term "controlled value string" (listed as "controlled value" in the term list document). The label is mutable and should be created in many languages. But the label is not what people would put in datasets - that would be the controlled value string.

In the DwC controlled vocabulary proposals, there would be separate terms to use with the strings and the IRIs. The value of dwc:establishmentMeans would always be the controlled value string. The value of dwciri:establishmentMeans would always be the IRI. Since use of the dwciri: terms is relatively uncommon, that means that most providers will generally only be using the controlled value strings for the dwc: versions of the terms.
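A small Python sketch of that split, under stated assumptions: the `ControlledTerm` class and `record_fields` function are hypothetical, and "placeholderValue" is a stand-in, not an actual controlled value string from the vocabulary (see the list of terms document for the real ones).

```python
from dataclasses import dataclass

NAMESPACE = "http://rs.tdwg.org/dwcem/values/"


@dataclass(frozen=True)
class ControlledTerm:
    """One vocabulary term: an opaque local name plus its controlled value string."""
    local_name: str
    controlled_value: str

    @property
    def iri(self) -> str:
        return NAMESPACE + self.local_name


def record_fields(term: ControlledTerm, use_iri: bool = False) -> dict:
    """Pick the property that matches the kind of value being supplied."""
    if use_iri:
        # dwciri: terms always take the full term IRI.
        return {"dwciri:establishmentMeans": term.iri}
    # dwc: terms always take the controlled value string.
    return {"dwc:establishmentMeans": term.controlled_value}


# "placeholderValue" is NOT a real controlled value string; it is a stand-in.
term = ControlledTerm(local_name="e001", controlled_value="placeholderValue")
string_form = record_fields(term)
iri_form = record_fields(term, use_iri=True)
```

Human-readable labels are deliberately left out of the sketch, since they are mutable and are not what goes into datasets.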

baskaufs commented 3 years ago

There is a typographical error in the definition of dwcem:e004. Instead of "... specifically with the attention of creating ...", it should say "... specifically with the intention of creating ...". Corrected in https://github.com/tdwg/rs.tdwg.org/commit/2acfcdf7688a821637aad1519308e135f68079bd

tucotuco commented 3 years ago

This proposal has passed public commentary and has been submitted for review by the TDWG Executive Committee.

baskaufs commented 3 years ago

Approved in Executive decision 24. Incorporated in rs.tdwg.org release 2020-10-13.