tdwg / vocab

Vocabulary Maintenance Specification Task Group + SDS + VMS

Determine whether specifying practices related to controlled vocabularies is in scope #18

Closed: baskaufs closed this issue 9 years ago

baskaufs commented 9 years ago

Controlled vocabularies are frequently specified as values for property terms in TDWG vocabularies. Example values are sometimes given in term comments, but it is unclear how controlled vocabularies should be managed or by whom. This is an important issue but it is not clear to me whether it is in scope for the Vocab TG since the controlled vocabularies are not generally part of any standard. Refer to the DwC Issue: https://github.com/tdwg/dwc/issues/99 @tdwg/dwc @pmergen

ramorrismorris commented 9 years ago

@baskaufs: Short: Yes, it is in scope, or should be. But scope of what? I hope you mean the scope of the Vocabulary Maintenance Specification. Long: I \think/ that by "but it is not clear to me whether it is in scope for the Vocab TG since the controlled vocabularies are not generally part of any standard" you probably do not mean "whether it is in scope", but perhaps "whether it should be part of what should be enabled by the Vocabulary Maintenance Specification (VMS)". (Whether it is presently in scope is surely (hopefully?) determinable from the charter (?) of the Vocab TG.)

IMO, the VMS needs to acknowledge that some CVs have broader applicability than use in a particular standard. Those and \any/ CVs should be accompanied by documents establishing that the CV complies with the VMS or sections thereof. At the same time, the VMS should clarify how the maintainers of a CV (whether part of a standard or not) can (should? must?) specify how and why some piece of the CV is not compliant with the VMS.

baskaufs commented 9 years ago

Yes, within the scope of either the Vocabulary Maintenance Spec or the Standards Documentation Standard (or both).

If one goes by the letter of the TG charter, "controlled vocabulary" isn't mentioned. However, I've never been entirely sure what a "controlled vocabulary" means. Is a controlled vocabulary simply a fixed list of strings that are acceptable literal values for property terms? Is a controlled vocabulary a set of defined classes represented by either URIs or literals (e.g. the Dublin Core type vocabulary) and possibly described using something like SKOS?
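To make the two readings in the question concrete, here is a minimal Python sketch contrasting them. It is not drawn from any TDWG standard: the example values, the example.org URIs, and the function names are hypothetical. In the first reading the vocabulary is just a fixed set of acceptable strings; in the second it is a set of defined concepts, each identified by a URI and carrying a preferred label and definition, loosely in the style of SKOS concepts.

```python
from dataclasses import dataclass
from typing import Optional

# Reading 1: a controlled vocabulary as a fixed set of acceptable literal strings.
# Values are illustrative only, not an official TDWG list.
LITERAL_VOCAB = frozenset({"PreservedSpecimen", "HumanObservation", "MachineObservation"})


def is_valid_literal(value: str) -> bool:
    """Accept a value only if it exactly matches one of the listed strings."""
    return value in LITERAL_VOCAB


# Reading 2: a controlled vocabulary as a set of defined concepts, each with a
# URI, a preferred label, and a definition (loosely modeled on skos:Concept,
# skos:prefLabel and skos:definition).
@dataclass(frozen=True)
class Concept:
    uri: str
    pref_label: str
    definition: str


# Hypothetical URIs and paraphrased definitions, for illustration only.
CONCEPT_VOCAB = {
    c.uri: c
    for c in [
        Concept("http://example.org/vocab/PreservedSpecimen", "PreservedSpecimen",
                "A specimen that has been preserved."),
        Concept("http://example.org/vocab/HumanObservation", "HumanObservation",
                "An output of a human observation process."),
    ]
}


def resolve(value: str) -> Optional[Concept]:
    """Accept either a concept URI or its preferred label; return the concept or None."""
    if value in CONCEPT_VOCAB:
        return CONCEPT_VOCAB[value]
    for concept in CONCEPT_VOCAB.values():
        if concept.pref_label == value:
            return concept
    return None
```

For example, `resolve("HumanObservation")` and `resolve("http://example.org/vocab/HumanObservation")` return the same concept, while `is_valid_literal("preservedspecimen")` is `False` because the literal reading matches strings exactly. The practical difference is that in the concept reading the identifier is the URI, so labels can be corrected or translated without changing what a stored value refers to.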

The TDWG Key to Standard Categories contains a category called "Data Standard (DS)" that is defined as "Specifies valid values in controlled vocabularies". However, we have no extant examples of this category of standards - should the vocab task group specify the mechanism for documenting and maintaining that kind of standard? If controlled values get "locked into" a standard, does that make the list of acceptable values too hard to amend?

This topic came up in an email thread in 2012 starting with this email. However, Chuck Miller described it as opening a Pandora's box. My concern isn't so much whether we are "allowed" to address this issue based on what our charter says, but rather that we don't bite off more than we can chew. We have some precedents for maintenance of property and class terms in the form of the DwC Namespace policy. Do we have precedents for the development and documentation of controlled vocabularies that we can draw upon? Probably, but I'm not familiar with them.

mdoering commented 9 years ago

We have a very good, old standard that is still in use and that provides a controlled vocabulary for area names: the World Geographical Scheme for Recording Plant Distributions.

The values for dwc:basisOfRecord also already form a controlled vocabulary, even if those values also exist for different purposes. See also the current discussion at https://github.com/tdwg/dwc/issues/99

In addition, we manage many vocabularies at GBIF, based mostly on the descriptions and examples found in the DwC standard, but also on the ontology or other sources. For example, the values for dwc:taxonRank form a rather important and well-controlled vocabulary.

It would be good to have best practices for how to create and maintain such potentially hierarchical and multilingual lists. But, as with the other standards, we likely need to distinguish between a single normative reference and multiple technical implementation guides. The expected value in a DwC Archive file, an XML file, or an RDF representation is likely to be different.
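As a rough illustration of "one normative reference, several implementations", here is a minimal Python sketch. It assumes a hypothetical normative model in which each entry has a stable URI, labels keyed by language tag, and an optional parent (`broader`) for hierarchy; the example.org URIs and area entries are invented and not taken from the World Geographical Scheme or any GBIF vocabulary. Two renderings are derived from the same model: a flat list of literal values, as might fill a column in a tabular DwC Archive file, and an RDF rendering in Turtle using SKOS properties. An XML rendering is omitted because no particular schema is assumed here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical normative record: stable identifier, labels per language,
# and an optional parent concept for hierarchical lists.
@dataclass(frozen=True)
class Term:
    uri: str
    labels: Dict[str, str]          # language tag -> label
    broader: Optional[str] = None   # URI of the parent concept, if any


# Illustrative entries only.
TERMS: List[Term] = [
    Term("http://example.org/area/Europe", {"en": "Europe", "de": "Europa"}),
    Term("http://example.org/area/NorthernEurope",
         {"en": "Northern Europe", "de": "Nordeuropa"},
         broader="http://example.org/area/Europe"),
]


def to_value_list(terms: List[Term], lang: str = "en") -> List[str]:
    """Flat list of literal values in one language, e.g. for a tabular file column."""
    return [t.labels[lang] for t in terms if lang in t.labels]


def to_turtle(terms: List[Term]) -> str:
    """Turtle rendering with SKOS, keeping all languages and the hierarchy.

    Labels are assumed not to contain double quotes; no escaping is done here.
    """
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for t in terms:
        lines.append(f"<{t.uri}> a skos:Concept ;")
        for lang, label in sorted(t.labels.items()):
            lines.append(f'    skos:prefLabel "{label}"@{lang} ;')
        if t.broader:
            lines.append(f"    skos:broader <{t.broader}> ;")
        # End the statement: replace the trailing " ;" of the last line with " .".
        lines[-1] = lines[-1][:-2] + " ."
        lines.append("")
    return "\n".join(lines)
```

Here `to_value_list(TERMS)` yields `["Europe", "Northern Europe"]` and `to_value_list(TERMS, "de")` yields `["Europa", "Nordeuropa"]`, while `to_turtle(TERMS)` keeps both languages and the parent link. The flat list necessarily drops the hierarchy and the other languages, which is one reason the expected value can differ between serializations even when the normative list is the same.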

baskaufs commented 9 years ago

After some reading and reflection on this, it seems pretty clear to me that a "controlled vocabulary" is a category of "vocabulary" that should be included in both the vocabulary maintenance and standards documentation standards. I'm going to close this issue and replace it with more specific issues.