tdwg / vocab

Vocabulary Maintenance Specification Task Group + SDS + VMS

Describe how controlled vocabularies should be documented #20

Closed by baskaufs 8 years ago

baskaufs commented 9 years ago

See closed Issue #18. We need to describe how controlled vocabularies should be documented as strings and as URIs. String versions of controlled vocabulary terms would be consumed as they are currently: as text values of terms intended for use with literals, present in CSV tables or as RDF literals. URI versions of controlled vocabulary terms would be described according to best practices followed elsewhere, i.e. probably using SKOS terms in their definitions. This would allow for richer, machine-interpretable definitions of the relationships among the various controlled values.
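To make the string/URI distinction concrete, here is a minimal sketch of how one controlled value might be described as a SKOS concept. It is an illustration only: the `http://example.org/vocab/` namespace, the `concept_to_turtle` helper, and the `adult`/`lifeStage` values are hypothetical and are not taken from any TDWG document. Only the SKOS property names (`skos:Concept`, `skos:prefLabel`, `skos:definition`, `skos:broader`) come from the SKOS vocabulary itself.

```python
# Hypothetical sketch: the namespace, helper name, and example values are
# invented for illustration and are not part of any TDWG vocabulary.
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/vocab/"  # assumed placeholder namespace


def concept_to_turtle(local_name, pref_label, definition, broader=None):
    """Render one controlled value as a skos:Concept in Turtle.

    The label and definition are inserted verbatim, so this sketch assumes
    they contain no double quotes or backslashes.
    """
    lines = [
        f"<{EX}{local_name}> a <{SKOS}Concept> ;",
        f'    <{SKOS}prefLabel> "{pref_label}"@en ;',
        f'    <{SKOS}definition> "{definition}"@en',
    ]
    if broader is not None:
        # Optional link to a broader concept in the same vocabulary.
        lines[-1] += " ;"
        lines.append(f"    <{SKOS}broader> <{EX}{broader}>")
    lines[-1] += " ."
    return "\n".join(lines)


if __name__ == "__main__":
    print(concept_to_turtle("adult", "adult",
                            "A fully developed organism.",
                            broader="lifeStage"))
```

The string version of the term is simply the `prefLabel` text (`adult`), while the URI version is the subject of the statements; the `skos:broader` link is the kind of machine-interpretable relationship that a bare string cannot carry.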

baskaufs commented 9 years ago

Refer specifically to the comment https://github.com/tdwg/vocab/issues/18#issuecomment-118790437 which contains examples and use cases

ramorrismorris commented 9 years ago

This may be too picky, and/or there may be nothing that can be done about it: in CSV tables, every(?) value is a string. There may even be strings intended to be URIs, including those with defining documents elsewhere. So it may be necessary to somehow document the relation, if any, between such uses of the sequences of bytes found in a CSV table or the like. Maybe it's as simple as a DwC-A kind of approach that includes a way for consumers to locate a metadata file that specifies any additional semantics of the strings.
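One way to picture the metadata-file idea is a lookup that tells a consumer which columns hold controlled values and which URI each string stands for. The sketch below is an assumption-laden illustration, not the DwC-A mechanism: the `COLUMN_VOCAB` structure, the `resolve` helper, and the `example.org` URIs are all hypothetical.

```python
import csv
import io

# Hypothetical metadata: column name -> {string value -> URI}.
# The URIs are placeholders, not real TDWG identifiers.
COLUMN_VOCAB = {
    "lifeStage": {
        "adult": "http://example.org/vocab/adult",
        "juvenile": "http://example.org/vocab/juvenile",
    }
}


def resolve(row, metadata):
    """Replace known controlled strings with their URIs.

    Values in columns without metadata, and strings that the metadata does
    not list, are passed through unchanged.
    """
    resolved = {}
    for column, value in row.items():
        mapping = metadata.get(column)
        resolved[column] = mapping.get(value, value) if mapping else value
    return resolved


# In the CSV itself every value is just a string.
CSV_TEXT = "id,lifeStage\n1,adult\n2,larva\n"
rows = [resolve(r, COLUMN_VOCAB) for r in csv.DictReader(io.StringIO(CSV_TEXT))]

if __name__ == "__main__":
    for row in rows:
        print(row)
```

Here the CSV stays a table of plain strings, and any extra semantics live entirely in the separate metadata; a value the metadata does not know about (`larva` in this sketch) is left as an uninterpreted string.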

baskaufs commented 9 years ago

Note on this issue: see "Best practice in formalizing a SKOS vocabulary" for best-practice suggestions with SKOS examples.

baskaufs commented 9 years ago

At the 2015-07-15 TG meeting Terry said he'd give a shot at describing a controlled vocabulary using SKOS. Target date: the end of July (I think).

baskaufs commented 9 years ago

I have managed to borrow copies of ISO 25964-1 and -2 "Information and documentation -- Thesauri and interoperability with other vocabularies" via Interlibrary Loan. I found the definitions of terms and some sections of Part 2 to be very enlightening with respect to understanding the meaning of controlled vocabularies and thesauri, and how they are related to ontologies. I have included quotes that I think are relevant to the work of our TG on the page https://github.com/tdwg/vocab/blob/master/iso25964.md .

The take-home message I got was that controlled vocabularies and thesauri are focused on human-readable terms whose purpose is to allow human users to select standardized terms that represent concepts that help humans to search or browse collections, whereas ontologies are focused on describing classes and individuals for the purpose of enabling machine reasoning. These two approaches can be complementary in the case of a "metadata schema" in which classes and properties are established as elements, and the ranges of certain property elements are constrained to be a particular controlled vocabulary/thesaurus. This definition of "metadata schema" describes DwC and Audubon Core pretty well.

I think the information in ISO 25964-2 provides some guidance about how we should handle controlled vocabularies and their relationship to TDWG vocabulary standards like DwC with respect to RDF representations. Controlled vocabulary lists are ways to guide indexers and searchers to preferred terms that represent concepts that can be used to help humans navigate and search in an organized fashion. SKOS is designed to facilitate translating these terms and concepts into machine-readable form, and supports translations of the terms into other languages. OWL isn't designed for this purpose, and it would be counterproductive to try to turn controlled vocabulary terms into classes described in an ontology. Conversely, RDFS and OWL are designed to describe classes, individuals, and properties. SKOS isn't designed for this purpose, and trying to use it to do so would be putting a square peg in a round hole.

ramorrismorris commented 9 years ago

@baskaufs: You are to be complimented for going the paper(?) route to gathering this spec, which like all ISO spec documents is hardly complimentary in the second sense of the word.

baskaufs commented 8 years ago

The draft Documentation Specification includes sections 4.5.2 and 4.5.3 that specify how to provide machine-readable metadata associated with controlled vocabulary terms.