tdwg / vocab

Vocabulary Maintenance Specification Task Group + SDS + VMS

What parts of term definitions are normative? #35

Closed baskaufs closed 8 years ago

baskaufs commented 8 years ago

Currently, Darwin Core and Audubon Core have vocabulary list documents that are designated as "Type 1" (a.k.a. normative) based on the document typing system established in the old draft Documentation Specification. The new draft specification has done away with that system and allows human-readable documents to declare particular sections or components (such as figures) to be normative or non-normative.

Based on the current practice, everything contained in the term list documents (https://github.com/tdwg/dwc/blob/master/rdf/dwctermshistory.rdf and http://terms.tdwg.org/wiki/Audubon_Core_Term_List) should be considered normative, including the examples and informative comments in the term metadata. I believe that there has been some consensus in previous discussion that those examples and informative comments should not be considered normative, and that only the URI and definition should be considered normative (I don't know about the label).

Should we specify this as a general practice? It should be possible to make this convention clear in the text of each human-readable term list. Since the specification as currently written would end the practice of having an RDF document being the normative document for standards, the question of how to express what RDF triples are normative would be moot.
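
For concreteness, here is a hypothetical Turtle sketch of what a single term's metadata might look like under that convention. The term IRI, the choice of predicates, and the wording are all invented for illustration; they are not copied from the Darwin Core or Audubon Core term lists.

```turtle
@prefix ex:   <http://example.org/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Hypothetical term entry; predicate choices are illustrative only.
ex:sampleTerm                                                  # term IRI: normative
    rdfs:label   "Sample Term"@en ;                            # label: status undecided
    rdfs:comment "The definition text would go here."@en ;     # definition: normative
    skos:example "An illustrative value."@en ;                 # example: non-normative
    skos:note    "An informative usage comment."@en .          # comment: non-normative
```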

see section 3.3.3.1 of the draft documentation specification

tucotuco commented 8 years ago

Could a distinction be made that a machine-readable document can be normative, but if so, its entire content outside of commented text must be normative, while human-readable documents can contain a mix, with anything not explicitly stated as normative being non-normative?

Why? Because I am not sure that the following statement from Section 3 is necessarily true:

"Determining what is necessary to comply with a standard is necessarily a human activity. Therefore, the normative content of the standard should be contained exclusively in human-readable documents."

It may be a human activity to define what is normative, but at some point, or at some level, machines probably need to be able to test whether an artifact claiming to be compliant is so.

In the alternative scenario, what will have to happen to make existing standards compliant? What will be their status until they are compliant?

ramorrismorris commented 8 years ago

@tucotuco asks: "In the alternative scenario, what will have to happen to make existing standards compliant? What will be their status until they are compliant?" It seems that the W3C has specifications that answer these questions, but such specs do not seem to be public...?

baskaufs commented 8 years ago

My belief that the normative content of vocabulary standards should be human-readable comes from what I've come to understand by studying what is and isn't possible to achieve with machine processing. The RDF 1.0 Semantics document [1] sums up the situation like this:

Exactly what is considered to be the 'meaning' of an assertion in RDF or RDFS in some broad sense may depend on many factors, including social conventions, comments in natural language or links to other content-bearing documents. Much of this meaning will be inaccessible to machine processing...

The chief utility of a formal semantic theory is not to provide any deep analysis of the nature of the things being described by the language or to suggest any particular processing model, but rather to provide a technical way to determine when inference processes are valid, i.e. when they preserve truth.

We can add a lot of OWL markup to term definitions, but that is basically going to accomplish one of two things:

  1. entail other triples
  2. destroy the consistency of the graph if the terms are used in the "wrong" ways

Neither of these things is ever going to make the machine client actually "understand" what the terms mean. The actual "meaning" of the terms is going to be encapsulated by the human-readable text found in the rdfs:comment values, and those comments are just going to be the same text that is in the human-readable HTML web page or PDF version of the vocabulary document.
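
To make that concrete, here is a hypothetical Turtle sketch showing both outcomes; the terms and axioms are invented for illustration and are not proposed additions to any TDWG vocabulary.

```turtle
@prefix ex:   <http://example.org/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

# Outcome 1: the markup entails additional triples.
ex:recordedBy rdfs:subPropertyOf ex:associatedAgent .
ex:specimen1  ex:recordedBy      ex:person1 .
# A reasoner now entails:  ex:specimen1 ex:associatedAgent ex:person1 .

# Outcome 2: "wrong" use of a term makes the graph inconsistent.
ex:recordedBy rdfs:range       ex:Agent .
ex:Agent      owl:disjointWith ex:Document .
ex:doc1       a                ex:Document .
ex:specimen2  ex:recordedBy    ex:doc1 .
# The range axiom entails that ex:doc1 is an ex:Agent, and the disjointness
# axiom makes that inconsistent with ex:doc1 also being an ex:Document.

# Neither outcome conveys what ex:recordedBy "means"; only this literal does:
ex:recordedBy rdfs:comment "The human-readable definition of the term would go here."@en .
```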

So I suppose we could declare that the triples containing rdfs:comment as a predicate are normative. But what would that accomplish? Make people struggle to look at Turtle or XML files to figure out what is going on? A machine client is going to get absolutely nothing from that triple.

I suppose this is a philosophical discussion that is way beyond the scope of an Issues Tracker comment. But certainly in the case of Darwin Core, aside from a few subproperty declarations, there is nothing in the RDF that would help a machine client understand what terms mean - it's only the human-readable literals that provide the meaning.

[1] http://www.w3.org/TR/2004/REC-rdf-mt-20040210/#intro

nfranz commented 8 years ago

Not being a computer scientist, I nevertheless find this too strongly worded.

Certain types of information, I suppose by virtue of being encoded by humans in the right ways, are amenable to knowledge representation and reasoning. Logic reasoners can process this information to infer additional (implied) knowledge. Is that not a more productive way to set the bar for machine processing?

Best, Nico

baskaufs commented 8 years ago

I have a response to @nfranz 's comment, but I'm thinking this comment box isn't the place for it. If I have time, I'll put it in a blog post and link to it.

baskaufs commented 8 years ago

Here is some text from the 2016-05-04 meeting notes about this issue:

There are essentially three separate but related processes going on here:

  1. A documentation process (described by the documentation specification), which includes demarcating what is normative and non-normative in standards documents, how version information is recorded, and how versions are connected to each other and to their current resources. It does not stipulate what should be normative or not, nor how versioning should be managed.
  2. A vocabulary maintenance process (described by the vocabulary maintenance specification) which includes decision-making about whether changes should be made to vocabularies and terms within them. This specification would presumably trigger varying levels of oversight depending on the extent to which the proposed changes would affect stability and interoperability of the vocabulary.
  3. A vocabulary management process (possibly described by some document, but if so, not one that is included as part of a standard) that would include practical aspects of managing documents, endpoints, GitHub repos, etc., and that would involve generating new versions and representations of documents, and releases of standards “bundles” of documents. The changes that take place would be documented by #1 and in some cases triggered by #2, but the management of those changes would be dictated by practicalities, not by prescribed rules.

Maintaining separation among these three processes would make completing the two standards tractable. The complications involved in any one of these three processes would not necessarily impede description of the other two.

Given that understanding of the situation, these would be the implications for Issue #35 (normative parts of term definitions):

Section 3.2.1 of the current draft documentation specification says that authors of descriptive documents must indicate which parts of the document are normative and which (if any) are not. Vocabulary descriptions are specified as a special category of descriptive documents, so the same thing applies to them. Machine-readable representations of descriptive documents provide metadata about the documents, but do not include the full content of the document, so issues of normative vs. non-normative content do not apply to them. Machine-readable representations of vocabularies (in the form of terms and term lists) should include what is essentially the same information as is included in the human-readable representations.

Because the human- and machine-readable vocabulary representations are effectively identical, whatever designation of normative vs. non-normative is made in the human-readable representation should also be made in the machine-readable representation. For example, if the human-readable vocabulary term list document states that the definitions are normative but that the comments are not, then the machine-readable description of the term list should include the same statement in an rdfs:comment value.
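
A minimal Turtle sketch of what that might look like, assuming a hypothetical term list IRI and hypothetical wording of the statement (neither is drawn from an actual TDWG document):

```turtle
@prefix ex:   <http://example.org/termlist/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# The same statement that appears in the human-readable term list document:
ex:termList rdfs:comment "In this term list, the term IRIs and definitions are normative; labels, examples, and informative comments are non-normative."@en .
```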

I’ve added section 4.4.2.1 and an example in 4.4.2.2 to clarify this. I have also removed the text from section 3 that declared that normative content is found only in human-readable documents. I think that these actions address this issue. The spec defines "normative" and "non-normative", but telling authors what should and should not be normative is basically out of the scope of the Documentation Spec.