tdwg / vocab

Vocabulary Maintenance Specification Task Group + SDS + VMS

What's a version? #40

Closed baskaufs closed 8 years ago

baskaufs commented 8 years ago

The model described in the documentation specification discusses versions in a generic way and indicates that the version model can apply to any resource. As a practical matter, how does generating a new version affect versions of resources at a higher level in the hierarchy? Or does it even affect the higher levels? For example, if there is a change to the definition of a single term in Darwin Core, do we now have a new version of Darwin Core? Or is this a new release? If so, what's the difference between a release and a version? Do we only spawn a new version at the vocabulary level when there is a Decision recorded? Maybe we don't have versions of standards at all, just versions of documents and vocabularies included in the standard.

How does this issue affect the "packaging" of standards? The old documentation specification said "Standards take the form of a logical folder or directory, but may be distributed as a zip or tar archive file." Those frozen "packages" were uploaded to OJS where they were neither safe from destruction, nor easily visible to the public. How do we update this idea that a standard is stable and "frozen" in the age of GitHub and Terms Wikis?

tucotuco commented 8 years ago

I believe the new paradigm would do well to center around releases in GitHub, as that is where everything is being managed.

On Sat, Apr 16, 2016 at 3:16 PM, Steve Baskauf wrote: (quoted text of the opening comment, identical to the above)

baskaufs commented 8 years ago

The details of this are problems to be handled as part of the vocabulary maintenance and management processes and aren’t a concern of the documentation specification. There is now a section 5 of the draft Standards Documentation Specification (formerly the first of two badly numbered sections 3.3), which describes in a general way the requirements for archiving standards documents without specifying the mechanics of how that archiving will be achieved.

I have removed this issue from blocking revision of the draft documentation specification. However, I've also labeled it as an issue to be examined in the context of vocabulary management.

baskaufs commented 8 years ago

didn't mean to close this one!

baskaufs commented 8 years ago

The completed draft of the Vocabulary Maintenance Specification specifies conditions that will trigger version changes for terms (Section 3.3.4.1) and documents (Section 3.4.3). The document currently does not deal with the issue of what triggers a version change for the whole vocabulary, nor does it clarify the relationship between "release" and "version" at the vocabulary level. It is not clear to me whether this is in scope for the Vocabulary Maintenance Specification or if it is an implementation issue to be handled by the people who are actually doing the maintaining.

jar398 commented 8 years ago

The version change paragraph is succinct and to the point; I like it. But I wonder if it should be pointed out that, because terms are used in communication, and communication always involves two agents, "implementation" could mean either a generator/sender/curator, or a consumer/receiver/reasoner/indexer/whatever.

I think sometimes people get this wrong and think that a vocabulary (or language) change is compatible just because it does not affect one of these two kinds of agent, forgetting the possible effect on the other.

Or am I being too rigid here? Regardless, it might be good to say a little more about "implementation".

ramorrismorris commented 8 years ago

(I think this is relevant, but if not feel free to delete it.) In what is now dwcFP, several elements originally began with <rdfs:comment lang="en"> (they should have had xml:lang) and were therefore not valid RDF. If the only changes I make are correcting these comments, does the Vocabulary Maintenance Spec draft offer me any guidance on whether I should make the corrected ontology a new version?

Does @jar398's remark imply yes? Or am I being insufficiently rigid here?
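For concreteness, the kind of correction described above might look roughly like the following. This is only a sketch using Python's standard library: the sample fragment, the example.org IRI, the comment text, and the `fix_lang_attributes` helper are all made up for illustration and are not taken from dwcFP.

```python
import xml.etree.ElementTree as ET

RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
# ElementTree exposes xml:lang under the XML namespace in Clark notation.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment showing the problem: a bare lang attribute on rdfs:comment.
SAMPLE = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">'
    '<rdf:Description rdf:about="http://example.org/term">'
    '<rdfs:comment lang="en">An example comment.</rdfs:comment>'
    '</rdf:Description>'
    '</rdf:RDF>'
)


def fix_lang_attributes(root):
    """Move a bare lang attribute on rdfs:comment elements to xml:lang.

    Returns the number of elements that were changed.
    """
    fixed = 0
    for comment in root.iter(f"{{{RDFS_NS}}}comment"):
        if "lang" in comment.attrib:
            comment.set(XML_LANG, comment.attrib.pop("lang"))
            fixed += 1
    return fixed


root = ET.fromstring(SAMPLE)
fixed_count = fix_lang_attributes(root)
comment = root.find(f".//{{{RDFS_NS}}}comment")
print(fixed_count, comment.attrib)
```

The comment text and the term it describes are untouched; only the attribute name changes, which is why the question of whether this counts as a new version is not obvious.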

jar398 commented 8 years ago

I think yes. The question posed by the policy (as I understand it) is, what do implementations in the wild actually do - would any [conforming to spec #1] be nonconformant with spec #2, where spec #2 is the owl file that is the same as spec #1 but with lang= replaced by xml:lang=? It would be easy for an adversary to arrange incompatibility, but very hard to imagine incompatibility in a good-faith implementation.

(You raise another question, which I suppose belongs in a different thread, which is how you are supposed to use an OWL file as a specification. You can't really; you need to be told the conformance criteria independently, because there are too many different ways one might interpret OWL as spec. For dwcFP I presume this is done in a separate prose document; haven't checked...)

baskaufs commented 8 years ago

With respect to Bob's question, I think that incrementing the version of an ontology, document, or term is a signal to pay attention - this change matters. If the RDF isn't valid, then parsers may balk at importing the triples. Fixing that problem is a change that matters and a change that prevents things from being "broken", so I'd increment the version to try to make people pay attention. Maybe there should be some general statement about the purpose of advancing a version, or is that implicitly understood?

Jonathan, can you suggest some wording that might clarify or tighten up section 3.2.2? I think that I probably appropriated the wordings "the correction is likely to impact existing implementations" and "minimizes adverse effects on existing applications" from the DwC Namespace policy without carefully thinking through the definitions of "implementations" and "applications", or the differences between those words.

In Section 3.1 I articulate the general principle that there is a need to maintain "facilitation of data sharing" and in section 3.3.1 I say that a change shouldn't "adversely affect the interoperability of existing applications". "Interoperability" was a word for which I looked up the definition, and it seems to describe the requirement that what a sender generates must be understandable by a receiver. But I don't actually say that changes should promote interoperability. Should I, or is that covered by maintaining "facilitation of data sharing"? I would welcome wording suggestions that would help address Jonathan's concerns.

jar398 commented 8 years ago

That makes sense. Maybe I was wrong about the comment problem not affecting clients. If the spec gets fixed, a client generating the bad syntax (which it would feel entitled to do, since it's in the spec) might have to be changed. And an interpreter that wants to do validation might need to be changed, if it allowed that syntax before but wants to rule it out now. (probably a bad idea.) If there are no such clients - and I think it extremely likely that there are none now - nobody needs to "pay attention", and this is just an erratum. But this is hard to know, so maybe err on the safe side and say that it might matter to some unknown generator (or document).

Analyzing kinds of changes:

a. some changes, like fixing typos in the spec or making noncontroversial clarifications, don't affect any kind of client (remember we only worry about clients in the wild, not hypothetical ones)

b. some changes, like adding a new vocabulary term or expanding the scope of a predicate, affect interpreters but not generators (of course generators talking to old interpreters need to be careful not to use the new feature)

c. some changes, like removing a term or reducing the scope of a term's applicability, affect generators but not interpreters (e.g. the comment syntax change) (of course interpreters of old content need to remember the old meaning, and watch out for inconsistencies that might arise)

d. some changes, like changing the meaning of a term (e.g. a predicate that becomes neither broader nor narrower), affect both generators and interpreters. IMO such changes simply shouldn't be made (although combining a b. change with a c. change can have the same effect)

(I prefer 'generator' and 'interpreter' to 'sender' and 'receiver' because they imply a greater degree of involvement in meaning, but the terminology is not important.)

So I think a. does not require a "pay attention" signal, while all the others do. It's simplest to say that a.-changes go in errata, which are not spec revisions and don't get versioned (in the same way). Any new version of the spec is a new version not because it is just fixing typos, but because clients need to "pay attention" (b-d). I think W3C has started maintaining spec errata on a wiki, so the errata version is just its timestamp.
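The a.-d. breakdown above can be written down as a small lookup table. This is a rough sketch only: the names `Agent`, `AFFECTED`, `needs_new_version`, and `combined_effect` are invented for the illustration and do not come from the draft specification.

```python
from enum import Enum, auto


class Agent(Enum):
    GENERATOR = auto()
    INTERPRETER = auto()


# Which kinds of client each category of change affects, following a.-d. above.
AFFECTED = {
    "a": frozenset(),                                      # typos, clarifications
    "b": frozenset({Agent.INTERPRETER}),                   # new term, wider scope
    "c": frozenset({Agent.GENERATOR}),                     # removed term, narrower scope
    "d": frozenset({Agent.GENERATOR, Agent.INTERPRETER}),  # changed meaning
}


def needs_new_version(category):
    """A change needs a "pay attention" signal if it affects any client."""
    return bool(AFFECTED[category])


def combined_effect(*categories):
    """Union of the clients affected by several changes made together."""
    affected = set()
    for category in categories:
        affected |= AFFECTED[category]
    return frozenset(affected)


for category in "abcd":
    print(category, needs_new_version(category))
```

Under this reading, `combined_effect("b", "c")` covers the same clients as a d. change, which matches the remark that combining a b. change with a c. change can have the same effect.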

You asked for wording, I know, and if you nod your head to the above and say we still need new wording (maybe what's there is fine), I'll look again and see if I can come up with anything. I don't think your document needs to capture what I said above.

One thing to note is that there is really no such thing as backward compatibility for any ontology change. Any change will break either some generator or some interpreter. The only workaround is to assert the policy of "ok to ignore if not understood" for interpreters so that new terms can be added without breaking interpreters. But I think most people who work with data would say that's a bad policy, since data that's not understood is often accompanied by changes or obligations that an interpreter really needs to be aware of. (This policy is one reason so many people hate RDF, I think. The policy also figured in the HTML5/XHTML wars.)
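To make the "ok to ignore if not understood" policy concrete, here is a minimal sketch of an interpreter that can run either way. Everything in it is an assumption for the sake of illustration: the `KNOWN_TERMS` set, the `interpret` function, the `UnknownTermError` class, and the `newTerm` key are hypothetical, and the record is invented.

```python
# Hypothetical set of terms this interpreter was written to understand.
KNOWN_TERMS = {"scientificName", "eventDate"}


class UnknownTermError(ValueError):
    """Raised by a strict interpreter when it meets a term it does not know."""


def interpret(record, ignore_unknown):
    """Return the part of the record the interpreter understands.

    With ignore_unknown=True, unknown terms are silently dropped.
    With ignore_unknown=False, the first unknown term raises an error.
    """
    understood = {}
    for term, value in record.items():
        if term in KNOWN_TERMS:
            understood[term] = value
        elif not ignore_unknown:
            raise UnknownTermError(term)
    return understood


# Invented record from a generator that already uses a newly added term.
record = {
    "scientificName": "Puma concolor",
    "eventDate": "2016-04-16",
    "newTerm": "some value",
}

lenient_result = interpret(record, ignore_unknown=True)

try:
    interpret(record, ignore_unknown=False)
    strict_failed = False
except UnknownTermError:
    strict_failed = True

print(lenient_result, strict_failed)
```

The lenient interpreter keeps working after the new term is added, but it also discards `newTerm` without telling anyone, which is the risk described above if that term carried something the interpreter needed to know.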

baskaufs commented 8 years ago

I have made several changes and additions to the Vocabulary Maintenance Specification draft to address the issues raised in this thread.

I feel that this issue has been addressed well enough to close it. If anyone has further suggestions, please include them in the comments on the draft that I'm going to make available ahead of the next TG Google Hangout.