tdwg / vocab

Vocabulary Maintenance Specification Task Group + SDS + VMS

Documenting the date of death for terms in a vocabulary #41

Closed by baskaufs 8 years ago

baskaufs commented 8 years ago

The draft documentation specification now includes sections 4.2.2 (about deprecating resources in general) and 4.5.3 (about deprecating vocabulary terms). However, after working with the Darwin Core term history file (dwctermshistory.rdf) for a while, I've realized that there is a missing piece in what's in the current draft of the documentation spec. The metadata included in a vocabulary (or about any resource in general) should make it possible to determine the time period over which a resource was recommended for use. We can know the start of the life of a version by looking at the value of its dcterms:issued property. If the version has been superseded by a later version or by some other resource, it will have a dcterms:isReplacedBy property and we can look at the dcterms:issued date of its replacement to find out when the earlier version was no longer recommended.

However, if the resource is deprecated but not replaced by any other resource, there is no machine-readable way to know when it died. Of course, one could put a human-readable comment in the metadata, but a machine should be able to discover this information without having to parse text and guess what it means. By my count, Darwin Core has 48 such term versions in the dwc: namespace. For example, we know that dwc:taxonAttributes was issued on 2009-04-24 and from its dwcattributes:status property we know that it is currently deprecated. But we should be able to know when it died, and at the moment, I don't think we can.

Another way of thinking about the problem is that it should be possible to state for any particular date in the past what terms constituted the Darwin Core vocabulary. We can do that for all currently Recommended terms and all terms that were replaced by something else. But not for unreplaced, deprecated terms.
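The gap can be illustrated with a minimal Python sketch of the replacement-based rule described above. The `TermVersion` record and its field names are hypothetical stand-ins for the dcterms:issued, dcterms:isReplacedBy and dwcattributes:status values. The two `ex:` versions, their dates and the "superseded" status string are invented for illustration; only dwc:taxonAttributes and its issued date come from this thread.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TermVersion:
    # Hypothetical record; fields mirror the metadata properties discussed above.
    iri: str
    issued: date                       # dcterms:issued
    status: str                        # dwcattributes:status
    replaced_by: Optional[str] = None  # dcterms:isReplacedBy


def end_of_life(version, versions):
    """Return the date the version stopped being recommended, or None if it still is.

    Raises ValueError when the metadata cannot answer the question.
    """
    if version.replaced_by is not None:
        # Superseded: the replacement's issued date marks the end.
        return versions[version.replaced_by].issued
    if version.status == "deprecated":
        # Deprecated without a replacement: nothing records the end date.
        raise ValueError(f"{version.iri}: deprecated, but no end date is recorded")
    return None


def vocabulary_on(day, versions):
    """Split the versions issued on or before `day` into (recommended, undetermined)."""
    members, undetermined = [], []
    for v in versions.values():
        if v.issued > day:
            continue
        try:
            end = end_of_life(v, versions)
        except ValueError:
            undetermined.append(v.iri)
            continue
        # Assumption: the end date itself is treated as no longer recommended.
        if end is None or day < end:
            members.append(v.iri)
    return sorted(members), sorted(undetermined)


versions = {
    v.iri: v
    for v in [
        TermVersion("ex:oldTerm-2009-01-01", date(2009, 1, 1), "superseded",
                    replaced_by="ex:newTerm-2010-01-01"),
        TermVersion("ex:newTerm-2010-01-01", date(2010, 1, 1), "recommended"),
        TermVersion("dwc:taxonAttributes-2009-04-24", date(2009, 4, 24), "deprecated"),
    ]
}

members, undetermined = vocabulary_on(date(2009, 6, 1), versions)
print(members)
print(undetermined)
```

In this sketch dwc:taxonAttributes lands in the undetermined list for every date after it was issued, which is exactly the information the current metadata cannot supply.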

One possible property that could be used is dcterms:modified, because the version's metadata was modified when it was deprecated. But I think that's a bad idea because there are many ways a term version's metadata could be modified. I suppose one possibility would be to create some new term in the dwcattributes: namespace that is specifically designed to indicate the date on which a term was deprecated. Any other ideas?

tucotuco commented 8 years ago

I think that dcterms:modified is a good way to track the deprecation date, if it is maintained properly. The deprecation is a modification to the term, and it should, by definition, be the last one. I believe this to be the case with all the terms that are deprecated in Darwin Core. Those that have dcterms:modified equal to dcterms:issued were never used.

baskaufs commented 8 years ago

OK I ran some SPARQL queries against dwctermshistory.rdf. There are 45 term versions that have dwcattributes:status values of "deprecated" and aren't listed as having any replacement (not counting the probable errata for which I've entered issues on the DwC Issues tracker):

dwc:AccessConstraints-2008-11-19
dwc:accuracy-2009-01-21
dwc:binomial-2008-11-19
dwc:CatalogNumberNumeric-2008-11-19
dwc:Dataset-2008-11-19
dwc:DwCType-2008-11-19
dwc:EarliestDateCollected-2008-11-19
dwc:EndTimeOfDay-2008-11-19 modified 2009-04-24
dwc:EventAttribute-2008-11-19
dwc:EventAttributeRemarks-2008-11-19
dwc:eventAttributes-2009-04-24
dwc:eventMeasurementAccuracy-2009-04-24
dwc:eventMeasurementDeterminedBy-2009-04-24
dwc:eventMeasurementDeterminedDate-2009-04-24
dwc:eventMeasurementID-2009-04-24
dwc:eventMeasurementRemarks-2009-04-24
dwc:eventMeasurementType-2009-04-24
dwc:eventMeasurementUnit-2009-04-24
dwc:eventMeasurementValue-2009-04-24
dwc:HigherTaxon-2009-01-21
dwc:higherTaxonconceptID-2009-04-24
dwc:identificationAttributes-2009-04-24
dwc:LatestDateCollected-2008-11-19
dwc:locationAttributes-2009-04-24
dwc:occurrenceAttributes-2009-04-24
dwc:occurrenceDetails-2009-04-24 modified 2011-10-16
dwc:occurrenceMeasurementAccuracy-2009-04-24
dwc:occurrenceMeasurementDeterminedBy-2009-04-24
dwc:occurrenceMeasurementDeterminedDate-2009-04-24
dwc:occurrenceMeasurementID-2009-04-24
dwc:occurrenceMeasurementRemarks-2009-04-24
dwc:occurrenceMeasurementType-2009-04-24
dwc:occurrenceMeasurementUnit-2009-04-24
dwc:occurrenceMeasurementValue-2009-04-24
dwc:RelatedBasisOfRecord-2008-11-19 modified 2009-01-26
dwc:relatedResourceType-2009-04-24
dwc:SampleAttribute-2008-11-19
dwc:SampleAttributeRemarks-2008-11-19
dwc:SamplingAttributeID-2008-11-19
dwc:SamplingAttributeType-2008-11-19
dwc:SamplingEventAttributes-2008-11-19
dwc:SamplingEventRemarks-2009-01-18
dwc:SamplingLocation-2008-11-19
dwc:StartTimeOfDay-2008-11-19 modified 2009-04-24
dwc:taxonAttributes-2009-04-24

Only four of these term versions have dcterms:modified dates that differ from their issued dates, and in each case the modified date falls after the date string appended to the term name. It is also possible that some of them were actually replaced by some other term without that fact being noted in the RDF, although I didn't notice any obvious ones.
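The selection described above can be sketched in plain Python over a few hand-copied records. This is not the SPARQL that was actually run against dwctermshistory.rdf, which isn't shown in this thread. The dict keys are hypothetical, `modified` is set only where the list above notes a later modified date, and the `ex:` record is invented to show a replaced term being filtered out.

```python
from datetime import date


def version_date(version_name):
    """Parse the date suffix of a name such as 'dwc:EndTimeOfDay-2008-11-19'."""
    return date.fromisoformat(version_name[-10:])


# Hypothetical in-memory records; keys are illustrative, not real RDF property names.
records = [
    {"name": "dwc:EndTimeOfDay-2008-11-19", "status": "deprecated",
     "replaced_by": None, "modified": date(2009, 4, 24)},
    {"name": "dwc:taxonAttributes-2009-04-24", "status": "deprecated",
     "replaced_by": None, "modified": None},  # no later modified date noted
    {"name": "ex:oldTerm-2009-01-01", "status": "deprecated",
     "replaced_by": "ex:newTerm-2010-01-01", "modified": date(2010, 1, 1)},
]

# Step 1: deprecated versions with no recorded replacement.
unreplaced = [
    r for r in records
    if r["status"] == "deprecated" and r["replaced_by"] is None
]

# Step 2: of those, versions modified after the date appended to their name.
modified_later = [
    r["name"] for r in unreplaced
    if r["modified"] is not None and r["modified"] > version_date(r["name"])
]

print([r["name"] for r in unreplaced])
print(modified_later)
```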

So based on what you said, should I consider the rest of the terms to have had their life ending after the version that is indicated by the date that's appended to the end of their names? It's a bit confusing because some of the terms have issued dates before the date that is appended to their name, but then have modified dates that are the same as the date that's appended to their name. For simplicity, in my test implementation I've been saying that each term version came into existence on the date that's appended to its name.

tucotuco commented 8 years ago

The term dcterms:issued is defined as "Date of formal issuance (e.g., publication) of the resource." In pre-standard Darwin Core there was no particularly formal issuance. In those cases, dcterms:issued refers to when the original, semantically equivalent rdfs:comment came into existence. In the normal maintenance of a vocabulary, I believe dcterms:issued should be the same as the version indicator on the term, and that those should both match the date on which the term was ratified.

When a term is deprecated, dcterms:modified should be changed to the date on which the deprecation is ratified. In these cases, the lifetime of the term in the recommended state is from dcterms:issued to dcterms:modified.

When a term is superseded, dcterms:modified should be changed to the date on which the replacement becomes ratified. Again, the lifetime of the term in the recommended state is from dcterms:issued to dcterms:modified.

Note that this is not what has happened in Darwin Core, for which dcterms:issued incorrectly shows the date on which the idea was captured as a term with the conceptual meaning as given in its rdfs:comment. I don't think we should change any of the existing dcterms:issued values in Darwin Core.
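The convention proposed above, where the recommended lifetime runs from dcterms:issued to dcterms:modified for deprecated and superseded terms, could be sketched as follows. The function names and status strings are hypothetical. The example uses the modified date listed earlier for dwc:occurrenceDetails-2009-04-24 and assumes its issued date equals the date in its name, which, as noted, is not guaranteed for existing Darwin Core terms. Treating the end date as exclusive is also an assumption.

```python
from datetime import date


def recommended_period(issued, modified, status):
    """Return (start, end) of the recommended state; end is None if still recommended."""
    if status in ("deprecated", "superseded"):
        # Under the proposed convention, the last modification is the
        # ratification of the deprecation or of the replacement.
        return (issued, modified)
    return (issued, None)


def was_recommended_on(day, period):
    """True if `day` falls inside the period; the end date is treated as exclusive."""
    start, end = period
    return start <= day and (end is None or day < end)


# Assumed issued date (taken from the version name); modified date from the list above.
period = recommended_period(date(2009, 4, 24), date(2011, 10, 16), "deprecated")
print(was_recommended_on(date(2010, 1, 1), period))
print(was_recommended_on(date(2012, 1, 1), period))
```

This only gives meaningful answers if dcterms:modified is never touched again after the deprecation or replacement is ratified.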

baskaufs commented 8 years ago

Added a paragraph to section 4.2.2 regarding the use of dcterms:modified with deprecated resources to indicate the end of their lifespan.