tdwg / tag

Technical Architecture Group
https://tag.tdwg.org/

Please review and comment on Audubon Core Issue 134 #23

Closed chicoreus closed 2 years ago

chicoreus commented 5 years ago

@baskaufs just raised a question on the Audubon Core issue tracker asking for input from the TAG: https://github.com/tdwg/ac/issues/134. The core question is whether AC terms taken from other TDWG vocabularies should reference a particular frozen version of a term in the referenced vocabulary, or should point to the latest version of that term. Please comment on the issue in AC.

qgroom commented 5 years ago

Does this come down to versioning? If there were citable versioned releases of Audubon Core then it could refer to the stable version of other TDWG vocabularies and at each release the changes made in other vocabularies could be reviewed before the release.

wouteraddink commented 5 years ago

Since Audubon Core clearly refers to a specific version of DwC terms here, I would say it needs a new Audubon Core version to refer to a newer version of the same DwC terms. For a standard, I would argue that it is beneficial to refer to a specific fixed version of another standard, even if it is referring to a standard within TDWG. In principle you could refer to the latest version, but then if two datasets use it, you have to take the publication dates of these datasets into account to compare them, and assume they really used the latest version of that standard at publication. I think that leaves a lot of room for error in some cases.
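
To make the date-comparison problem concrete, here is a minimal sketch of how a consumer might infer which version a dataset used when it only says "latest". The version issue dates are hypothetical placeholders, and the function name is invented for illustration.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical issue dates for successive versions of one borrowed term.
VERSION_DATES = [date(2009, 4, 24), date(2015, 3, 27), date(2018, 9, 6)]


def version_in_effect(published: date, version_dates=VERSION_DATES):
    """Return the issue date of the newest version issued on or before `published`."""
    # bisect_right counts how many versions were issued on or before the date.
    index = bisect_right(version_dates, published)
    if index == 0:
        return None  # the dataset predates every known version
    return version_dates[index - 1]


# Two datasets that both cite "the latest version" may still mean different ones.
older = version_in_effect(date(2014, 6, 1))
newer = version_in_effect(date(2019, 1, 15))
print(older, newer, older == newer)
```

Even this lookup only gives the version that was available at publication; it cannot confirm that the dataset author actually used it, which is the source of error described above.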

baskaufs commented 5 years ago

Complete versioning data are available for Audubon Core as well as Darwin Core. The data live at https://github.com/tdwg/rs.tdwg.org. The particular data for DwC term versions borrowed by AC live here and here. They are queryable at https://sparql.vanderbilt.edu/. This query:

prefix dcterms: <http://purl.org/dc/terms/>
prefix tdwgutility: <http://rs.tdwg.org/dwc/terms/attributes/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?termListVersion ?termVersion
FROM <http://rs.tdwg.org/>
WHERE {
  <http://rs.tdwg.org/ac/dwc/> dcterms:hasVersion ?termListVersion.
  ?termListVersion tdwgutility:status "recommended"^^xsd:string.
  ?termListVersion dcterms:hasPart ?termVersion.
}

will show you the Darwin Core term versions currently part of Audubon Core. More complex queries could be used to automate the process of comparing publication dates to versions.
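
As a rough sketch of the automation mentioned above, the following groups the query's results by term list version, assuming the endpoint can return them in the W3C SPARQL Query Results JSON format. The sample response and its IRIs are invented placeholders, not real TDWG version IRIs, and the function name is hypothetical.

```python
import json

# Made-up response in the W3C SPARQL 1.1 Query Results JSON Format.
# The IRIs are placeholders, not real TDWG version IRIs.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["termListVersion", "termVersion"]},
  "results": {"bindings": [
    {"termListVersion": {"type": "uri", "value": "http://example.org/list/2020"},
     "termVersion": {"type": "uri", "value": "http://example.org/term/a-2009"}},
    {"termListVersion": {"type": "uri", "value": "http://example.org/list/2020"},
     "termVersion": {"type": "uri", "value": "http://example.org/term/b-2009"}}
  ]}
}
"""


def term_versions_by_list(response_text: str) -> dict:
    """Group term version IRIs under the term list version that contains them."""
    bindings = json.loads(response_text)["results"]["bindings"]
    grouped = {}
    for row in bindings:
        list_version = row["termListVersion"]["value"]
        grouped.setdefault(list_version, []).append(row["termVersion"]["value"])
    return grouped


versions = term_versions_by_list(SAMPLE_RESPONSE)
for list_version, term_versions in versions.items():
    print(list_version, len(term_versions))
```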

So it's not really an issue of recording or discovering the versions. Version changes can be recorded by generating a new term list version for DwC terms borrowed by AC that contains the updated term versions, and various discovery methods are possible.

There are really several questions here:

  1. Should Audubon Core have a policy that says that the borrowed terms automatically get updated to the newest versions, or should they stay frozen unless somebody decides to update specific borrowed terms?
  2. Does the answer to question 1 depend on whether the borrowed terms are minted by TDWG or not?
  3. If terms don't get automatically updated, how do we decide the circumstances under which some or all of the borrowed terms get updated? I suppose the AC Maintenance Group would try to implement Section 3 of the Vocabulary Maintenance spec to the extent that we can figure out how it applies to this situation.

hlapp commented 5 years ago

I would ask: is there a normative machine-readable file (RDF or OWL) of the Audubon Core vocabulary? If yes, then how are the DwC terms imported in that file? Do they use a versioned DwC ontology, or versioned DwC term URIs? If no, then there never was a freeze to some DwC version (and it's rather debatable whether the AC documentation even demands any freeze).

If yes, then indeed one might wish to release a new version of the AC vocabulary that differs from the previous one only by the version URIs it uses for DwC as a whole or the select DwC geography terms.

I'd note that all links in the AC documentation use version-less term URIs for DwC terms. So these unambiguously do refer to their latest version, not to the 2009 version. Given that the text reference to the 2009 DwC is not clear (see above), I'd argue it should be the actual use of DwC terms (in the form of their URIs) in the AC document that should inform disambiguation.

baskaufs commented 5 years ago

The ratified version of the Standards Documentation Specification (SDS) does away with the idea contained in the unratified SDS that particular documents or serializations are normative. Rather, it states that normative content is content that is declared to be normative. In the case of human-readable documents, normative content is declared and labeled (Section 3.2.1) in the document introduction. For machine-readable documents, declaring what is normative is described in Section 4.4.2.1, although I don't think it has actually been implemented yet in machine-readable metadata. The consensus within the AC and DwC Maintenance Groups has been that term definitions are normative and that labels and comments/notes are not. In AC, we also have Usage guidelines for some terms (particularly the borrowed ones), which are normative. See Section 1.1 of the AC Term Guide, where the normative term metadata properties are clearly declared.

The SDS also states in Section 2.2.4 that all distributions of a term list (human-readable, RDF flavors, JSON-LD) MUST contain substantively the same information about the terms on the list. For Audubon Core, that is guaranteed in practice by generating all distributions (or "representations" if you prefer that term) of term metadata from the same CSV tables found in the rs.tdwg.org repository. So for example, all distributions of the AC-defined terms come from this table and all distributions of the DwC-borrowed terms come from this table. These metadata for the current terms are generated from term versions having the status of "recommended" in the corresponding term version table (e.g. this table for Audubon Core-defined terms). So the content of the rdfs_comment column (the term definition) in the term version table is the source that actually determines all of the downstream term definitions in every serialization.
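
The single-source idea can be sketched roughly as follows. This is an illustration under assumptions, not the actual rs.tdwg.org build script: only the rdfs_comment column name and the "recommended" status come from the description above, while the other column names, the rows, and the function names are invented.

```python
import csv
import io

# Hypothetical extract of a term version table. Only the rdfs_comment column
# name and the "recommended" status come from the discussion; the other column
# names and all rows are invented.
TERM_VERSIONS_CSV = """term,version_issued,status,rdfs_comment
exampleTerm,2009-12-07,superseded,An old definition.
exampleTerm,2020-01-27,recommended,The current definition.
otherTerm,2009-12-07,recommended,Another definition.
"""


def current_definitions(csv_text: str) -> dict:
    """Map each term to the definition of its version marked 'recommended'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["term"]: row["rdfs_comment"]
        for row in reader
        if row["status"] == "recommended"
    }


def render_text(definitions: dict) -> str:
    """One human-readable distribution built from the shared definitions."""
    return "\n".join(f"{term}: {text}" for term, text in sorted(definitions.items()))


def render_records(definitions: dict) -> list:
    """One machine-readable distribution built from the same definitions."""
    return [{"term": t, "definition": d} for t, d in sorted(definitions.items())]


definitions = current_definitions(TERM_VERSIONS_CSV)
print(render_text(definitions))
```

Because both renderers read the same dictionary, the distributions cannot drift apart in their definitions; a change has to be made in the table itself.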

So in the case of the Darwin Core terms borrowed by AC, the ultimate source of their definitions is the rdfs_comment column of the dwc-for-ac-version.csv table, which is very specific about the version of the DwC term. This table does not automatically get updated from Darwin Core; it takes specific action on the part of the AC Maintenance Group to make a change to that table. So rightly or wrongly, the borrowed DwC terms are actually currently "frozen" at the definitions that they had in the 2009-12-09 version of DwC (actually identified as http://rs.tdwg.org/dwc/version/terms/2009-12-07) as documented here.
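
If the Maintenance Group wanted a prompt for when to review the frozen table, one possible sketch is to compare the pinned version of each borrowed term with the newest version in the source vocabulary. The term names, dates, and function name below are all hypothetical; in practice the two mappings would have to be derived from the version tables.

```python
# Hypothetical data: term name -> issue date (ISO 8601 string) of a term version.
PINNED = {"termA": "2009-04-24", "termB": "2009-04-24"}  # frozen in the borrowed-term table
LATEST = {"termA": "2009-04-24", "termB": "2018-09-06"}  # newest in the source vocabulary


def stale_terms(pinned: dict, latest: dict) -> list:
    """List borrowed terms whose source vocabulary has a version newer than the pinned one."""
    # ISO 8601 dates compare correctly as plain strings.
    return sorted(
        term
        for term, pinned_date in pinned.items()
        if latest.get(term, pinned_date) > pinned_date
    )


candidates = stale_terms(PINNED, LATEST)
print(candidates)
```

The output is only a list of candidates for review; whether to adopt a newer version remains a decision for the group.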

As you can see here, there has only ever been one version of AC because nothing has changed since it was ratified. However, after we make the changes resulting from our current documentation cleanup, we will in fact generate a new version of the vocabulary and all of the details will be documented in the appropriate place in the repo.

I realize that there are a lot of details here, but if there are any followup questions about the workflow, I'd be happy to clarify.

baskaufs commented 5 years ago

I should also note that the term versions could be displayed on the Term List page. It would simply require a change to this script. However, including term versions was not done on the previous incarnation of that document that we were trying to mimic and adding the term version URIs would potentially add to confusion on an already long and somewhat complicated document. There are other ways for people to find out the gory details of versioning beyond that page.

baskaufs commented 2 years ago

This issue was resolved by a policy decision of the Audubon Core Maintenance Group.