tdwg / dwc-qa

Public question and answer site for discussions about Darwin Core
Apache License 2.0

Declaring controlled vocabularies used #69

Open pzermoglio opened 7 years ago

pzermoglio commented 7 years ago

(These questions were posed during the second chapter of the DwC Hour, 7 Mar 2017).

What about declaring your vocabulary(ies) so community/taxon-specific vocabularies can be used?

pzermoglio commented 7 years ago

Related to issue #68

ekrimmel commented 7 years ago

discussion from DwC Hour #3: Thousands of Shades for 'Controlled' Vocabularies...

Quentin Groom (Botanic Garden Meise, Belgium): In a few cases terms in Darwin Core have a field to explain the controlled vocabulary (e.g. geodeticDatum). In some cases couldn't we add a field to explain which vocabulary we are using?

Steve Baskauf - Vanderbilt/TDWG VOMAG TG: From a management point of view, if there are many complicated controlled vocabularies, it would be a lot for the DwC group to manage. Communities of interest could do some of the heavy lifting.

Jodi: Quentin, I think declaring your vocabulary is very important. But, at the dataset level, the record level, or the individual field level?

Quentin Groom (Botanic Garden Meise, Belgium): Jodi, given the way records have a life of their own you have to link the vocabulary specification in the observation.

Dean Pentcheff [NHMLA]: There’s an analogy here between a “standard” and (perhaps) what controlled vocabularies should be: it makes me think of an organization’s bylaws vs. their standing rules. Bylaws are nearly constitution-like — very hard to change, and that’s the way it should be. Standing Rules are (more) easily changeable in response to community demand. From a social engineering perspective, vocabularies developed from within a research community are more likely to gain acceptance than those developed by well-meaning (and even smart) “outsiders”.

Jodi: Quentin, I would agree. At the least. I wonder if multiple vocabularies might be used in the same record, though. So would a field level declaration be needed?
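The dataset / record / field question above can be pictured as a lookup with fallback: a field-level declaration wins, then a record-level one, then the dataset default. This is only a sketch under assumptions; the keys `vocabularies` and `defaultVocabulary` and the `example.org` URLs are hypothetical and are not Darwin Core terms.

```python
from typing import Optional


def resolve_vocabulary(field: str, record: dict, dataset: dict) -> Optional[str]:
    """Return the vocabulary declared for a field, most specific level first."""
    # Field-level declaration (hypothetical "vocabularies" key on the record).
    field_level = record.get("vocabularies", {}).get(field)
    if field_level is not None:
        return field_level
    # Record-level default (hypothetical "defaultVocabulary" key).
    record_level = record.get("defaultVocabulary")
    if record_level is not None:
        return record_level
    # Dataset-level default, or None if nothing was declared anywhere.
    return dataset.get("defaultVocabulary")


dataset = {"defaultVocabulary": "http://example.org/vocab/dataset-default"}
record = {
    "sex": "female",
    "lifeStage": "adult",
    # Only lifeStage uses a community-specific vocabulary in this record.
    "vocabularies": {"lifeStage": "http://example.org/vocab/community-life-stages"},
}

sex_vocab = resolve_vocabulary("sex", record, dataset)
life_stage_vocab = resolve_vocabulary("lifeStage", record, dataset)
```

Because the declaration travels inside the record, it survives when the record is separated from its dataset, which is the concern Quentin raises.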

Dan Stoner (iDigBio): It does not need to be all-or-none... in cases where a good external controlled vocab exists (like ISO Countries), use the external source. If it doesn't exist, the standard can still recommend the "best" known vocabulary, either externally (someone's github or web page), or... in the standard definition itself.

Dean Pentcheff [NHMLA]: And, if adherence to the list is required, records with non-list values get automatically trashed... which may not be the best outcome.
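One way to avoid trashing non-list values is to flag them and keep the record. A minimal sketch, assuming a tiny illustrative subset of ISO 3166-1 alpha-2 codes and a hypothetical flag name `countryCodeInVocabulary` (not a Darwin Core term):

```python
# Illustrative subset only; a real check would load the full ISO 3166-1 list.
ISO_COUNTRY_CODES = {"AR", "BE", "US"}


def check_records(records, allowed):
    """Return copies of the records with a flag instead of dropping any."""
    checked = []
    for rec in records:
        flagged = dict(rec)  # leave the original record untouched
        value = rec.get("countryCode", "")
        # Hypothetical flag name; True when the value is in the vocabulary.
        flagged["countryCodeInVocabulary"] = value.strip().upper() in allowed
        checked.append(flagged)
    return checked


records = [{"countryCode": "BE"}, {"countryCode": "Belgium"}]
checked = check_records(records, ISO_COUNTRY_CODES)
```

Both records survive; the second is simply marked as not matching, so a curator or aggregator can decide what to do with it later.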

John Wieczorek (Darwin Core): @Jodi In that case, another potential solution is to make a list from multiple sources and point to that. So far though, the lack of controlled vocabularies seems to be the bigger issue.

Steve Baskauf - Vanderbilt/TDWG VOMAG TG: The solution to some of the issues listed here is handled by the fact that terms in TDWG standards (including controlled vocabularies) are identified by URIs. If a URI is used, the source of the vocabulary is known. We have dwc: terms for verbatim (text) fields and dwciri: terms for storing the URIs. All of the spelling variants are covered by properties of the term, whose URI identifier can be opaque. The properties of the term (alternative spellings) can be added to, or modified without invoking any standards process.
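A rough illustration of that idea: a term has one stable, opaque identifier, and spelling variants are just properties that can grow over time. The class, the `example.org` IRI, and the labels below are hypothetical; only the `dwc:` / `dwciri:` split comes from the comment above.

```python
class VocabularyTerm:
    """A controlled vocabulary term with an opaque IRI and spelling variants."""

    def __init__(self, iri, preferred_label, alt_labels=()):
        self.iri = iri
        self.preferred_label = preferred_label
        self.alt_labels = set(alt_labels)

    def matches(self, text):
        # Compare case-insensitively against every known label.
        normalized = text.strip().casefold()
        labels = {self.preferred_label, *self.alt_labels}
        return normalized in {label.casefold() for label in labels}


def to_iri(text, terms):
    """Return the IRI of the first term matching the text, or None."""
    for term in terms:
        if term.matches(text):
            return term.iri
    return None


# Hypothetical opaque IRI; the identifier itself says nothing about spelling.
wgs84 = VocabularyTerm("http://example.org/datum/d0001", "WGS84", {"WGS 84", "WGS-84"})
terms = [wgs84]

before = to_iri("World Geodetic System 1984", terms)  # variant not yet known
wgs84.alt_labels.add("World Geodetic System 1984")    # add a spelling; IRI unchanged
after = to_iri("World Geodetic System 1984", terms)

# Verbatim text goes in the dwc: property, the IRI in the dwciri: property.
record = {
    "dwc:geodeticDatum": "wgs 84",
    "dwciri:geodeticDatum": to_iri("wgs 84", terms),
}
```

Adding the new spelling changes what the term recognizes but not its identifier, so records that already point at the IRI are unaffected.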

Dean Pentcheff [NHMLA]: Is there a time-dependence to those URI references? Do we need to worry about whether the list has changed since the record was created? Or am I just getting way too far into the weeds? :) It may be worthwhile to have a controlled vocabulary editor hierarchy. There are discipline-specific parts of vocabularies that should be edited by domain specialists. Then, above those editors, could be aggregating editors who build the recommended combined vocabulary for an entire Darwin Core term. This is similar to the editor hierarchy in WoRMS. Another way to "discover" some of the key community people who might be important to involve, is to look at who is contributing large numbers of records involving controlled vocabularies to aggregators.

Shelley: Ontology aggregators, e.g. obofoundry.org; archive and library communities

Steve Baskauf - Vanderbilt/TDWG VOMAG TG: We are trying to re-solve problems that librarians have already solved: http://www.niso.org/schemas/iso25964/ . Unfortunately, ISO 25964 is behind a paywall.

ekrimmel commented 7 years ago

discussions from DwC Hour #4: Evolution of Darwin Core Terms and Extensions...

Randy Singer (iDigBio-FLMNH): Very important to have a discipline-specific controlled vocabulary for preparations, possibly even a drop-down menu to force data providers’ cooperation. Just to clarify my point: I'm advocating for standardized, discipline-vetted preparations to be used alongside another field for verbatim preparation. When you try to look across millions of data points, it becomes almost impossible to get good data from non-standardized prep types, but the verbatim prep type would still exist for those who want the details and history of preparations. :)
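A minimal sketch of the "standardized value plus verbatim field" pattern described here. The mapping and the field name `verbatimPreparations` are hypothetical; a real mapping would come from a discipline-vetted list.

```python
# Hypothetical mapping from provider spellings to a vetted preparation type.
PREPARATION_MAP = {
    "etoh": "ethanol",
    "70% ethanol": "ethanol",
    "alcohol": "ethanol",
    "skin": "skin",
    "study skin": "skin",
}


def standardize_preparation(record):
    """Return a copy with a standardized value and the verbatim text preserved."""
    verbatim = record.get("preparations", "")
    out = dict(record)
    # Keep exactly what the provider wrote (hypothetical field name).
    out["verbatimPreparations"] = verbatim
    # Standardized value, or None when the spelling is not in the mapping.
    out["preparations"] = PREPARATION_MAP.get(verbatim.strip().lower())
    return out


standardized = standardize_preparation({"catalogNumber": "123", "preparations": "EtOH"})
unmapped = standardize_preparation({"catalogNumber": "124", "preparations": "cleared and stained"})
```

Unmapped values end up with no standardized preparation but keep their verbatim text, so nothing is lost while the vetted list is still being built.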

Deb Paul - iDigBio: Yes @Randy - verbatim also good for Linked Data initiatives to link to things like BHL data


Patricia Mergen (Botanic Garden Meise/Africamuseum): Yes, in ABCD you have, for almost every concept, an atomised field and the verbatim term (text field). It is very useful, but very few providers take the time to fill it in or to parse their verbatim data into the atomised fields, unless their database is already structured that way or they have an automated tool to split the data.

Steve Baskauf (Vanderbilt University): Note that TDWG already has a way to differentiate between controlled vocabulary values and verbatim values. The IRIs that identify the controlled vocabulary terms are values of dwciri: terms. Verbatim values are values of dwc: terms.

ekrimmel commented 7 years ago

people interested in this topic for preparations: https://github.com/VertNet/dwc-qa-manage/issues/23