tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-VALIDATION_DEGREEOFESTABLISHMENT_STANDARD #275

Open ArthurChapman opened 7 months ago

ArthurChapman commented 7 months ago
GUID: 060e7734-607d-4737-8b2c-bfa17788bf1a
Label: VALIDATION_DEGREEOFESTABLISHMENT_STANDARD
Description: Does the value of dwc:degreeOfEstablishment occur in the bdq:sourceAuthority?
TestType: Validation
Darwin Core Class: dwc:Occurrence
Information Elements ActedUpon: dwc:degreeOfEstablishment
Information Elements Consulted:
Expected Response: EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:degreeOfEstablishment is bdq:Empty; COMPLIANT if the value of dwc:degreeOfEstablishment is in the bdq:sourceAuthority; otherwise NOT_COMPLIANT.
Data Quality Dimension: Conformance
Term-Actions: DEGREEOFESTABLISHMENT_STANDARD
Parameter(s): bdq:sourceAuthority
Source Authority: bdq:sourceAuthority default = "Degree of Establishment Controlled Vocabulary List of Terms" {[https://dwc.tdwg.org/doe/]} {GBIF vocabulary API [https://api.gbif.org/v1/vocabularies/DegreeOfEstablishment/concepts]}
Specification Last Updated: 2024-02-09
Examples:
  [dwc:degreeOfEstablishment="cultivated": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:degreeOfEstablishment found in the bdq:sourceAuthority"]
  [dwc:degreeOfEstablishment="grown in garden": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:degreeOfEstablishment not found in the bdq:sourceAuthority"]
Source: TG2
References:
  • Darwin Core Maintenance Group (2021) Degree of Establishment Controlled Vocabulary List of Terms. Biodiversity Information Standards (TDWG). http://dwc.tdwg.org/dwc/doc/doe/
  • Groom et al. (2019) Improving Darwin Core for research and management of alien species. Biodiversity Information Science and Standards 3: e38084. https://doi.org/10.3897/biss.3.38084
Example Implementations (Mechanisms):
Link to Specification Source Code:
Notes: This test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters.
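A minimal Python sketch of the Expected Response logic above, assuming the bdq:sourceAuthority has already been loaded into a set of controlled values (passing None models a source authority that is not available). The class and function names are hypothetical, and treating whitespace-only strings as bdq:Empty is an assumption to check against the bdq:Empty definition.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Response:
    status: str            # RUN_HAS_RESULT or one of the *_PREREQUISITES_NOT_MET values
    result: Optional[str]  # COMPLIANT / NOT_COMPLIANT, or None when no result was produced
    comment: str


def is_bdq_empty(value: Optional[str]) -> bool:
    # Assumption: None, "" and whitespace-only strings are treated as bdq:Empty.
    return value is None or value.strip() == ""


def validation_degreeofestablishment_standard(
    value: Optional[str], source_authority: Optional[Set[str]]
) -> Response:
    # None stands in for "the bdq:sourceAuthority could not be loaded".
    if source_authority is None:
        return Response("EXTERNAL_PREREQUISITES_NOT_MET", None,
                        "bdq:sourceAuthority is not available")
    if is_bdq_empty(value):
        return Response("INTERNAL_PREREQUISITES_NOT_MET", None,
                        "dwc:degreeOfEstablishment is bdq:Empty")
    # Exact, case-sensitive comparison with no trimming, so "Native",
    # " native" and "native\t" are all NOT_COMPLIANT, as the Notes require.
    if value in source_authority:
        return Response("RUN_HAS_RESULT", "COMPLIANT",
                        "dwc:degreeOfEstablishment found in the bdq:sourceAuthority")
    return Response("RUN_HAS_RESULT", "NOT_COMPLIANT",
                    "dwc:degreeOfEstablishment not found in the bdq:sourceAuthority")


# Illustrative subset only; a real implementation would load the full list
# of controlled values from https://dwc.tdwg.org/doe/.
SAMPLE_SOURCE_AUTHORITY = {"native", "cultivated"}
```

Calling `validation_degreeofestablishment_standard("cultivated", SAMPLE_SOURCE_AUTHORITY)` yields RUN_HAS_RESULT / COMPLIANT and "grown in garden" yields NOT_COMPLIANT, matching the Examples. No trimming is done before the lookup because the Notes require padded values to fail.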
ArthurChapman commented 7 months ago

Should be made CORE - see comments under #268

chicoreus commented 6 months ago

Updated the Notes to change "fail" to the more explicit "This test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters."

chicoreus commented 6 months ago

Source authority should be:

bdq:sourceAuthority default = "Degree of Establishment Controlled Vocabulary List of Terms" {[https://dwc.tdwg.org/doe/]} {GBIF vocabulary API [https://api.gbif.org/v1/vocabularies/DegreeOfEstablishment/]}

tucotuco commented 5 months ago

I think https://api.gbif.org/v1/vocabularies/DegreeOfEstablishment/concepts is OK as a source authority because it actually has an API, as long as it is understood that the actual vocabulary is maintained at https://dwc.tdwg.org/doe/ and that the GBIF API is expected to stay up to date with it.

Tasilee commented 5 months ago

Changed Source Authority from

bdq:sourceAuthority default = "Darwin Core degreeOfEstablishment" {[https://dwc.tdwg.org/list/#dwc_degreeOfEstablishment]} {dwc:degreeOfEstablishment vocabulary API [https://api.gbif.org/v1/vocabularies/DegreeOfEstablishment/concepts]}

to

bdq:sourceAuthority default = "GBIF DegreeOfEstablishment Vocabulary" {[https://api.gbif.org/v1/vocabularies/DegreeOfEstablishment]} {"dwc:degreeOfEstablishment vocabulary API" [https://api.gbif.org/v1/vocabularies/DegreeOfEstablishment/concepts]}

tucotuco commented 5 months ago

Source Authority should be

bdq:sourceAuthority default = "Degree of Establishment Controlled Vocabulary List of Terms" {[https://dwc.tdwg.org/doe/]} {GBIF vocabulary API [https://api.gbif.org/v1/vocabularies/DegreeOfEstablishment/concepts]}

Tasilee commented 5 months ago

Changed Source Authority from

bdq:sourceAuthority default = "GBIF DegreeOfEstablishment Vocabulary" {[https://api.gbif.org/v1/vocabularies/DegreeOfEstablishment]} {"dwc:degreeOfEstablishment vocabulary API" [https://api.gbif.org/v1/vocabularies/DegreeOfEstablishment/concepts]}

to

bdq:sourceAuthority default = "Degree of Establishment Controlled Vocabulary List of Terms" {[https://dwc.tdwg.org/doe/]} {GBIF vocabulary API [https://api.gbif.org/v1/vocabularies/DegreeOfEstablishment/concepts]}

chicoreus commented 5 months ago

We have to address a case mismatch between the TDWG vocabulary and the GBIF API. The TDWG vocabulary is authoritative, and both its Label and its Controlled Value begin with a lowercase letter, e.g. "native"; the GBIF API has Name and Label.Value in mixed case, e.g. "Native". A concept in the GBIF API cross-references the TDWG vocabulary by IRI (so only the GBIF API needs to be consulted to check dwciri: values, though the cross-reference is not placed consistently across vocabularies), but not by label. Implementors using the GBIF API would therefore still have to consult the TDWG vocabulary to obtain the actual controlled value for the dwc: term.

"native" should return COMPLIANT and "Native" NOT_COMPLIANT. Referencing the GBIF API here assists neither the specification nor implementors.

We should probably remove the GBIF API from the source authority for this set of VALIDATION_..STANDARD tests (but include it for the related AMENDMENT..._STANDARDIZED tests, as it provides alternative terms that can be used for standardization).

This comment applies to #268, #269, #275, #276, #277, and #278, where the TDWG vocabulary has values beginning with a lowercase letter and the GBIF API does not match, having values beginning with a capital letter.

Examples:

Pathway:

http://rs.tdwg.org/dwcpw/values/p039 Controlled Value = hullFouling vs https://api.gbif.org/v1/vocabularies/Pathway/concepts/HullFouling/ name = HullFouling externalDefinitions[0]=http://rs.tdwg.org/dwcpw/values/p039

Degree of Establishment:

http://rs.tdwg.org/dwcdoe/values/d001 Controlled Value = native vs https://api.gbif.org/v1/vocabularies/DegreeOfEstablishment/concepts/Native/ name = "Native" sameAsUris[0]=http://rs.tdwg.org/dwcdoe/values/d001

Establishment Means:

http://rs.tdwg.org/dwcem/values/e001 Controlled Value = native vs https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts/Native/ name = "Native" sameAsUris[0]=http://rs.tdwg.org/dwcem/values/e001
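To make the mismatch concrete, here is a minimal Python sketch of what an implementor would have to do with the GBIF API as described above: follow the IRI cross-reference back to the TDWG controlled value rather than trusting the concept name. The concept dictionaries are hand-written excerpts holding only the fields quoted in the examples, not complete API responses, and `TDWG_CONTROLLED_VALUES` and `tdwg_value_for_gbif_concept` are hypothetical names.

```python
from typing import Optional

# TDWG controlled values keyed by IRI, taken from the examples above.
TDWG_CONTROLLED_VALUES = {
    "http://rs.tdwg.org/dwcpw/values/p039": "hullFouling",
    "http://rs.tdwg.org/dwcdoe/values/d001": "native",
    "http://rs.tdwg.org/dwcem/values/e001": "native",
}


def tdwg_value_for_gbif_concept(concept: dict) -> Optional[str]:
    # The cross-reference sits in sameAsUris for some vocabularies and in
    # externalDefinitions for others, so both lists are checked.
    for key in ("sameAsUris", "externalDefinitions"):
        for iri in concept.get(key, []):
            if iri in TDWG_CONTROLLED_VALUES:
                return TDWG_CONTROLLED_VALUES[iri]
    return None  # no TDWG cross-reference found


# Excerpts of the GBIF concepts quoted above (only the cited fields).
native_doe = {"name": "Native",
              "sameAsUris": ["http://rs.tdwg.org/dwcdoe/values/d001"]}
hull_fouling = {"name": "HullFouling",
                "externalDefinitions": ["http://rs.tdwg.org/dwcpw/values/p039"]}
```

Here `native_doe["name"]` is "Native" while `tdwg_value_for_gbif_concept(native_doe)` returns "native", so a case-sensitive validation would reject the GBIF name even though the IRI points at a valid TDWG value.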

tucotuco commented 5 months ago

I think the better path is to convince GBIF to correct their implementations of the vocabularies. It was probably an oversight or an attempt to follow a capitalization pattern that was established before these vocabularies were added.

chicoreus commented 5 months ago

@tucotuco I agree. In the absence of that change, we should remove the reference to the GBIF API from the validations and add a note for implementors to the amendments.

chicoreus commented 5 months ago

@tucotuco yes, the ideal would be to align the text strings in the API with the standard. GBIF doesn't necessarily have to change the values of the Names in their API, though; they only need to provide the TDWG IRI and the controlled vocabulary value in consistent places where we can tell implementors to look for them.

ArthurChapman commented 4 months ago

@timrobertson100

timrobertson100 commented 4 months ago

Thanks for the ping

> It was probably an oversight or an attempt to follow a capitalization pattern that was established before these vocabularies were added.

It was indeed. Just for background, one design goal of the vocabulary server was to serve various vocabularies (internal or external, such as TDWG, IUCN, etc.) in a consistent manner through a unified API. When it comes to formats, we're dealing with vocabularies that use lowerCamelCase, UpperCamelCase, lower_snake_case, CAPITAL_SNAKE_CASE, etc., and, rightly or wrongly, the intent was to try to normalise things. That approach resulted in our naming Concepts in UpperCamelCase and properties in lowerCamelCase, similar to DwC at the time with its Classes (e.g. HumanObservation).

I'll chat with people here and report back on the feasibility of changing this at our end; it may have implications I'm not aware of (e.g. for external API users). As I understand it, the options are either to change the name to lowerCamelCase or to add a controlledValue field. If I'm mistaken, please correct me.

More to follow...

chicoreus commented 4 months ago

On Thu, 18 Apr 2024 01:45:43 -0700, Tim Robertson wrote:

> As I understand it the options are either we change the name to lowerCamelCase or add a field controlledValue - if I'm mistaken please correct me.

Consistency within the API feels like a good thing.

The best path seems to be to add a controlledValue term to the response. Given the presence of externalDefinitions, it might make sense to add structure to externalDefinitions, giving it the form:

key: 113
name: HullFouling
externalDefinitions:
  [0]
    iri: http://rs.tdwg.org/dwcpw/values/p039
    controlledValue: hullFouling
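Written out as JSON, the proposed structure could be consumed as in this minimal Python sketch. The field names follow the proposal in this comment and do not describe a documented GBIF API response; `PROPOSED_CONCEPT` and `iri_and_controlled_values` are hypothetical names.

```python
import json
from typing import List, Tuple

# Hypothetical concept in the proposed shape (not a documented GBIF response).
PROPOSED_CONCEPT = json.loads("""
{
  "key": 113,
  "name": "HullFouling",
  "externalDefinitions": [
    {"iri": "http://rs.tdwg.org/dwcpw/values/p039", "controlledValue": "hullFouling"}
  ]
}
""")


def iri_and_controlled_values(concept: dict) -> List[Tuple[str, str]]:
    # Each structured external definition pairs the TDWG IRI with the exact
    # controlled value, so no case conversion or second lookup is needed.
    return [
        (d["iri"], d["controlledValue"])
        for d in concept.get("externalDefinitions", [])
        if isinstance(d, dict) and "iri" in d and "controlledValue" in d
    ]
```

With this shape, an implementor validating dwc:pathway would read "hullFouling" directly from the API, while the UpperCamelCase `name` stays unchanged for existing API users.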

timrobertson100 commented 4 months ago

Hi all

> I think the better path is to convince GBIF to correct their implementations of the vocabularies. It was probably an oversight or an attempt to follow a capitalization pattern that was established before these vocabularies were added.

The GBIF concepts have been updated to lowerCamelCase to follow the TDWG convention for the following vocabularies:

https://registry.gbif.org/vocabulary/Pathway/concepts
https://registry.gbif.org/vocabulary/DegreeOfEstablishment/concepts
https://registry.gbif.org/vocabulary/EstablishmentMeans/concepts

Tasilee commented 4 months ago

GBIF vocabulary has now been aligned with Darwin Core. Thanks @timrobertson100