tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-VALIDATION_COUNTRYSTATEPROVINCE_UNAMBIGUOUS #201

Open Tasilee opened 2 years ago

Tasilee commented 2 years ago
| TestField | Value |
| --- | --- |
| GUID | d257eb98-27cb-48e5-8d3c-ab9fca4edd11 |
| Label | VALIDATION_COUNTRYSTATEPROVINCE_UNAMBIGUOUS |
| Description | Is the combination of the values of the terms dwc:country, dwc:stateProvince unique in the bdq:sourceAuthority? |
| TestType | Validation |
| Darwin Core Class | dcterms:Location |
| Information Elements ActedUpon | dwc:country, dwc:stateProvince |
| Information Elements Consulted | |
| Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if the terms dwc:country and dwc:stateProvince are bdq:Empty; COMPLIANT if the combination of values of dwc:country and dwc:stateProvince are unambiguously resolved to a single result with a child-parent relationship in the bdq:sourceAuthority and the entity matching the value of dwc:country in the bdq:sourceAuthority is an ISO country-like administrative entity in the bdq:sourceAuthority; otherwise NOT_COMPLIANT |
| Data Quality Dimension | Conformance |
| Term-Actions | COUNTRYSTATEPROVINCE_UNAMBIGUOUS |
| Parameter(s) | bdq:sourceAuthority |
| Source Authority | bdq:sourceAuthority default = "The Getty Thesaurus of Geographic Names (TGN)" {[https://www.getty.edu/research/tools/vocabularies/tgn/index.html]} |
| Specification Last Updated | 2023-09-18 |
| Examples | [dwc:country="Argentina", dwc:stateProvince="Rio Negro": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:country and dwc:stateProvince are unambiguous"] [dwc:country="", dwc:stateProvince="WA": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:country and dwc:stateProvince are ambiguous. Matches Western Australia, Washington State (US)"] |
| Source | VertNet, Kurator |
| References | |
| Example Implementations (Mechanisms) | Kurator |
| Link to Specification Source Code | https://github.com/kurator-org/kurator-validation/blob/master/packages/kurator_dwca/workflows/dwca_geography_assessor.yaml |
| Notes | See table https://github.com/tdwg/bdq/issues/95#issuecomment-1226450014. A fail condition may arise from the content being internally inconsistent (not all of the information can be true at the same time), or from the vocabulary being incapable of uniquely resolving the combination of term values. This test specifically does not consider the content of dwc:higherGeography. If dwc:country contains a value and dwc:stateProvince does not, this test will return NOT_COMPLIANT. Use cases where knowledge to the level of country is adequate for the fitness of the data should not include this test. @tucotuco: "Of #200 and #201, #201 is the strongest test. If it passes for a record, #200 must necessarily also pass and doesn't tell you anything. If #201 fails, #200 could still pass and that would tell you that there are multiple matches on the dwc:country/dwc:stateProvince combo: It would tell you the nature of the problem. Along with #42 (dwc:country not empty), #200 would tell you whether there was an ambiguous combination of country (not empty) and dwc:stateProvince, such as would happen with Argentina/Buenos Aires. While if country is empty, then the ambiguity is purely at the dwc:stateProvince level". |

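
The Expected Response above can be read as a small decision procedure. The following is a minimal sketch of one possible reading, not the Kurator implementation. `TOY_AUTHORITY`, its field names, `is_empty`, and `validate_countrystateprovince_unambiguous` are all hypothetical; the toy list stands in for a real bdq:sourceAuthority lookup, names are matched by exact string, and whitespace-only values are assumed to count as bdq:Empty.

```python
# Hypothetical stand-in for bdq:sourceAuthority: a tiny hand-made gazetteer,
# NOT real TGN data or a real TGN client.
TOY_AUTHORITY = [
    {"id": 1, "names": {"Argentina"}, "country_like": True, "parent": None},
    {"id": 2, "names": {"Australia"}, "country_like": True, "parent": None},
    {"id": 3, "names": {"United States", "US"}, "country_like": True, "parent": None},
    {"id": 4, "names": {"Rio Negro"}, "country_like": False, "parent": 1},
    {"id": 5, "names": {"Western Australia", "WA"}, "country_like": False, "parent": 2},
    {"id": 6, "names": {"Washington", "WA"}, "country_like": False, "parent": 3},
]


def is_empty(value):
    """Assumption: None and whitespace-only strings are treated as bdq:Empty."""
    return value is None or value.strip() == ""


def validate_countrystateprovince_unambiguous(country, state_province,
                                              authority=TOY_AUTHORITY):
    # Source authority unavailable.
    if authority is None:
        return {"status": "EXTERNAL_PREREQUISITES_NOT_MET", "result": None}
    # Only the case where BOTH terms are empty is an internal prerequisite failure.
    if is_empty(country) and is_empty(state_province):
        return {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": None}

    by_id = {entry["id"]: entry for entry in authority}
    pairs = []
    # With exactly one term empty, no child-parent pair can match, so the
    # result falls through to NOT_COMPLIANT.
    if not is_empty(country) and not is_empty(state_province):
        for child in authority:
            parent = by_id.get(child["parent"])
            if (parent is not None
                    and parent["country_like"]
                    and state_province in child["names"]
                    and country in parent["names"]):
                pairs.append((parent["id"], child["id"]))

    # COMPLIANT only when exactly one country-like parent / child pair matches.
    result = "COMPLIANT" if len(pairs) == 1 else "NOT_COMPLIANT"
    return {"status": "RUN_HAS_RESULT", "result": result}
```

Under these assumptions, `("Argentina", "Rio Negro")` resolves to one pair and is COMPLIANT, while `("", "WA")` and the swapped `("Rio Negro", "Argentina")` are both NOT_COMPLIANT, the latter because the entity matched for dwc:country is not country-like.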
ArthurChapman commented 2 years ago

Suggest modifying the Expected Response (changes in italics)

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if either of the terms dwc:country and dwc:stateProvince are EMPTY; COMPLIANT if the combination of values of dwc:country and dwc:stateProvince are unambiguously resolved in the bdq:sourceAuthority; otherwise NOT_COMPLIANT

Tasilee commented 2 years ago

I don't think that is right. As per @tucotuco examples with #95, we are testing for ambiguity and one of the terms can be empty.

tucotuco commented 2 years ago

> I don't think that is right. As per @tucotuco examples with #95, we are testing for ambiguity and one of the terms can be empty.

I agree, it is correct as "INTERNAL_PREREQUISITES_NOT_MET if the terms dwc:country and dwc:stateProvince are EMPTY".

chicoreus commented 2 years ago

How about:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if the terms dwc:country and dwc:stateProvince are EMPTY; COMPLIANT if the combination of values of dwc:country and dwc:stateProvince are unambiguously resolved to a single result with a child-parent relationship in the bdq:sourceAuthority and the entity matching the value of dwc:country in the bdq:sourceAuthority is an ISO country-like entity in the bdq:sourceAuthority; otherwise NOT_COMPLIANT

chicoreus commented 2 years ago

This phrasing avoids a compliant result from mismapping of dwc:county onto dwc:stateProvince and dwc:stateProvince onto dwc:country, or from instances where dwc:country and dwc:stateProvince are switched.

Tasilee commented 2 years ago

Done

Tasilee commented 1 year ago

Added to Notes: "This test will fail if there are leading or trailing white space or non-printing characters."

ArthurChapman commented 1 year ago

In the Notes, the reference "See table #95 (comment)" (i.e. "See table https://github.com/tdwg/bdq/issues/95#issuecomment-1226450014") will need to be updated, but I am not sure how we can reference the comment.

#95 can be changed to "VALIDATION_GEOGRAPHY_CONSISTENT (78640f09-8353-411a-800e-9b6d498fb1c9)", but the comment and table won't appear there unless we put them somewhere we can reference.

Tasilee commented 1 year ago

Updated Parameter(s) value to align with other tests

Tasilee commented 1 year ago

Post Zoom 11/7/2023, I have aligned the Source Authority with the suggested syntax:

bdq:sourceAuthority default = "The Getty Thesaurus of Geographic Names (TGN)" [https://www.getty.edu/research/tools/vocabularies/tgn/index.html]

to

bdq:sourceAuthority default = "The Getty Thesaurus of Geographic Names (TGN)" {[https://www.getty.edu/research/tools/vocabularies/tgn/index.html]}
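
For anyone consuming these fields programmatically, the sketch below shows how a value in the braces-and-brackets form might be split into its parts. The thread does not define a formal grammar for this syntax, so the pattern is an assumption inferred from this single example, and `SOURCE_AUTHORITY_PATTERN` and `parse_source_authority` are hypothetical names.

```python
import re

# Assumed shape, inferred from one example only:
#   <parameter> default = "<label>" {[<url>]}
SOURCE_AUTHORITY_PATTERN = re.compile(
    r'^(?P<parameter>bdq:\w+) default = "(?P<label>[^"]+)" \{\[(?P<url>[^\]]+)\]\}$'
)


def parse_source_authority(text):
    """Return a dict with parameter, label and url, or None if the text does not fit."""
    match = SOURCE_AUTHORITY_PATTERN.match(text.strip())
    return match.groupdict() if match else None


parsed = parse_source_authority(
    'bdq:sourceAuthority default = "The Getty Thesaurus of Geographic Names (TGN)" '
    '{[https://www.getty.edu/research/tools/vocabularies/tgn/index.html]}'
)
```

A value still in the earlier form, with the URL in square brackets only, does not match this pattern and yields `None`.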

Tasilee commented 11 months ago

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted".

Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated"

chicoreus commented 6 months ago

Removed inapplicable "fail" text from note. This is covered by "unambiguous" in the specification, and leading/trailing whitespace should not block matches.
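
One way an implementation could keep leading or trailing whitespace from blocking matches is to trim values before the lookup. The sketch below is an assumption about how that might be done, not something this specification mandates; `trim_edges` is a hypothetical helper and only touches the ends of the string, leaving internal spacing alone.

```python
def trim_edges(value):
    """Drop leading/trailing whitespace and non-printing characters only."""
    def droppable(ch):
        # str.isprintable() is False for control and format characters,
        # e.g. a zero-width space (U+200B).
        return ch.isspace() or not ch.isprintable()

    start, end = 0, len(value)
    while start < end and droppable(value[start]):
        start += 1
    while end > start and droppable(value[end - 1]):
        end -= 1
    return value[start:end]


cleaned = trim_edges("\u200b  Rio Negro \t\n")
```

Here `cleaned` is `"Rio Negro"`, which can then be passed to the source authority lookup.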

Tasilee commented 4 weeks ago

Updated Notes from @tucotuco's Comment https://github.com/tdwg/bdq/issues/21#issuecomment-2282949284 which I thought was needed here.

ArthurChapman commented 2 weeks ago

Altered Expected Response to add "administrative" entity