tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-VALIDATION_COUNTRYCODE_NOTEMPTY #98

Open iDigBioBot opened 6 years ago

iDigBioBot commented 6 years ago
| TestField | Value |
| --- | --- |
| GUID | 853b79a2-b314-44a2-ae46-34a1e7ed85e4 |
| Label | VALIDATION_COUNTRYCODE_NOTEMPTY |
| Description | Is there a value in dwc:countryCode? |
| TestType | Validation |
| Darwin Core Class | dcterms:Location |
| Information Elements ActedUpon | dwc:countryCode |
| Information Elements Consulted | |
| Expected Response | COMPLIANT if dwc:countryCode is bdq:NotEmpty; otherwise NOT_COMPLIANT |
| Data Quality Dimension | Completeness |
| Term-Actions | COUNTRYCODE_NOTEMPTY |
| Parameter(s) | |
| Source Authority | |
| Specification Last Updated | 2024-11-10 |
| Examples | [dwc:countryCode="Australia": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:countryCode is bdq:NotEmpty"] [dwc:countryCode="": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:countryCode is bdq:Empty"] |
| Source | |
| References | |
| Example Implementations (Mechanisms) | FilteredPush:geo_ref_qc |
| Link to Specification Source Code | geo_ref_qc DwCGeoRefDQ,validationCountrycodeNotempty() |
| Notes | This test will return 'NOT_COMPLIANT' for records on the "High seas" where dwc:countryCode is bdq:Empty. We recommend that data from the high seas (outside national jurisdictions) use dwc:countryCode = "XZ" and dwc:country = "High seas" until an agreement has been made. |

iDigBioBot commented 6 years ago

Comment by Lee Belbin (@Tasilee) migrated from spreadsheet: Added post scoring for consistency

cgendreau commented 6 years ago

This test should probably use the same word "EMPTY" as https://github.com/tdwg/bdq/issues/20 instead of NULL.

ArthurChapman commented 4 years ago

Need to add somewhere (Expected Response) a reference to ISO 3166. I have added a reference in the References.

Tasilee commented 4 years ago

Edited your comment (odd that you can) to 3166.

ArthurChapman commented 4 years ago

Thanks @Tasilee - was just about to make that correction.

Tasilee commented 4 years ago

Looking at this one again, we aren't checking for a valid dwc:countryCode, only that it is not EMPTY. A reference to ISO 3166 is fine, but isn't needed in Expected response.
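
To make the distinction concrete, here is a minimal sketch of a presence-only check. It assumes that bdq:Empty means a missing or whitespace-only value, which is an assumption for illustration and not the normative definition. The names `Response`, `is_empty` and `validation_countrycode_notempty` are hypothetical and are not the geo_ref_qc API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Response:
    status: str
    result: str
    comment: str


def is_empty(value: Optional[str]) -> bool:
    # Assumption: bdq:Empty covers a missing value and whitespace-only strings.
    return value is None or value.strip() == ""


def validation_countrycode_notempty(country_code: Optional[str]) -> Response:
    # Only presence is checked; the value is never compared against ISO 3166.
    if is_empty(country_code):
        return Response("RUN_HAS_RESULT", "NOT_COMPLIANT",
                        "dwc:countryCode is bdq:Empty")
    return Response("RUN_HAS_RESULT", "COMPLIANT",
                    "dwc:countryCode is bdq:NotEmpty")
```

Under this sketch, "Australia" is COMPLIANT even though it is not a two-letter code, because validity is left to the other country code tests.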

tucotuco commented 4 years ago

Agreed.

Tasilee commented 1 year ago

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted".

Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated"

chicoreus commented 3 months ago

By including this test in CORE we are asserting that any data from the high seas is not fit for any of the use cases that include this test.

We need some specific recommendation for handling data from the High Seas. Country code, using the ISO list, should be empty for data from the high seas. This test needs some way to accommodate that to allow for data from the high seas being fit for use.

Tasilee commented 3 months ago

I agree @chicoreus: CORE suggests a universal use case. Does this raise the need for two other use cases - terrestrial and marine ecology?

We could set this test as Supplementary for terrestrial domain, and optionally generate an equivalent for the marine domain (using dwc:waterBody?).

As you suggest, we may be able to accommodate this by trying to detect marine domains (dwc:waterBody, dwc:decimalLatitude and dwc:decimalLongitude, dwc:minimumDepthInMeters and dwc:maximumDepthInMeters, or ...?). The simplest ER would be something like

"COMPLIANT if dwc:countryCode is NOT_EMPTY, or if any of dwc:waterBody, dwc:minimumDepthInMeters or dwc:maximumDepthInMeters are NOT_EMPTY..."?

ArthurChapman commented 3 months ago

dwc:waterBody includes rivers and lakes etc. which are inside countries. I think we also decided sometime earlier, that waterbody in TGN was unworkable.

chicoreus commented 3 months ago

On Tue, 13 Aug 2024 15:57:56 -0700 Lee Belbin wrote:

"COMPLIANT if dwc:countryCode is NOT_EMPTY, or if any of dwc:waterBody, dwc:minimumDepthInMeters or dwc:maximumDepthInMeters are NOT_EMPTY..."?

dwc:waterBody does not help us. Parts of the Atlantic Ocean are high seas, parts are within the EEZs of various countries, similarly for many marine water bodies at many hierarchical levels...

tucotuco commented 3 months ago

The UN/LOCODE system uses "XZ" to represent international waters or high seas. This is not an official ISO country code but is commonly used in logistics and transportation systems. I think this could be a good solution for covering fitness for use of data from the high seas.

tucotuco commented 3 months ago

Also, ZZ is an often used user-defined ISO code taken to mean "unknown". This would apply to situations where the location is unknown (i.e., not found or explicitly stated as unknown) as well as situations where the location is known, but can not be assigned to a single country code (e.g., "Argentina/Uruguay").

Tasilee commented 2 months ago

If we had dwc:decimalLatitude and dwc:decimalLongitude, we may be able to use the shapefile download of country+EEZ at https://www.marineregions.org/downloads.php. We could set INTERNAL_PREREQUISITES_NOT_MET if we didn't have latitude and longitude. Just a long shot. If we can't do something like this, then I guess it is Immature/Incomplete until a dwc:countryCode value of "XZ" becomes widely used?
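
One possible reading of that idea, as a rough sketch only. The `in_high_seas` callback is hypothetical: in practice it would have to be backed by a point-in-polygon lookup against the country+EEZ shapefile, which is not shown. The function name and the choice to consult coordinates only when dwc:countryCode is empty are also assumptions.

```python
from typing import Callable, Optional, Tuple


def is_empty(value: Optional[str]) -> bool:
    # Assumption: bdq:Empty covers a missing value and whitespace-only strings.
    return value is None or value.strip() == ""


def countrycode_notempty_or_high_seas(
    country_code: Optional[str],
    decimal_latitude: Optional[str],
    decimal_longitude: Optional[str],
    in_high_seas: Callable[[float, float], bool],  # hypothetical lookup
) -> Tuple[str, Optional[str]]:
    """Return (Response.status, Response.result) for one record."""
    if not is_empty(country_code):
        return ("RUN_HAS_RESULT", "COMPLIANT")
    # countryCode is empty: coordinates are needed to decide anything further.
    if is_empty(decimal_latitude) or is_empty(decimal_longitude):
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    try:
        lat, lon = float(decimal_latitude), float(decimal_longitude)
    except ValueError:
        # Coordinates that cannot be parsed are treated like missing ones.
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    if in_high_seas(lat, lon):
        return ("RUN_HAS_RESULT", "COMPLIANT")
    return ("RUN_HAS_RESULT", "NOT_COMPLIANT")
```

The cost is visible even in the sketch: the test stops being a simple presence check and depends on an external spatial layer, which bears on the 'easy to implement' criterion.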

ArthurChapman commented 2 months ago

I don't see a problem - we are not checking against Country Codes with this test - just checking if it has something in the field or not. Where we look at Standard etc. we could check against "country codes + XZ" and add a note about XZ

Tasilee commented 2 months ago

As this test now stands, I agree with @chicoreus in that we will be wrongfully returning NOT_COMPLIANT for any 'high seas' records. As this area is more than half of the planet, we need to take it seriously.

We therefore have three options

  1. Be aspirational in the use of dwc:countryCode="XZ", knowing that we will have many NOT_COMPLIANTs
  2. Use coordinates if available to test for "high seas". I am unaware of an API for this, but there are shapefiles for 'high seas' as mentioned. With this option, we must remain true to our 'easy to implement' criterion, to which I defer to @chicoreus and @tucotuco.
  3. Set the test to Immature/Incomplete, and promote "XZ".

I am slightly inclined to (3).

ArthurChapman commented 2 months ago

I disagree - this test - like all other tests for NOTEMPTY - is only checking if there is a value in that field - it makes no assumption on why it is empty. It is a simple YES/NO test.

ArthurChapman commented 2 months ago

@Tasilee - I think what you are saying applies to Tests #73 and #62, not this test. In those tests I think they could be worded (especially #73) to include "or XZ ..." I'd have to look more closely at those two tests and possibly comment there rather than here.

chicoreus commented 2 months ago

@ArthurChapman, it is a concern here as well. For #98, if we aspirationally assert that High Seas should use "XZ" as the country code, then no problem, dwc:countryCode is expected to contain a value, and the absence of value indicates an absence of quality.

However, if we don't take that position, and take the current Darwin Core guidance, where dwc:countryCode is expected to be left empty for the High Seas, then all data from the High Seas will, according to this test, lack quality, because it lacks a value.

That's the nature of the problem for #98.

On Wed, 25 Sep 2024 15:12:07 -0700 Arthur Chapman wrote:

@Tasilee - I think what you are saying applies to Tests #73 and #62, not this test. In those tests I think they could be worded (especially #73) to include "or XZ ..." I'd have to look more closely at those two tests and possibly comment there rather than here.

ArthurChapman commented 2 months ago

I still don't see a problem as it is still a valuable test. Many datasets would not hold both terrestrial and marine data, and we have a separate test for terrestrial/marine (that would include high seas). There are many tests that one could argue won't add quality in every case - I think we discussed that with at least one other test. But in many datasets it would add quality knowing this. In the NOTEMPTY tests we are testing one simple thing. We then have other tests that test for other things, and we could, as @tucotuco suggested under #73 (https://github.com/tdwg/bdq/issues/73#issuecomment-2375426773), develop further tests for High Seas - my view is yes - we could do that - but let's leave that for after the Standard is published. Let's not continue adding and deleting tests at this stage.

tucotuco commented 2 months ago

I still don't see a problem as it is still a valuable test. Many datasets would not hold both terrestrial and marine data, and we have a separate test for terrestrial/marine (that would include high seas). There are many tests that one could argue won't add quality in every case - I think we discussed that with at least one other test. But in many datasets it would add quality knowing this. In the NOTEMPTY tests we are testing one simple thing. We then have other tests that test for other things, and we could, as @tucotuco suggested under #73 (#73 (comment)), develop further tests for High Seas - my view is yes - we could do that - but let's leave that for after the Standard is published. Let's not continue adding and deleting tests at this stage.

That is an easy posture to get behind at this point!

chicoreus commented 2 months ago

I am very reluctant to release a suite of core tests that will assert that a very large portion of the world's marine data is unfit for use (for several Use Cases, spatial-temporal in particular). This is what VALIDATION_COUNTRYCODE_NOTEMPTY is guaranteed to do, with no path to resolving that, unless we assert that High Seas should use the value "XZ" for dwc:countryCode. This gives a path to data having quality.

There is a fundamental problem that we have to solve here. Otherwise the test suite is not itself usable. There are multiple possible solutions. The simplest is to assert that high seas data should use XZ for the country code. The second simplest is to exclude this test from core, but this doesn't resolve the issue for the other country code tests...

We are realizing a problem late in the game, but it is one we must resolve.

ArthurChapman commented 2 months ago

By saying that the COUNTRYCODE is EMPTY does not say that the data is Not fit for use. It depends on the use and the user has to make that decision. Anyone working in the marine area knows that marine data would not have a Country Code. There are so many other tests that test for NOTEMPTY - by saying they are EMPTY does not make them not fit for use. KINGDOM_NOTEMPTY, GEODETICDATUM_NOTEMPTY, EVENTDATE_NOTEMPTY. There are many other tests that return NOT_COMPLIANT that don't make the data NOT FIT FOR USE for many uses.

Don't read too much into what each of the tests is doing and not doing. The EMPTY/NOTEMPTY tests are just that! There is, or there is not, something in the field. Other tests then do the next stages. Because we don't have a workflow and the tests are stand-alone, in many cases that test alone won't tell you if the data is fit for your use. If we had a workflow order, you may do MARINETERRESTRIAL test first and then only run this test on Terrestrial data, but we don't do that.

I don't see that there is anything to resolve. If we make a change here, then we have to revisit nearly every other test, because similar arguments could be made for many of the tests.

ArthurChapman commented 2 months ago

@chicoreus wrote: "There is a fundamental problem that we have to solve here. Otherwise the test suite is not itself usable. There are multiple possible solutions. The simplest is to assert that high seas data should use XZ for the country code."

Put in the notes that "This test will return 'NOT_COMPLIANT' for records in the "High Seas". We recommend that high seas data use the dwc:countryCode = XZ". I would strongly oppose moving this and similar tests out of CORE.

chicoreus commented 2 months ago

On Wed, 25 Sep 2024 17:30:53 -0700 Arthur Chapman wrote:

By saying that the COUNTRYCODE is EMPTY does not say that the data is Not fit for use.

That is exactly the semantics of NOT_COMPLIANT.

It depends on the use and the user has to make that decision.

The user is free to compose their own use cases. We are asserting that Spatial-Temporal Patterns is a use case. VALIDATION_COUNTRYCODE_NOTEMPTY asserts NOT_COMPLIANT if dwc:countryCode is bdq:Empty. This means that any SingleRecord for which dwc:countryCode is bdq:Empty is not fit for use for that use case. This is exactly the semantics of the test and the use case.

No Marine data from the high seas are fit for use for Spatial-Temporal Patterns, (unless they incorrectly contain a country code).

Quality Assurance will exclude any data that is NOT_COMPLIANT for any validation in the use case. That is a fundamental of the framework.

We can't get around that by saying that users can compose tests in ways they like. We are asserting a use case (as the framework requires us to).

This is not a problem we can avoid. We must solve it. We can't claim it doesn't exist. We must solve it.
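
The Quality Assurance behaviour described above can be sketched roughly as follows. This is a simplification under stated assumptions: the `notempty` factory and `quality_assurance` function are hypothetical names, bdq:Empty is taken to mean missing or whitespace-only, and the two validations in `use_case` are illustrative only, not the actual membership of Spatial-Temporal Patterns.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Validation = Callable[[Record], str]  # returns "COMPLIANT" or "NOT_COMPLIANT"


def notempty(term: str) -> Validation:
    """Build a NOTEMPTY-style validation for one Darwin Core term."""
    def validate(record: Record) -> str:
        value = record.get(term)
        # Assumption: bdq:Empty covers a missing value and whitespace-only strings.
        if value is None or value.strip() == "":
            return "NOT_COMPLIANT"
        return "COMPLIANT"
    return validate


def quality_assurance(records: Iterable[Record],
                      validations: List[Validation]) -> List[Record]:
    """Keep only records that are COMPLIANT for every validation in the use case."""
    return [r for r in records
            if all(v(r) == "COMPLIANT" for v in validations)]


# Illustrative use case; the real list of validations is not reproduced here.
use_case = [notempty("dwc:countryCode"), notempty("dwc:eventDate")]

records = [
    {"id": "land", "dwc:countryCode": "AU", "dwc:eventDate": "2024-08-13"},
    {"id": "high-seas-empty", "dwc:countryCode": "", "dwc:eventDate": "2024-08-13"},
    {"id": "high-seas-xz", "dwc:countryCode": "XZ", "dwc:eventDate": "2024-08-13"},
]

kept = [r["id"] for r in quality_assurance(records, use_case)]
print(kept)  # ['land', 'high-seas-xz']
```

The high seas record with an empty dwc:countryCode is filtered out, while the same record with "XZ" is retained, which is the path to quality that the "XZ" recommendation provides.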

chicoreus commented 2 months ago

On Wed, 25 Sep 2024 17:35:34 -0700 Arthur Chapman wrote:

Put in the notes that "This test will return 'NOT_COMPLIANT' for records in the "High Seas". We recommend that high seas data use the dwc:countryCode = XZ".

This is a workable solution.

I would strongly oppose moving this and similar tests out of CORE.

Likewise. This test has value (particularly with requirements for documenting origin of material under the convention on biological diversity).

Tasilee commented 2 months ago

I made a change to the Expected Response from

COMPLIANT if dwc:countryCode is bdq:NotEmpty; otherwise NOT_COMPLIANT

to

COMPLIANT if dwc:countryCode is bdq:NotEmpty or has a value of "XZ"; otherwise NOT_COMPLIANT

and updated the Notes to

This test will return 'NOT_COMPLIANT' for records on the "High seas" where dwc:countryCode is bdq:Empty. We recommend that data from the high seas (outside national jurisdictions) use dwc:countryCode = "XZ" and dwc:country = "High seas" until an agreement has been made.

chicoreus commented 2 weeks ago

Changing the expected response back to:

COMPLIANT if dwc:countryCode is bdq:NotEmpty; otherwise NOT_COMPLIANT

dwc:countryCode = "XZ" is bdq:NotEmpty, so there is no reason for the specification to assert "COMPLIANT if dwc:countryCode is bdq:NotEmpty or if dwc:countryCode is bdq:NotEmpty; otherwise NOT_COMPLIANT".

Callout of XZ in the notes is good, but the statement in the expected response is redundant and confusing.
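
The redundancy can be checked with a small sketch that compares the two wordings of the Expected Response over a handful of values. The function names are hypothetical, and bdq:NotEmpty is assumed here to mean present and not whitespace-only.

```python
from typing import Optional


def is_not_empty(value: Optional[str]) -> bool:
    # Assumption: bdq:NotEmpty means present and not whitespace-only.
    return value is not None and value.strip() != ""


def er_notempty(country_code: Optional[str]) -> str:
    # "COMPLIANT if dwc:countryCode is bdq:NotEmpty; otherwise NOT_COMPLIANT"
    return "COMPLIANT" if is_not_empty(country_code) else "NOT_COMPLIANT"


def er_notempty_or_xz(country_code: Optional[str]) -> str:
    # "COMPLIANT if dwc:countryCode is bdq:NotEmpty or has a value of 'XZ'; ..."
    if is_not_empty(country_code) or country_code == "XZ":
        return "COMPLIANT"
    return "NOT_COMPLIANT"


samples = [None, "", "   ", "XZ", "ZZ", "AU", "Australia"]
disagreements = [s for s in samples if er_notempty(s) != er_notempty_or_xz(s)]
print(disagreements)  # []
```

Any value equal to "XZ" already satisfies the NotEmpty clause, so the extra clause can never change the result.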