tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-VALIDATION_COORDINATES_NOTZERO #87

Open iDigBioBot opened 6 years ago

iDigBioBot commented 6 years ago
| TestField | Value |
| --- | --- |
| GUID | 1bf0e210-6792-4128-b8cc-ab6828aa4871 |
| Label | VALIDATION_COORDINATES_NOTZERO |
| Description | Are the values of either dwc:decimalLatitude or dwc:decimalLongitude numbers that are not equal to 0? |
| TestType | Validation |
| Darwin Core Class | dcterms:Location |
| Information Elements ActedUpon | dwc:decimalLatitude, dwc:decimalLongitude |
| Information Elements Consulted | |
| Expected Response | INTERNAL_PREREQUISITES_NOT_MET if dwc:decimalLatitude is bdq:Empty or is not interpretable as a number, or dwc:decimalLongitude is bdq:Empty or is not interpretable as a number; COMPLIANT if either the value of dwc:decimalLatitude is not = 0 or the value of dwc:decimalLongitude is not = 0; otherwise NOT_COMPLIANT |
| Data Quality Dimension | Likeliness |
| Term-Actions | COORDINATES_NOTZERO |
| Parameter(s) | |
| Source Authority | |
| Specification Last Updated | 2023-06-20 |
| Examples | [dwc:decimalLatitude="21.0534", dwc:decimalLongitude="81.0554": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:decimalLatitude and dwc:decimalLongitude are not zero"] [dwc:decimalLatitude="0", dwc:decimalLongitude="0": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:decimalLatitude and dwc:decimalLongitude are zero"] |
| Source | ALA, GBIF, OBIS |
| References | |
| Example Implementations (Mechanisms) | |
| Link to Specification Source Code | |
| Notes | A record with 0.0 is interpreted as the string "0" |

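
The Expected Response above could be sketched roughly as follows. This is only an illustration, not a reference implementation: it assumes bdq:Empty means a missing or whitespace-only value, and that "interpretable as a number" means Python's `float()` accepts the string and yields a finite value. The names `Response`, `is_empty`, `parse_number` and `validation_coordinates_notzero` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    status: str
    result: Optional[str]
    comment: str


def is_empty(value) -> bool:
    # Assumption: bdq:Empty covers None and whitespace-only strings.
    return value is None or str(value).strip() == ""


def parse_number(value) -> Optional[float]:
    # Assumption: "interpretable as a number" means float() accepts the
    # text and the result is finite, so "NaN" and "inf" are rejected.
    if is_empty(value):
        return None
    try:
        number = float(str(value).strip())
    except ValueError:
        return None
    return number if math.isfinite(number) else None


def validation_coordinates_notzero(decimal_latitude, decimal_longitude) -> Response:
    lat = parse_number(decimal_latitude)
    lon = parse_number(decimal_longitude)
    # Either value empty or non-numeric: the test cannot be run.
    if lat is None or lon is None:
        return Response(
            "INTERNAL_PREREQUISITES_NOT_MET",
            None,
            "dwc:decimalLatitude or dwc:decimalLongitude is empty or not a number",
        )
    # COMPLIANT as soon as at least one coordinate differs from zero.
    if lat != 0 or lon != 0:
        return Response(
            "RUN_HAS_RESULT",
            "COMPLIANT",
            "at least one of dwc:decimalLatitude and dwc:decimalLongitude is not zero",
        )
    return Response(
        "RUN_HAS_RESULT",
        "NOT_COMPLIANT",
        "dwc:decimalLatitude and dwc:decimalLongitude are zero",
    )
```

Under these assumptions "0.0" and "0" both parse to zero, so a 0.0, 0.0 record is handled the same way as 0, 0.
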
iDigBioBot commented 6 years ago

Comment by Lee Belbin (@Tasilee) migrated from spreadsheet: Suggest we split this into two tests

ArthurChapman commented 6 years ago

Likeliness in Data Quality Dimension changed to Likelihood

tucotuco commented 6 years ago

Agreed at TDWG 2018 DQIG meeting that the name TG2-VALIDATION_COORDINATES_ZERO is satisfactory.

tucotuco commented 4 years ago

I would make a modification to this one to avoid one particular false trigger of a failed validation. I would replace the Expected Response

"INTERNAL_PREREQUISITES_NOT_MET if dwc:decimalLatitude and/or dwc:decimalLongitude are EMPTY or both of the values are not interpretable as numbers; COMPLIANT if either the value of dwc:decimalLatitude is not = 0 or the value of dwc:decimalLongitude is not = 0; otherwise NOT_COMPLIANT"

with

"INTERNAL_PREREQUISITES_NOT_MET if dwc:decimalLatitude or dwc:decimalLongitude is EMPTY or both of the values are not interpretable as numbers; COMPLIANT if either the numeric value of dwc:decimalLatitude is not = 0 or the numeric value of dwc:decimalLongitude is not = 0 or if (the numeric values of both coordinates are equal to 0 and (the dwc:coordinateUncertaintyInMeters can be interpreted as a number and the numeric value of dwc:coordinateUncertaintyInMeters is >= 1) or (the dwc:coordinatePrecision can be interpreted as a number and the numeric value of dwc:coordinatePrecision is not = 0)); otherwise NOT_COMPLIANT"

To the Information Elements I would add dwc:coordinateUncertaintyInMeters and dwc:coordinatePrecision.

I would change the Examples from

dwc:decimalLatitude="0", dwc:decimalLongitude="0"

to

dwc:decimalLatitude="0", dwc:decimalLongitude="0", dwc:coordinateUncertaintyInMeters = "20037509"

To the Notes I would add

"Valid values of uncertainty or precision can indicate real occurrences at the geographic coordinates 0, 0. A georeference indicating that the location is only known to be from Earth would likely have coordinates 0,0 and coordinateUncertaintyInMeters equal to half the equatorial circumference."
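
For illustration, the proposed exception and the "half the equatorial circumference" figure could be sketched as below. This is one reading of the wording, not an agreed specification: it assumes the WGS84 equatorial radius of 6378137 m, reads the bracketed condition as "both zero and (uncertainty is acceptable or precision is acceptable)", and takes latitude and longitude as already-parsed numbers. The function names are hypothetical.

```python
import math

# Assumption: WGS84 semi-major axis, in meters.
WGS84_EQUATORIAL_RADIUS_M = 6378137.0


def half_equatorial_circumference_m() -> int:
    # Half of 2 * pi * r, rounded up to a whole meter.
    return math.ceil(math.pi * WGS84_EQUATORIAL_RADIUS_M)


def as_number(value):
    # Returns a finite float, or None when the value is missing or not numeric.
    try:
        number = float(str(value).strip())
    except ValueError:
        return None
    return number if math.isfinite(number) else None


def proposed_result(lat: float, lon: float, uncertainty_m=None, precision=None) -> str:
    # Unchanged part of the test: any non-zero coordinate is COMPLIANT.
    if lat != 0 or lon != 0:
        return "COMPLIANT"
    # Proposed exception: 0,0 is accepted when a usable uncertainty
    # or precision accompanies it.
    uncertainty = as_number(uncertainty_m)
    if uncertainty is not None and uncertainty >= 1:
        return "COMPLIANT"
    prec = as_number(precision)
    if prec is not None and prec != 0:
        return "COMPLIANT"
    return "NOT_COMPLIANT"


print(half_equatorial_circumference_m())  # 20037509
print(proposed_result(0.0, 0.0, uncertainty_m="20037509"))  # COMPLIANT
print(proposed_result(0.0, 0.0))  # NOT_COMPLIANT
```

Note that under this reading any uncertainty of 1 m or more makes 0,0 compliant, not only the whole-Earth value.
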

ArthurChapman commented 4 years ago

It would be interesting to know how many true recordings there are at exactly 0.000000, 0.000000 in the middle of the ocean. I would expect it is very low, if not none. Is it worth making the test a lot more complicated so that you don't flag those few - rather than flag them anyway and if someone is interested in that area, checking those few?

chicoreus commented 4 years ago

@tucotuco not sure I understand the change, it doesn't seem to agree with the notes, which imply only that a coordinate uncertainty equal to half the equatorial circumference is an allowed 0,0 value. As stated, 0,0 without both coordinate uncertainty and coordinate precision is flagged, but any value in either uncertainty or precision makes 0,0 compliant. That doesn't make sense to me, as I expect the number of error cases where a coordinate uncertainty was given but latitude and longitude weren't would be much, much larger than the number of cases of 0,0 that are real observations.

I'd much rather leave out the edge case, leave the specification as is, and flag any case where latitude and longitude are zero.

tucotuco commented 4 years ago

I capitulate. :-)

Tasilee commented 4 years ago

I tend to agree with @ArthurChapman. The test would make more sense to me if EITHER dwc:decimalLatitude or dwc:decimalLongitude were zero. It just makes it a more useful test as the lat=lon=0 is going to be rarer.

At one stage, I chased up a suite of what were badly processed records in the ALA where you got a 45 degree line heading southwest from 0,0.

The test name would still make sense if we did the OR approach. My ongoing philosophy of being ok with some false positives still holds.
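
The difference between the two approaches can be shown with a small sketch. The coordinates are illustrative values only, and the function names are hypothetical.

```python
def flag_both_zero(lat: float, lon: float) -> bool:
    # AND approach: flag only the 0,0 point.
    return lat == 0 and lon == 0


def flag_either_zero(lat: float, lon: float) -> bool:
    # OR approach: flag any record on the equator or the prime meridian.
    return lat == 0 or lon == 0


# Illustrative records, as (latitude, longitude) pairs.
records = [(0.0, 0.0), (0.0, 35.634), (1.45, 0.0), (6.4255, 35.634)]

both = [r for r in records if flag_both_zero(*r)]
either = [r for r in records if flag_either_zero(*r)]
print(len(both), len(either))  # 1 3
```

The OR approach flags more records, including ones that legitimately sit on the equator or the prime meridian, which is the false-positive trade-off under discussion.
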

tucotuco commented 4 years ago

I don't like telling people their data are wrong when they are not, no matter how many there are, especially because the test would continue to tell them the same thing every time. That would annoy me to no end if I was trying to use the test to improve my data. I have heard this sentiment among the folks we deal with, and that is one of the reasons they like VertNet so much - we don't continue to pester unnecessarily.

I checked in my GBIF snapshot from a year ago. There are 68373 occurrences with one or the other zero, but not both. Of these, it looks like about 75% are real with the zero, and the rest are errors.

ArthurChapman commented 4 years ago

I am not surprised that there are many records with one of Latitude or Longitude as Zero - many of these are terrestrial and even the marine ones could be good. Where both are 0 - there are not many, if any, that are valid records. Many have arisen where the data is EMPTY and certain databases converted the NULL value to 0. From memory (I could be wrong) but the old Advanced Revelation database software (used in South Africa at one stage) converted Null values to 0. I think that GBIF may be removing the 0,0 records - hence you getting no records.

I would not touch records where only one of Latitude or Longitude is 0. But where both are 0 we should identify them.

tucotuco commented 4 years ago

My data are based on my copy of all GBIF. Nothing is filtered, so there aren't any missing. But good, when both zero, trigger.

Tasilee commented 4 years ago

As usual, I bow to the experts.

I found 22 records in ALA of lat/lon=0,0. These are rendered as 'spatially invalid' on test 'coordinates don't match country (error)' and also warnings on lat=0, long=0, lat=long=0.

tucotuco commented 2 years ago

Suggest Description:

'Are the values of either dwc:decimalLatitude or dwc:decimalLongitude numbers that are not equal to 0?'

in place of:

'Are the values of either dwc:decimalLatitude or dwc:decimalLongitude numbers that are not = 0?'

chicoreus commented 1 year ago

Specification is inconsistent with dataID 707 in the validation data, which has data values indicative of a phrasing:

INTERNAL_PREREQUISITES_NOT_MET if dwc:decimalLatitude and/or dwc:decimalLongitude are EMPTY or either of the values are not interpretable as numbers;

As we are trying to isolate just 0,0 coordinates with a Response.result of NOT_COMPLIANT in this test, we probably do want to change the specification to use either instead of both.

ArthurChapman commented 1 year ago

In this test we are trying to exclude 0, 0 - not 0, 145.7 etc. @chicoreus - your wording above is saying that 0 in either latitude or longitude is empty, but this wasn't what was intended originally. There is a greater likelihood that 0, 147.5 is a good record than 0,0.

Tasilee commented 1 year ago

This was a test for lat/lon 0,0 so reflecting it to the 'positive' probably stuffed the logic. The intent is

INTERNAL_PREREQUISITES_NOT_MET if dwc:decimalLatitude or dwc:decimalLongitude are EMPTY or not interpretable as numbers; COMPLIANT if the value of dwc:decimalLatitude and dwc:decimalLongitude are not zero; otherwise NOT_COMPLIANT

This makes DataID 707 COMPLIANT

chicoreus commented 1 year ago

@ArthurChapman I'm confusing things by just changing one clause. The intent is indeed that 0,26.445 is COMPLIANT, the question is how to handle 0,"foo", is that COMPLIANT (because it is 0 something, unlike the logic for the main clause), or should we explicitly exclude it as the "foo" might be other than zero. Thus in full:

INTERNAL_PREREQUISITES_NOT_MET if dwc:decimalLatitude and/or dwc:decimalLongitude are EMPTY or either of the values are not interpretable as numbers; COMPLIANT if either the value of dwc:decimalLatitude is not = 0 or the value of dwc:decimalLongitude is not = 0; otherwise NOT_COMPLIANT

These cases are the same using "both" or "either"

| dwc:decimalLatitude | dwc:decimalLongitude | Response.status | Response.value |
| --- | --- | --- | --- |
| 0 | 0 | RUN_HAS_RESULT | NOT_COMPLIANT |
| A | B | INTERNAL_PREREQUISITES_NOT_MET | |
| 1.45 | 0 | RUN_HAS_RESULT | COMPLIANT |
| 6.4255 | 35.634 | RUN_HAS_RESULT | COMPLIANT |
| 0 | 35.634 | RUN_HAS_RESULT | COMPLIANT |

These cases differ:

Both:

| dwc:decimalLatitude | dwc:decimalLongitude | Response.status | Response.value |
| --- | --- | --- | --- |
| 1.45 | A | RUN_HAS_RESULT | COMPLIANT |
| Foo | 0 | RUN_HAS_RESULT | COMPLIANT |

Either:

| dwc:decimalLatitude | dwc:decimalLongitude | Response.status | Response.value |
| --- | --- | --- | --- |
| 1.45 | A | INTERNAL_PREREQUISITES_NOT_MET | |
| Foo | 0 | INTERNAL_PREREQUISITES_NOT_MET | |

If we use "both" in the INTERNAL_PREREQUISITES_NOT_MET clause, then both decimalLatitude and decimalLongitude must be non-numeric for the INTERNAL_PREREQUISITES_NOT_MET to be met, otherwise, we pass on to the compliant/non compliant clauses and ask if both values are zero, if both are then NOT_COMPLIANT

If we use "either" in the INTERNAL_PREREQUISITES_NOT_MET clause, then a non-numeric value in either decimalLatitude or decimalLongitude prevents us from being able to tell if the asserted coordinate is 0,0, and we assert that we can't run the test instead.
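
The two readings can be compared with a short sketch. It is a simplification under stated assumptions: empty values are treated the same as non-numeric ones, "interpretable as a number" is taken to mean Python's `float()` accepts the text and yields a finite value, and `as_number` and `evaluate` are hypothetical names.

```python
import math


def as_number(value):
    # Returns a finite float, or None when the value is empty or not numeric.
    try:
        number = float(str(value).strip())
    except ValueError:
        return None
    return number if math.isfinite(number) else None


def evaluate(lat, lon, prerequisite="either"):
    lat_n, lon_n = as_number(lat), as_number(lon)
    if prerequisite == "either":
        # One unusable value is enough to stop the test.
        blocked = lat_n is None or lon_n is None
    else:
        # "both": the test stops only when neither value is usable.
        blocked = lat_n is None and lon_n is None
    if blocked:
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    # NOT_COMPLIANT only when both values are numeric zeros; under "both",
    # a single non-numeric value falls through to COMPLIANT.
    if lat_n == 0 and lon_n == 0:
        return ("RUN_HAS_RESULT", "NOT_COMPLIANT")
    return ("RUN_HAS_RESULT", "COMPLIANT")


for lat, lon in [("0", "0"), ("A", "B"), ("1.45", "A"), ("Foo", "0")]:
    print(lat, lon, evaluate(lat, lon, "both"), evaluate(lat, lon, "either"))
```

Only the mixed rows (one numeric value, one non-numeric) give different answers; the "either" reading refuses to assert COMPLIANT for a pair such as Foo, 0 whose true value is unknown.
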

chicoreus commented 1 year ago

@Tasilee "INTERNAL_PREREQUISITES_NOT_MET if dwc:decimalLatitude or dwc:decimalLongitude are EMPTY or not interpretable as numbers; " isn't explicit about what happens when one of dwc:decimalLatitude or dwc:decimalLongitude is not interpretable as a number. I could implement either way from that phrasing, but, given the "or" in the beginning of the clause, I would tend to say that it carries on to the second part of the clause, meaning "INTERNAL_PREREQUISITES_NOT_MET if dwc:decimalLatitude or dwc:decimalLongitude are EMPTY or either is not interpretable as a number; "

Tasilee commented 1 year ago

So, what you are suggesting for the ER is-

INTERNAL_PREREQUISITES_NOT_MET if dwc:decimalLatitude or dwc:decimalLongitude are EMPTY or either value is not interpretable as a number; COMPLIANT if either the value of dwc:decimalLatitude is not = 0 or the value of dwc:decimalLongitude is not = 0; otherwise NOT_COMPLIANT

?

chicoreus commented 1 year ago

@Tasilee in essence, yes. The specific proposal in https://github.com/tdwg/bdq/issues/87#issuecomment-1596320397 is:

INTERNAL_PREREQUISITES_NOT_MET if dwc:decimalLatitude and/or dwc:decimalLongitude are EMPTY or either of the values are not interpretable as numbers; COMPLIANT if either the value of dwc:decimalLatitude is not = 0 or the value of dwc:decimalLongitude is not = 0; otherwise NOT_COMPLIANT

Tasilee commented 1 year ago

Sorry to be pedantic, but surely you only need any one of the input values to be EMPTY to trigger INTERNAL_PREREQUISITES_NOT_MET? As in

INTERNAL_PREREQUISITES_NOT_MET if dwc:decimalLatitude or dwc:decimalLongitude are EMPTY or either of the values are not interpretable as numbers; COMPLIANT if either the value of dwc:decimalLatitude is not = 0 or the value of dwc:decimalLongitude is not = 0; otherwise NOT_COMPLIANT

chicoreus commented 1 year ago

@Tasilee pedantic is good. Or could be interpreted as an exclusive or (where one being empty satisfies the condition, but both being empty does not). We could be more explicit by adding the phrase "at least one of"

INTERNAL_PREREQUISITES_NOT_MET if at least one of dwc:decimalLatitude or dwc:decimalLongitude are EMPTY or at least one of either of the values are not interpretable as numbers; COMPLIANT if either the value of dwc:decimalLatitude is not = 0 or the value of dwc:decimalLongitude is not = 0; otherwise NOT_COMPLIANT
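
The ambiguity that "at least one of" removes can be seen in a small sketch. It assumes EMPTY means a missing or whitespace-only value, and the function names are hypothetical.

```python
def is_empty(value) -> bool:
    # Assumption: EMPTY covers None and whitespace-only strings.
    return value is None or str(value).strip() == ""


def not_met_at_least_one(lat, lon) -> bool:
    # Inclusive reading ("at least one of"): true when one or both are empty.
    return any(is_empty(v) for v in (lat, lon))


def not_met_exclusive(lat, lon) -> bool:
    # Exclusive reading: true when exactly one is empty.
    return is_empty(lat) != is_empty(lon)


print(not_met_at_least_one("", ""), not_met_exclusive("", ""))  # True False
print(not_met_at_least_one("", "0"), not_met_exclusive("", "0"))  # True True
```

The two readings differ only when both values are empty, which is exactly the case the explicit phrasing is meant to pin down.
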

Tasilee commented 1 year ago

Thanks @chicoreus - I can live with that.

ArthurChapman commented 1 year ago

Updated INTERNAL_PREREQUISITES_NOT_MET in the Expected Response in line with discussion on #43 and updated Specification Last Updated. Removed NEEDS WORK

Tasilee commented 12 months ago

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted".

Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated"