Closed by ArthurChapman 9 months ago
TestField | Value |
---|---|
GUID | 32aca770-1f99-45f1-87a4-f4a582c02b50 |
Label | ISSUE_COORDINATEPRECISION_UNLIKELY |
Description | Is the value of dwc:coordinatePrecision a likely value? |
TestType | Issue |
Darwin Core Class | Location |
Information Elements ActedUpon | dwc:coordinatePrecision |
Information Elements Consulted | |
Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:coordinatePrecision is bdq:Empty; POTENTIAL_ISSUE if the value of dwc:coordinatePrecision is not in the bdq:sourceAuthority; otherwise NOT_ISSUE. |
Data Quality Dimension | Likelihood |
Term-Actions | COORDINATEPRECISION_UNLIKELY |
Parameter(s) | bdq:sourceAuthority |
Source Authority | bdq:sourceAuthority default = "Darwin coordinatePrecision" {[http://rs.tdwg.org/dwc/terms/coordinatePrecision]} {dwc:coordinatePrecision vocabulary API [NO CURRENT API EXISTS]} |
Specification Last Updated | 2024-02-13 |
Examples | [dwc:coordinatePrecision="15": Response.status=RUN_HAS_RESULT, Response.result=POTENTIAL_ISSUE, Response.comment="dwc:coordinatePrecision does not have an equivalent in the bdq:sourceAuthority"], [dwc:coordinatePrecision="0.01667": Response.status=RUN_HAS_RESULT, Response.result=NOT_ISSUE, Response.comment="dwc:coordinatePrecision has an equivalent in the bdq:sourceAuthority"] |
Source | TG2 |
References | |
Example Implementations (Mechanisms) | |
Link to Specification Source Code | |
Notes | Zero, for example, is not a valid value for dwc:coordinatePrecision. Nor are most real numbers between 0 and 1 likely values (e.g., 0.2 would be an unlikely value for dwc:coordinatePrecision, because no one would record coordinates to the nearest fifth of a degree). This bdq:Supplementary test is not regarded as CORE (cf. bdq:CORE) for one or more of the following reasons: not being widely applicable; not informative; not straightforward to implement; or likely to return a high percentage of either bdq:COMPLIANT or bdq:NOT_COMPLIANT results (cf. bdq:Response.result). A Supplementary test may be implemented as CORE when a suitable use case exists. |
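
To make the Expected Response above concrete, here is a minimal Python sketch. The `LIKELY_VALUES` set is a hypothetical stand-in for the bdq:sourceAuthority lookup table, which (per the Source Authority row) does not yet exist; the function name and response structure are illustrative only, not an agreed implementation.

```python
# Hypothetical stand-in for the (not yet existing) bdq:sourceAuthority lookup table
# of likely dwc:coordinatePrecision values.
LIKELY_VALUES = {"0.00001", "0.0001", "0.001", "0.01", "0.1", "0.000278", "0.0167"}

def issue_coordinateprecision_unlikely(coordinate_precision, source_authority=LIKELY_VALUES):
    """Illustrative evaluation of the Expected Response in the table above."""
    if source_authority is None:
        return {"status": "EXTERNAL_PREREQUISITES_NOT_MET", "result": None,
                "comment": "bdq:sourceAuthority is not available"}
    if coordinate_precision is None or coordinate_precision.strip() == "":
        return {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": None,
                "comment": "dwc:coordinatePrecision is bdq:Empty"}
    if coordinate_precision.strip() in source_authority:
        return {"status": "RUN_HAS_RESULT", "result": "NOT_ISSUE",
                "comment": "dwc:coordinatePrecision has an equivalent in the bdq:sourceAuthority"}
    return {"status": "RUN_HAS_RESULT", "result": "POTENTIAL_ISSUE",
            "comment": "dwc:coordinatePrecision does not have an equivalent in the bdq:sourceAuthority"}
```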
I have created this test as a "STANDARD" test as it conforms with similar tests that are tested against a bdq:sourceAuthority. Perhaps this should be CORE, but can't be until we have a Lookup Table that we can link to.
@ArthurChapman this feels like it should be an INRANGE test rather than a STANDARD test. The definition of dwc:coordinatePrecision is "A decimal representation of the precision of the coordinates given in the dwc:decimalLatitude and dwc:decimalLongitude." This indicates an arbitrary positive real number, with some upper limit (my math isn't good enough to be sure, but I'm guessing that a precision of 360.0 with any coordinate implies a location anywhere on the surface of the earth). Examples start at a precision of 1.0 and get smaller from there, but there are probably reasonable values for precision when the location is known only to a resolution coarser than one degree.
The value is an arbitrary real number, so this is a test of the value against a range, rather than a test against a vocabulary, thus no vocabulary needed. There are standard values for precision when translating from one form of coordinates to another, but they aren't the only possibilities.
Propose:
INTERNAL_PREREQUISITES_NOT_MET if dwc:decimalLatitude is NOT_EMPTY and dwc:coordinatePrecision is EMPTY; COMPLIANT if dwc:decimalLatitude is EMPTY or if the value of dwc:coordinatePrecision is a positive real number less than or equal to 360; otherwise NOT_COMPLIANT.
We need to avoid asserting that data are not fit for purpose by returning INTERNAL_PREREQUISITES_NOT_MET for an empty value when an empty value is correct for the situation, that is, when no georeference exists and the term is therefore expected to be empty. The absence of a georeference is a different data quality problem assessed by other tests.
And not parameterized, and no source authority.
This might belong in CORE. Good georeference metadata is important for analysis of georeferences, though coordinate precision may be more important for downstream presentation than analytical purposes. It provides a means of asserting how many decimal places to display (and how many are relevant for analysis) when numeric coordinate data are serialized into strings and deserialized, potentially at several steps in the pathway, any of which could add arbitrary numbers of trailing zeroes, or alter low significance digits as the result of moving between string serializations and floating point numbers.
An INRANGE version of this test is simpler, but it would be much less useful. The STANDARD version of the test can tell people if the precision term was very likely misunderstood.
In an INRANGE version I would say that the valid range is 0.0000001 < coordinatePrecision < 180. That would cover a proper georeference on the low end and a quadrant of the globe on the high end.
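
For illustration, a minimal sketch of what such an INRANGE check might look like, using the 0.0000001 and 180 bounds suggested above (the function name and the bounds-as-parameters are assumptions, not part of any specification):

```python
def coordinate_precision_in_range(coordinate_precision, lower=0.0000001, upper=180.0):
    """Rough INRANGE-style check: COMPLIANT if the value parses as a number
    strictly between the suggested lower and upper bounds."""
    try:
        value = float(coordinate_precision)
    except (TypeError, ValueError):
        return "NOT_COMPLIANT"  # not interpretable as a number
    return "COMPLIANT" if lower < value < upper else "NOT_COMPLIANT"
```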
@tucotuco I can see that some values would be expected to be standard (e.g. 0.000278 for translation to decimal degrees from degrees, minutes, seconds with a precision of 1 second), but there are a large number of potential sources of original coordinate data (PLSS, state plane feet, OSGB coordinates, etc.), and effects of coordinate transformations, that should in effect mean that the precision is an arbitrary value, not constrainable by a vocabulary. I can't see a clear way of distinguishing between misunderstandings of the term and valid precision values produced from a range of different original forms with various transformations into decimal degrees. Perhaps an ISSUE that flags cases where the precision is not one of the expected set of values from a small set of typical translations (decimal degrees to n digits of precision, decimal degrees from degrees/minutes, decimal degrees from degrees/minutes and tenths of a minute, decimal degrees from degrees/minutes/seconds to one second), but not a validation.
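
As a rough illustration (not part of any agreed specification), that small set of typical precision values can be derived arithmetically rather than maintained as a vocabulary:

```python
# Hypothetical sketch: derive likely dwc:coordinatePrecision values for coordinates
# that originated as decimal degrees, degrees/decimal minutes, or degrees/minutes/seconds.
likely_precisions = set()

# Decimal degrees recorded to n decimal places (n = 1..7).
for n in range(1, 8):
    likely_precisions.add(round(10 ** -n, n))

# One minute, one tenth of a minute, one second, and one tenth of a second,
# expressed as fractions of a degree (rounded to 7 decimal places).
for fraction_of_degree in (1 / 60, 0.1 / 60, 1 / 3600, 0.1 / 3600):
    likely_precisions.add(round(fraction_of_degree, 7))

print(sorted(likely_precisions))
# 1/3600 rounds to 0.0002778, matching the 0.000278 cited above to one fewer digit.
```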
I can see ISSUE as a more appropriate test than STANDARD. A vocabulary would have issues with the values being strings to represent the numbers, and with the precision of the values representing the precision. An ISSUE test would be much more useful than an INRANGE test.
I've changed to an ISSUE test - @tucotuco do you still see it having a Lookup Table? Otherwise the Expected Response would be difficult to write.
@tucotuco "A vocabulary would have issues with the values being strings to represent the numbers, and with the precision of the values representing the precision." I'm not sure what you mean here. Can you provide some examples of what an issue would assert is correct and not correct based on a controlled vocabulary?
@tucotuco it seems likely that I'm misunderstanding the nature of the problem and what this test is intended to detect...
@chicoreus - you have changed this to ISSUE_COORDINATEPRECISION_STANDARD - I was suggesting ISSUE_COORDINATEPRECISION_LIKELY
@ArthurChapman missed the likely.... Fixed.
A lookup table would be required, but an implementation would likely need more than that because of the precision of the representation of the precision. For example, "0.000278" and "0.0002778" are both likely values, but is "0.00028"? Or "0.0003"? The exact values of some likely values (e.g., 1/3600 of a degree for one second) cannot even be expressed as decimal numbers with finite precision.
Thanks @chicoreus - Standard implies a Conformance test whereas Likely implies a Likelihood Test
OK - for now, if we leave it as a sourceAuthority and say that NO CURRENT API EXISTS, we can label it "Incomplete" and NEEDS WORK. Once we have a Lookup Table we can change it to either Supplementary or CORE.
So we need to be concerned with the precision of the precision...
This feels like a test (or perhaps that is a related test) that needs to take the verbatim coordinates as an information element consulted, assess whether the verbatim coordinates are decimal degrees, degrees decimal minutes, or degrees minutes seconds, and if so test whether the precision is a reasonable value for one of those. A very large number of cases can probably be covered with a small number of likely ranges that can be specified within the specification of the test and don't need an external vocabulary. If we exclude transformations from other coordinate systems from consideration, there are probably only a small number of cases to consider. If we include transformations from other coordinate systems, we are probably at the point where we just have to ask whether arbitrary values are in range. Assuming, of course, that I'm understanding the problem, something I'm not yet convinced of.
@ArthurChapman @tucotuco I'm just not seeing the need for a source authority. This very much feels like a case where a likely set of values could be enumerated in the test, with an extension point provided by a parameter that allows the addition of a list of other values (e.g. where the original data were likely to have included PLSS data, and thus be precise to section, half section, quarter section, quarter quarter section, etc.).
@chicoreus - I don't see why not a sourceAuthority. If you hard-wire a set of values in the test itself, it is a big job to change later and would require a lengthy process (through the TDWG system), but if you use a sourceAuthority with a list of likely values, it is easy to add new ones as they arise without having to go through a lengthy and difficult process. I think it is the simplest solution.
This is also a place where we could explore the extension point in the response for representing uncertainty: values between 1 and 180 are possible (known to a quadrant, known to 10 degrees, etc.), but less likely than values between 0.00001 and 1 inclusive, where particular values in that range have high likelihood; a value of 0.000001 is unlikely but not totally implausible, and values less than 0.000001 are implausible. Thus the Issue could assert a potential issue with a qualifier for the uncertainty of the issue.
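
A purely hypothetical sketch of such a qualifier, mapping a numeric value onto the rough likelihood bands described above (no such extension point is currently defined in the response structure; the function name and band labels are assumptions):

```python
def precision_likelihood_qualifier(value):
    """Map a numeric dwc:coordinatePrecision onto rough likelihood bands
    (hypothetical; bands follow the comment above)."""
    if value < 0.000001:
        return "implausible"   # zero, negative, or finer than any realistic georeference
    if value < 0.00001:
        return "unlikely"      # possible but suspicious
    if value <= 1:
        return "likely"        # typical precisions for decimal degree data
    if value <= 180:
        return "possible"      # coarse, e.g. known only to 10 degrees or a quadrant
    return "implausible"       # beyond the upper bound discussed earlier
```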
@ArthurChapman it isn't taxon names or values for sex, it is math. I'm not very comfortable with a controlled vocabulary for mathematical values. A parameter would readily allow other cases without a change to the test specifications.
@chicoreus We define bdq:sourceAuthority as "namespace that provides a reference for values required for a test evaluation". To me it makes no difference whether that list of values is an alphabetical list or a numerical list - especially if that numerical list is a list of discrete values.
@chicoreus - bdq:sourceAuthority is not the same as a controlled Vocabulary, even though many of the sourceAuthority are controlled vocabularies.
uhm ... maybe replace the individualCount examples with coordinatePrecision?
Thanks @ymgan - My mistake - done.
The lengthy discussion on this 'test' strongly suggests taking the simpler of the two strategies (range and likely values), using ISSUE_COORDINATEPRECISION_LIKELY with a range of 0.00001 to 1.0 being LIKELY, and setting the status to Immature/Incomplete. We can note a potential implementation of a Source Authority list of likely values.
If @tucotuco believes that a SourceAuthority is the best way to go with this test and that he believes that he could create one when needed - I'd go that way. It may be years before anyone decides to take it further. I'd suggest we label this test Immature/Incomplete at this stage.
A useful implementation would require a SourceAuthority of values combined with an algorithm to determine if a value is "close enough" to one of those values. Given the requirement of an algorithm, all of it could be done in code without a SourceAuthority. In any case, the range implementation would be of much less utility, only catching if the value is larger than or smaller than expected.
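
To illustrate the "close enough" point, a hypothetical sketch that matches a reported value against a list of likely precisions using a relative tolerance, which also copes with the 0.000278 versus 0.0002778 representation problem noted earlier (the list, the tolerance, and the function name are all assumptions):

```python
import math

# Hypothetical list of likely precisions; this could equally be drawn from a SourceAuthority.
LIKELY_PRECISIONS = [0.1, 0.01, 0.001, 0.0001, 0.00001,
                     1 / 60, 0.1 / 60, 1 / 3600, 0.1 / 3600]

def is_close_enough(coordinate_precision, likely=LIKELY_PRECISIONS, rel_tol=0.01):
    """True if the reported precision is within rel_tol (relative) of a likely value."""
    try:
        value = float(coordinate_precision)
    except (TypeError, ValueError):
        return False
    return any(math.isclose(value, candidate, rel_tol=rel_tol) for candidate in likely)

# "0.000278" and "0.0002778" both match 1/3600 at a 1% relative tolerance;
# "0.0003" (about 8% away from 1/3600) does not.
```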
Fixing name and label to reflect this being an issue rather than a validation.
@chicoreus - We don't have any tests expressed in the negative - we had years of discussion on this. You've changed this to negative! To fit with ALL other tests - it should be called ISSUE_COORDINATEPRECISION_LIKELY. The Expected Response stays as is.
That's because we are almost always using Validations, which are positive assertions about data having quality. Issues are different: all Issues should be expressed in the negative, as they raise potential problems.
In that case, we will need to change #29, #72, #94
I should rephrase: all Issues should be phrased to point out what the issue is, that is, what the thing is that poses a problem.
The following are correct:
ISSUE_COORDINATES_OUTSIDEEXPERTRANGE #292 (fixed from IN to OUTSIDE)
ISSUE_OUTLIER_DETECTED #291
ISSUE_COORDINATES_CENTEROFCOUNTRY #287
ISSUE_ESTABLISHMENTMEANS_NOTEMPTY #94 (correctly phrased; if there is a value in the term, then the data may lack quality for uses concerned with what organisms occur where, but have value for studies of the introduction of taxa).
ISSUE_DATAGENERALIZATIONS_NOTEMPTY #72 (likewise, if there is a value in dataGeneralizations, then there may be a quality issue for use of the data (depending on the need for precision and the value of the dataGeneralizations))
ISSUE_ANNOTATION_NOTEMPTY #29 (likewise correctly phrased; if an annotation exists it might point to a data quality concern).