Closed by ArthurChapman 9 months ago
TestField | Value |
---|---|
GUID | 32aca770-1f99-45f1-87a4-f4a582c02b50 |
Label | ISSUE_COORDINATEPRECISION_UNLIKELY |
Description | Is the value of dwc:coordinatePrecision a likely value? |
TestType | Issue |
Darwin Core Class | Location |
Information Elements ActedUpon | dwc:coordinatePrecision |
Information Elements Consulted | |
Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:coordinatePrecision is bdq:Empty; POTENTIAL_ISSUE if the value of dwc:coordinatePrecision is not in the bdq:sourceAuthority; otherwise NOT_ISSUE. |
Data Quality Dimension | Likelihood |
Term-Actions | COORDINATEPRECISION_UNLIKELY |
Parameter(s) | bdq:sourceAuthority |
Source Authority | bdq:sourceAuthority default = "Darwin coordinatePrecision" {[http://rs.tdwg.org/dwc/terms/coordinatePrecision]} {dwc:coordinatePrecision vocabulary API [NO CURRENT API EXISTS]} |
Specification Last Updated | 2024-02-13 |
Examples | [dwc:coordinatePrecision="15": Response.status=RUN_HAS_RESULT, Response.result=POTENTIAL_ISSUE, Response.comment="dwc:coordinatePrecision does not have an equivalent in the bdq:sourceAuthority"], [dwc:coordinatePrecision="0.01667": Response.status=RUN_HAS_RESULT, Response.result=NOT_ISSUE, Response.comment="dwc:coordinatePrecision has an equivalent in the bdq:sourceAuthority"] |
Source | TG2 |
References | |
Example Implementations (Mechanisms) | |
Link to Specification Source Code | |
Notes | Zero, for example, is not a valid value for dwc:coordinatePrecision. Nor are most real numbers between 0 and 1 likely values (e.g., 0.2 would be an unlikely value for dwc:coordinatePrecision, because no one would record coordinates to the nearest fifth of a degree). This bdq:Supplementary test is not regarded as CORE (cf. bdq:CORE) for one or more of the following reasons: not being widely applicable; not informative; not straightforward to implement; or likely to return a high percentage of either bdq:COMPLIANT or bdq:NOT_COMPLIANT results (cf. bdq:Response.result). A Supplementary test may be implemented as CORE when a suitable use case exists. |
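
To make the Expected Response above concrete, here is a minimal Python sketch. The `LIKELY_VALUES` set is a hypothetical stand-in for the bdq:sourceAuthority lookup table, which (per the Source Authority row) does not yet exist; the function name and response structure are illustrative only, not an agreed implementation.

```python
# Hypothetical stand-in for the (not yet existing) bdq:sourceAuthority lookup table
# of likely dwc:coordinatePrecision values.
LIKELY_VALUES = {"0.00001", "0.0001", "0.001", "0.01", "0.1", "0.000278", "0.0167"}

def issue_coordinateprecision_unlikely(coordinate_precision, source_authority=LIKELY_VALUES):
    """Illustrative evaluation of the Expected Response in the table above."""
    if source_authority is None:
        return {"status": "EXTERNAL_PREREQUISITES_NOT_MET", "result": None,
                "comment": "bdq:sourceAuthority is not available"}
    if coordinate_precision is None or coordinate_precision.strip() == "":
        return {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": None,
                "comment": "dwc:coordinatePrecision is bdq:Empty"}
    if coordinate_precision.strip() in source_authority:
        return {"status": "RUN_HAS_RESULT", "result": "NOT_ISSUE",
                "comment": "dwc:coordinatePrecision has an equivalent in the bdq:sourceAuthority"}
    return {"status": "RUN_HAS_RESULT", "result": "POTENTIAL_ISSUE",
            "comment": "dwc:coordinatePrecision does not have an equivalent in the bdq:sourceAuthority"}
```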
I have created this test as a "STANDARD" test as it conforms with similar tests that are tested against a bdq:sourceAuthority. Perhaps this should be CORE, but can't be until we have a Lookup Table that we can link to.
@ArthurChapman this feels like it should be an INRANGE test rather than a STANDARD test. The definition of dwc:coordinatePrecision is "A decimal representation of the precision of the coordinates given in the dwc:decimalLatitude and dwc:decimalLongitude." This indicates an arbitrary positive real number, with some upper limit (my math isn't good enough to be sure, but I'm guessing that a precision of 360.0 with any coordinate implies a location anywhere on the surface of the earth). Examples start at a precision of 1.0 and get smaller from there, but there are probably reasonable values for precision when the location is known only to a resolution coarser than one degree.
The value is an arbitrary real number, so this is a test of the value against a range, rather than a test against a vocabulary, thus no vocabulary needed. There are standard values for precision when translating from one form of coordinates to another, but they aren't the only possibilities.
Propose:
INTERNAL_PREREQUISITES_NOT_MET if dwc:decimalLatitude is NOT_EMPTY and dwc:coordinatePrecision is EMPTY; COMPLIANT if dwc:decimalLatitude is EMPTY or if the value of dwc:coordinatePrecision is a positive real number less than or equal to 360; otherwise NOT_COMPLIANT.
We need to avoid asserting that data are not fit for purpose by returning INTERNAL_PREREQUISITES_NOT_MET for an empty value when an empty value is correct for the situation, that is, when no georeference exists and the term is therefore expected to be empty. The absence of a georeference is a different data quality problem assessed by other tests.
And not parameterized, and no source authority.
This might belong in CORE. Good georeference metadata is important for analysis of georeferences, though coordinate precision may be more important for downstream presentation than analytical purposes. It provides a means of asserting how many decimal places to display (and how many are relevant for analysis) when numeric coordinate data are serialized into strings and deserialized, potentially at several steps in the pathway, any of which could add arbitrary numbers of trailing zeroes, or alter low significance digits as the result of moving between string serializations and floating point numbers.
An INRANGE version of this test is simpler, but it would be much less useful. The STANDARD version of the test can tell people if the precision term was very likely misunderstood.
In an INRANGE version I would say that the valid range is 0.0000001 < coordinatePrecision < 180. That would cover a proper georeference on the low end and a quadrant of the globe on the high end.
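
For illustration, a minimal sketch of what such an INRANGE check might look like, using the 0.0000001 and 180 bounds suggested above (the function name and the bounds-as-parameters are assumptions, not part of any specification):

```python
def coordinate_precision_in_range(coordinate_precision, lower=0.0000001, upper=180.0):
    """Rough INRANGE-style check: COMPLIANT if the value parses as a number
    strictly between the suggested lower and upper bounds."""
    try:
        value = float(coordinate_precision)
    except (TypeError, ValueError):
        return "NOT_COMPLIANT"  # not interpretable as a number
    return "COMPLIANT" if lower < value < upper else "NOT_COMPLIANT"
```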
@tucotuco I can see that some values would be expected to be standard (e.g. 0.000278 for translation to decimal degrees from degrees, minutes, seconds with a precision of 1 second), but there are a large number of potential sources of original coordinate data (PLSS, state plane feet, OSGB coordinates, etc.), and effects of coordinate transformations, that should in effect mean that the precision is an arbitrary value, not constrainable by a vocabulary. I can't see a clear way of distinguishing between misunderstandings of the term and valid precision values produced from a range of different original forms with various transformations into decimal degrees. Perhaps an ISSUE that flags cases where the precision is not one of the expected set of values from a small set of typical translations (decimal degrees to n digits of precision, decimal degrees from degrees/minutes, decimal degrees from degrees/minutes and tenths of a minute, decimal degrees from degrees/minutes/seconds to one second), but not a validation.
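
As a rough illustration (not part of any agreed specification), that small set of typical precision values can be derived arithmetically rather than maintained as a vocabulary:

```python
# Hypothetical sketch: derive likely dwc:coordinatePrecision values for coordinates
# that originated as decimal degrees, degrees/decimal minutes, or degrees/minutes/seconds.
likely_precisions = set()

# Decimal degrees recorded to n decimal places (n = 1..7).
for n in range(1, 8):
    likely_precisions.add(round(10 ** -n, n))

# One minute, one tenth of a minute, one second, and one tenth of a second,
# expressed as fractions of a degree (rounded to 7 decimal places).
for fraction_of_degree in (1 / 60, 0.1 / 60, 1 / 3600, 0.1 / 3600):
    likely_precisions.add(round(fraction_of_degree, 7))

print(sorted(likely_precisions))
# 1/3600 rounds to 0.0002778, matching the 0.000278 cited above to one fewer digit.
```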
I can see ISSUE as a more appropriate test than STANDARD. A vocabulary would have issues with the values being strings to represent the numbers, and with the precision of the values representing the precision. An ISSUE test would be much more useful than an INRANGE test.
I've changed to an ISSUE test - @tucotuco do you still see it having a Lookup Table? Otherwise the Expected Response would be difficult to write.
@tucotuco "A vocabulary would have issues with the values being strings to represent the numbers, and with the precision of the values representing the precision." I'm not sure what you mean here. Can you provide some examples of what an issue would assert is correct and not correct based on a controlled vocabulary?
@tucotuco it seems likely that I'm misunderstanding the nature of the problem and what this test is intended to detect...
@chicoreus - you have changed this to ISSUE_COORDINATEPRECISION_STANDARD - I was suggesting ISSUE_COORDINATEPRECISION_LIKELY
@ArthurChapman missed the likely.... Fixed.
A lookup table would be required, but an implementation would likely need more than that because of the precision of the representation of the precision. For example, "0.000278" and "0.0002778" are both likely values, but is "0.00028"? Or "0.0003"? The exact values of some likely values (e.g., 1/3600 of a degree for one second) cannot even be expressed as decimal numbers with finite precision.
Thanks @chicoreus - Standard implies a Conformance test whereas Likely implies a Likelihood Test
OK - for now, if we leave it as a sourceAuthority and say that NO CURRENT API EXISTS, we can label it "Incomplete" and NEEDS WORK. Once we have a Lookup Table we can change it to either Supplementary or CORE.
So we need to be concerned with the precision of the precision...
This feels like a test (or perhaps that is a related test) that needs to take the verbatim coordinates as an information element consulted, assess whether the verbatim coordinates are decimal degrees, degrees decimal minutes, or degrees minutes seconds, and if so test whether the precision is a reasonable value for one of those. A very large number of cases can probably be covered with a small number of likely ranges that can be specified within the specification of the test and don't need an external vocabulary. If we exclude transformations from other coordinate systems from consideration, there are probably only a small number of cases to consider. If we include transformations from other coordinate systems, we are probably at the point where we just have to ask whether arbitrary values are in range. Assuming, of course, that I'm understanding the problem, something I'm not yet convinced of.
@ArthurChapman @tucotuco I'm just not seeing the need for a source authority. This very much feels like a case where a likely set of values could be enumerated in the test, with an extension point provided by a parameter that allows the addition of a list of other values (e.g. where the original data were likely to have included PLSS data, and thus be precise to section, half section, quarter section, quarter quarter section, etc.).
@chicoreus - I don't see why not a sourceAuthority. If you hard-wire a set of values in the test itself, it is a big job to change later and would require a lengthy process (through the TDWG system), but if you use a sourceAuthority with a list of likely values, it is easy to add new ones as they arise without having to go through a lengthy and difficult process. I think it is the simplest solution.
This is also a place where we could explore the extension point in the response for representing uncertainty: values between 1 and 180 are possible (known to a quadrant, known to 10 degrees, etc.), but less likely than values between 0.00001 and 1 inclusive, where particular values in that range have high likelihood; a value of 0.000001 is unlikely but not totally implausible, and values less than 0.000001 are implausible. Thus the Issue could assert a potential issue with a qualifier for the uncertainty of the issue.
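
A purely hypothetical sketch of such a qualifier, mapping a numeric value onto the rough likelihood bands described above (no such extension point is currently defined in the response structure; the function name and band labels are assumptions):

```python
def precision_likelihood_qualifier(value):
    """Map a numeric dwc:coordinatePrecision onto rough likelihood bands
    (hypothetical; bands follow the comment above)."""
    if value < 0.000001:
        return "implausible"   # zero, negative, or finer than any realistic georeference
    if value < 0.00001:
        return "unlikely"      # possible but suspicious
    if value <= 1:
        return "likely"        # typical precisions for decimal degree data
    if value <= 180:
        return "possible"      # coarse, e.g. known only to 10 degrees or a quadrant
    return "implausible"       # beyond the upper bound discussed earlier
```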
@ArthurChapman it isn't taxon names or values for sex, it is math. I'm not very comfortable with a controlled vocabulary for mathematical values. A parameter would readily allow other cases without a change to the test specifications.
@chicoreus We define bdq:sourceAuthority as "namespace that provides a reference for values required for a test evaluation". To me it makes no difference whether that list of values is an alphabetical list or a numerical list - especially if that numerical list is a list of discrete values.
@chicoreus - bdq:sourceAuthority is not the same as a controlled Vocabulary, even though many of the sourceAuthority are controlled vocabularies.
uhm ... maybe replace the individualCount examples with coordinatePrecision?
Thanks @ymgan - My mistake - done.
The lengthy discussion on this 'test' strongly suggests taking the simpler of the two strategies (range and likely values), using ISSUE_COORDINATEPRECISION_LIKELY with a range of 0.00001 to 1.0 being LIKELY, and setting the status to Immature/Incomplete. We can note a potential implementation of a Source Authority list of likely values.
If @tucotuco believes that a SourceAuthority is the best way to go with this test and that he believes that he could create one when needed - I'd go that way. It may be years before anyone decides to take it further. I'd suggest we label this test Immature/Incomplete at this stage.
A useful implementation would require a SourceAuthority of values combined with an algorithm to determine if a value is "close enough" to one of those values. Given the requirement of an algorithm, all of it could be done in code without a SourceAuthority. In any case, the range implementation would be of much less utility, only catching if the value is larger than or smaller than expected.
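
To illustrate the "close enough" point, a hypothetical sketch that matches a reported value against a list of likely precisions using a relative tolerance, which also copes with the 0.000278 versus 0.0002778 representation problem noted earlier (the list, the tolerance, and the function name are all assumptions):

```python
import math

# Hypothetical list of likely precisions; this could equally be drawn from a SourceAuthority.
LIKELY_PRECISIONS = [0.1, 0.01, 0.001, 0.0001, 0.00001,
                     1 / 60, 0.1 / 60, 1 / 3600, 0.1 / 3600]

def is_close_enough(coordinate_precision, likely=LIKELY_PRECISIONS, rel_tol=0.01):
    """True if the reported precision is within rel_tol (relative) of a likely value."""
    try:
        value = float(coordinate_precision)
    except (TypeError, ValueError):
        return False
    return any(math.isclose(value, candidate, rel_tol=rel_tol) for candidate in likely)

# "0.000278" and "0.0002778" both match 1/3600 at a 1% relative tolerance;
# "0.0003" (about 8% away from 1/3600) does not.
```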
Fixing name and label to reflect this being an issue rather than a validation.
@chicoreus - We don't have any tests expressed in the negative - we had years of discussion on this. You've changed this to negative! To fit with ALL other tests - it should be called ISSUE_COORDINATEPRECISION_LIKELY. The Expected Response stays as is.
That's because we are almost always using Validations, which are positive assertions about data having quality. Issues are different: all Issues should be expressed in the negative, as they raise potential problems.
In that case, we will need to change #29, #72, #94
I should rephrase: all Issues should be phrased to point out what the issue is, that is, what the thing is that poses a problem.
The following are correct:
ISSUE_COORDINATES_OUTSIDEEXPERTRANGE #292 (fixed from IN to OUTSIDE)
ISSUE_OUTLIER_DETECTED #291
ISSUE_COORDINATES_CENTEROFCOUNTRY #287
ISSUE_ESTABLISHMENTMEANS_NOTEMPTY #94 (correctly phrased; if there is a value in the term, then the data may lack quality for uses concerned with what organisms occur where, but have value for studies of the introduction of taxa).
ISSUE_DATAGENERALIZATIONS_NOTEMPTY #72 (likewise, if there is a value in dataGeneralizations, then there may be a quality issue for use of the data (depending on the need for precision and the value of the dataGeneralizations))
ISSUE_ANNOTATION_NOTEMPTY #29 (likewise correctly phrased; if an annotation exists it might point to a data quality concern).