
Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-ISSUE_COORDINATES_OUTSIDEEXPERTRANGE #292

Closed ArthurChapman closed 9 months ago

ArthurChapman commented 9 months ago
| TestField | Value |
| --------- | ----- |
| GUID | e766b0eb-73f3-4be0-bfcb-c15e15c16008 |
| Label | ISSUE_COORDINATES_OUTSIDEEXPERTRANGE |
| Description | Are the geographic coordinates inside the geographic range as defined by 'expert/s' for the taxon? |
| TestType | Issue |
| Darwin Core Class | Occurrence |
| Information Elements ActedUpon | dwc:scientificName, dwc:decimalLatitude, dwc:decimalLongitude |
| Information Elements Consulted | |
| Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:scientificName, dwc:decimalLatitude, dwc:decimalLongitude are bdq:Empty; POTENTIAL_ISSUE if the geographic coordinates for the species are outside a range given in the bdq:sourceAuthority; otherwise NOT_ISSUE. |
| Data Quality Dimension | Conformance |
| Term-Actions | COORDINATES_OUTSIDEEXPERTRANGE |
| Parameter(s) | bdq:sourceAuthority |
| Source Authority | bdq:sourceAuthority default = [TO BE DETERMINED] |
| Specification Last Updated | 2024-02-12 |
| Examples | [dwc:scientificName="Eucalyptus globulus", dwc:decimalLatitude="-26.0", dwc:decimalLongitude="126.0": Response.status=RUN_HAS_RESULT, Response.result=POTENTIAL_ISSUE, Response.comment="The coordinates are outside the range of two 'expert polygons' as cited in the bdq:sourceAuthority"] <br> [dwc:scientificName="Eucalyptus globulus", dwc:decimalLatitude="-20.55", dwc:decimalLongitude="125.64": Response.status=RUN_HAS_RESULT, Response.result=NOT_ISSUE, Response.comment="The coordinates are within the range of two 'expert polygons' as cited in the bdq:sourceAuthority"] |
| Source | ALA |
| References | |
| Example Implementations (Mechanisms) | |
| Link to Specification Source Code | |
| Notes | This bdq:Supplementary test is not regarded as CORE (cf. bdq:CORE) for one or more of the following reasons: it is not widely applicable; it is not informative; it is not straightforward to implement; or it is likely to return a high percentage of either bdq:COMPLIANT or bdq:NOT_COMPLIANT results (cf. bdq:Response.result). A Supplementary test may be implemented as CORE when a suitable use case exists. |
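
Read as pseudocode, the Expected Response above reduces to a point-in-polygon check once an expert range can be fetched for the taxon. Below is a minimal sketch, assuming a queryable bdq:sourceAuthority; `load_expert_range()` and its stand-in polygon are hypothetical, and treating a missing range map for the taxon as EXTERNAL_PREREQUISITES_NOT_MET is only one possible interpretation (see the discussion later in this thread).

```python
# A minimal sketch of the Expected Response logic, assuming a
# bdq:sourceAuthority can be queried for an expert range polygon.
# load_expert_range() and its stand-in polygon are hypothetical.
from shapely.geometry import Point, Polygon


def load_expert_range(scientific_name):
    """Hypothetical lookup against a bdq:sourceAuthority, returning a
    shapely geometry for the taxon's expert range, or None."""
    ranges = {
        # Stand-in rectangle, NOT the real E. globulus expert polygons.
        "Eucalyptus globulus": Polygon([(143.0, -43.7), (150.0, -43.7),
                                        (150.0, -36.0), (143.0, -36.0)]),
    }
    return ranges.get(scientific_name)


def issue_coordinates_outside_expert_range(scientific_name,
                                           decimal_latitude,
                                           decimal_longitude):
    """Return (Response.status, Response.result) per the specification."""
    if any(v in (None, "") for v in
           (scientific_name, decimal_latitude, decimal_longitude)):
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    expert_range = load_expert_range(scientific_name)
    if expert_range is None:
        # One interpretation: treat a missing range map like an
        # unavailable source authority (see discussion further down).
        return ("EXTERNAL_PREREQUISITES_NOT_MET", None)
    point = Point(float(decimal_longitude), float(decimal_latitude))
    if expert_range.contains(point):
        return ("RUN_HAS_RESULT", "NOT_ISSUE")
    return ("RUN_HAS_RESULT", "POTENTIAL_ISSUE")


# First example from the table:
print(issue_coordinates_outside_expert_range(
    "Eucalyptus globulus", "-26.0", "126.0"))
# -> ('RUN_HAS_RESULT', 'POTENTIAL_ISSUE')
```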
Tasilee commented 9 months ago

As I would see this implemented, this is a duplicate of #291.

ArthurChapman commented 9 months ago

@tasilee - I disagree. Expert Ranges are human-developed ranges, used extensively by the IUCN and others, whereas #291 is derived using software such as Reverse Jackknife (climatic or geographic). They are treated as different tests in all applications I have seen.

Tasilee commented 9 months ago

I would see #291, properly implemented, as a check against an expert range that could even adapt to a (necessary) date of occurrence. No jackknifing required. Admittedly, this test is more explicit.

ArthurChapman commented 9 months ago

No - as people implement tests like #291, they use algorithms such as Reverse Jackknife (or several others) to detect outliers; there is no expert input. The way I set it up at ERIN, and later at CRIA, and as implemented in software such as DIVA-GIS and R, it gives a degree of 'outlierness': below 1 is not an outlier, and the higher the number above 1, the worse the outlier. I'll look at putting some references into #291.

With expert opinion, once you have a model (or otherwise), the experts may refine it and restrict the range to serpentinite soils, salt marshes, etc. Often expert maps (which could be loaded and stored in a bdq:sourceAuthority) are just a line drawn on a map by an "expert" to mark a boundary - very subjective, but the IUCN uses the expert distributions for determining its Red Lists. There are maps prepared for a number of species that people use; I think the ALA, and some Australian states, use them for restricted-access data. #291 is a purely automated process - no expert opinion needed. They are two different tests.
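
For contrast, here is a rough sketch of the reverse-jackknife "degree of outlierness" described above. The threshold constants follow one published formulation (Chapman 2005) and should be read as an assumption rather than the exact ERIN/CRIA/DIVA-GIS code:

```python
# A rough sketch of the reverse-jackknife "degree of outlierness":
# a ratio below 1 is not an outlier; the higher above 1, the worse.
# The threshold constants are an assumption based on one published
# description (Chapman 2005), not the reference implementation.
import math


def reverse_jackknife(values):
    """Return (value, degree-of-outlierness) pairs for flagged values
    of a single environmental variable at the occurrence sites."""
    xs = sorted(values)
    n = len(xs)
    if n < 3 or xs[-1] == xs[0]:
        return []
    # Critical gap size, scaled by sample size and overall range.
    threshold = (0.95 * math.sqrt(n) + 0.2) * (xs[-1] - xs[0]) / 50.0
    flagged = []
    for i in range(1, n):
        gap = xs[i] - xs[i - 1]
        if gap > threshold:
            # Flag the smaller side of the break as the outliers.
            tail = xs[i:] if i >= n - i else xs[:i]
            flagged.extend((v, gap / threshold) for v in tail)
    return flagged


# e.g. annual mean temperature (deg C) at seven occurrence sites:
print(reverse_jackknife([11.2, 11.7, 12.3, 12.8, 13.1, 13.4, 24.0]))
# -> [(24.0, 15.2...)]  i.e. a strong environmental outlier
```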

Tasilee commented 9 months ago

I guess what I am trying to say is that an outlier could be detected by a range of different mechanisms, one of which is detecting that the occurrence is outside of an area defined by an expert on that species.

ArthurChapman commented 9 months ago

I guess you could combine them - but both tests are in regular and common use as two different tests, and I don't see why one would want to combine them. This one is easy to do; for the outlier tests it is much more difficult to write the Expected Response - not impossible, but very difficult. The Expert Range generally refers only to a geographic range, whereas the outlier tests test against one or more environmental variables, so they will often detect a record that is an outlier environmentally but does not look like an outlier geographically. I would strongly support keeping them as two separate tests.

ArthurChapman commented 9 months ago

Changed label to Supplementary and added appropriate Note

chicoreus commented 9 months ago

This isn't really suitable for definition as a test. It depends on too many subjective judgements, and lacks a service to point to that would encapsulate them. Not Supplementary, not DO NOT IMPLEMENT - it just doesn't belong here. Delete the issue entirely.

ArthurChapman commented 9 months ago

A lot of people are using such a test locally, so I see value in keeping it as Supplementary so that people can use it in a local implementation with a suitable locally prepared bdq:sourceAuthority. In Western Australia, for example, they have expert maps for all their threatened species and may like to use this test with their own local sourceAuthority - hence it is Parameterized. I don't see it ever being made CORE, as it is unlikely to be used globally.

chicoreus commented 9 months ago

If we can point at a couple of examples in the wild where there is enough information for us to assert what the test should look like and produce a working implementation, then it is very much worth including. As a hypothetical, though, there is significant work involved in making sure we have a sensible articulation of this test. That feels out of scope for the present effort, so I propose deleting these issues rather than leaving them as very incomplete suggestions.

chicoreus commented 9 months ago

Some of the issues:

- What spatial buffer is appropriate (probably a parameter)?
- Must the point georeference lie within the known range, or is an overlap of the georeference's uncertainty with the range sufficient?
- How should cases where there is no range map for the taxon be handled?
- If an extent is given, must the entire extent fall within the known range, or is overlap sufficient?
- As this is an Issue, would either the absence of a suitable range map or the absence of taxon or georeference data to compare with be good grounds for asserting NOT_ISSUE, or is EXTERNAL/INTERNAL_PREREQUISITES_NOT_MET more suitable?

There is too much work to be done in refining this test for it to be useful for us to create a probably misleading stake in the ground for it.
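
To make the containment-versus-overlap questions concrete, here is a small sketch under stated assumptions: the geometries are hypothetical and the metres-to-degrees conversion is a crude equatorial approximation used for illustration only.

```python
# A sketch contrasting two readings raised above: strict containment
# of the point versus overlap of the georeference's uncertainty circle
# with the expert range. Geometries are hypothetical; the conversion
# of metres to degrees is a crude equatorial approximation.
from shapely.geometry import Point, Polygon

METERS_PER_DEGREE = 111_320  # approximate length of one degree


def point_within_range(lat, lon, expert_range):
    """Strict reading: the georeferenced point itself must fall
    inside the known range."""
    return expert_range.contains(Point(lon, lat))


def uncertainty_overlaps_range(lat, lon, uncertainty_m, expert_range):
    """Lenient reading: any overlap between the uncertainty circle
    and the known range is enough not to raise the issue."""
    circle = Point(lon, lat).buffer(uncertainty_m / METERS_PER_DEGREE)
    return expert_range.intersects(circle)


expert_range = Polygon([(143.0, -43.7), (150.0, -43.7),
                        (150.0, -36.0), (143.0, -36.0)])
# A point just outside the range with 50 km coordinate uncertainty:
print(point_within_range(-36.1, 142.9, expert_range))                  # False
print(uncertainty_overlaps_range(-36.1, 142.9, 50_000, expert_range))  # True
```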

ArthurChapman commented 9 months ago

@tasilee - do we have an example from ALA that we can link to in the References, preferably with a map? You had added this entry (in the spreadsheet) citing Booth - is there a Booth reference?

Tasilee commented 9 months ago

@chicoreus: @ArthurChapman and I totally disagree with "DeletionProposed". This is a VERY powerful test that is not difficult to implement (as ALA and others have done).

chicoreus commented 9 months ago

@Tasilee agreed, but this goes as immature/incomplete unless we are willing to put in the time to produce an implementation and validation data, make decisions about parameters and the handling of uncertainty overlap, and produce a mature description of the test. There are too many unspecified variables at this point for us to assert otherwise without substantive work.

Tasilee commented 9 months ago

I would have thought that by definition, Supplementary, DO NOT IMPLEMENT and Incomplete 'tests' do not require an "implementation" or "validation data". We need to fill out what we can of the parameters, designate the category, and maybe explain why we made that decision.

chicoreus commented 9 months ago

@Tasilee Incomplete tests definitely leave validation data to those who revisit the test and bring it to a mature state. Similarly, DO NOT IMPLEMENT tests need no validation data, as we are advocating that nobody attempt to implement them: the test is inherently problematic and misleading.

Supplementary tests we may wish to have validation data for, as we are making some claim about maturity. Certainly for any that aren't trivial NOTEMPTY validations, we should probably produce implementations and validate them against validation data to support the claim that they are sufficiently mature to have utility in assessing biodiversity data quality in non-CORE use cases...

ArthurChapman commented 9 months ago

@chicoreus - I am not sure that we have the time or energy to prepare validation data for all the Supplementary tests. We do have Examples that show the type of data that would need to be in the validation data file. I would see promoting a Supplementary test to CORE somewhere down the line as work that would include: checking the implications of making the test CORE, editing it and making sure everything works and makes sense, preparing validation data, and checking and testing an implementation against that validation data. If we try to do all that at the moment, it will dilute the time we spend on the CORE tests and add to the burnout of all of us. This is part of the justification for making them Supplementary.

chicoreus commented 9 months ago

Correcting name/label of test to reflect that it is raising an issue where the coordinates lie outside the range.