tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-ISSUE_COORDINATES_CENTEROFCOUNTRY #287

Open ArthurChapman opened 9 months ago

ArthurChapman commented 9 months ago

| TestField | Value |
| --- | --- |
| GUID | 256e51b3-1e08-4349-bb7e-5186631c3f8e |
| Label | ISSUE_COORDINATES_CENTEROFCOUNTRY |
| Description | Are the supplied geographic coordinates within a defined buffer of the center of the country? |
| TestType | Issue |
| Darwin Core Class | dcterms:Location |
| Information Elements ActedUpon | dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude |
| Information Elements Consulted | dwc:coordinateUncertaintyInMeters |
| Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are bdq:Empty; POTENTIAL_ISSUE if (1) the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:countryCode as represented in the bdq:sourceAuthority (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) and (2) the dwc:coordinateUncertaintyInMeters is bdq:Empty or less than half the square root of the area of the country; otherwise NOT_ISSUE. |
| Data Quality Dimension | Conformance |
| Term-Actions | COORDINATES_CENTEROFCOUNTRY |
| Parameter(s) | bdq:spatialBufferInMeters, bdq:sourceAuthority |
| Source Authority | bdq:spatialBufferInMeters default = "5000"; bdq:sourceAuthority default = "GBIF Catalogue of Country Centroides" {[https://raw.githubusercontent.com/jhnwllr/catalogue-of-centroids/master/PCLI.tsv]} |
| Specification Last Updated | 2024-08-28 |
| Examples | [dwc:decimalLatitude="-35.38804", dwc:decimalLongitude="-65.154964", dwc:countryCode="AR": Response.status=RUN_HAS_RESULT, Response.result=POTENTIAL_ISSUE, Response.comment="coordinates fall within buffered distance in the bdq:sourceAuthority for dwc:countryCode"] [dwc:decimalLatitude="-34.184199", dwc:decimalLongitude="-65.509403", dwc:countryCode="AR": Response.status=RUN_HAS_RESULT, Response.result=NOT_ISSUE, Response.comment="coordinates fall outside buffered distance in the bdq:sourceAuthority for dwc:countryCode"] |
| Source | GBIF |
| References | Waller JT (2023) Processing Country Centroids at the Global Biodiversity Information Facility. Biodiversity Information Science and Standards 7: e110728. https://doi.org/10.3897/biss.7.110728 |
| Example Implementations (Mechanisms) | |
| Link to Specification Source Code | |
| Notes | We have increased the buffer to 5000 meters to cater for differences that may have arisen due to the difference in geodetic datums. |

ArthurChapman commented 9 months ago

@jhnwllr Could you check this TEST please? Is there an API that we can link to?

chicoreus commented 9 months ago

Is the spatial buffer dependent on the size of the country? Is the spatial buffer dependent on a combination of the size of the country and the resolution of the country shape spatial data?

ArthurChapman commented 9 months ago

The spatial buffer is set as a default - under Parameterized - people can put in a different value if they wish. 3000 meters was thought to be a good value given work carried out by John Waller.

@jhnwllr replied separately as follows:

"I have now separated out PCLI and ADM1 types into separate files.

I use PCLI as a politically neutral name for "countries".

So see this file I just generated for "countries". https://raw.githubusercontent.com/jhnwllr/catalogue-of-centroids/master/PCLI.tsv

There isn't yet an API endpoint which just lists the centroids GBIF is using, but you can use occurrence search to get a "list of the centroids with occurrences" so to speak. https://www.gbif.org/occurrence/search?advanced=1&occurrence_status=present&distance_from_centroid_in_meters=0,0 https://www.gbif.org/api/occurrence/search?advanced=1&occurrence_status=present&distance_from_centroid_in_meters=0,0"

ArthurChapman commented 9 months ago

Source Authority and Notes updated following advice from @jhnwllr above.

Tasilee commented 9 months ago

Is this now an Immature/Incomplete or something else? If the former, we need to start adding relevant Notes.

ArthurChapman commented 9 months ago

I think this is Supplementary, given that we do have a good SourceAuthority. Although there is not an API at the moment, the link that @jhnwllr provided is an alternative that should work.

chicoreus commented 3 months ago

Given https://raw.githubusercontent.com/jhnwllr/catalogue-of-centroids/master/PCLI.tsv, this should be straightforward to implement without an API: ask whether the coordinate is near one of the points given for the country code in that file.
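A minimal sketch of that "near one of the points" check, in Python. It assumes a spherical Earth (haversine distance) and takes the centroids as an in-memory list rather than parsing PCLI.tsv, since the column names of that file are not given in this issue. The function names and the `AR_CENTROIDS` value are hypothetical: the single centroid is simply the coordinate from the first example in this issue, used as a stand-in.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation (assumption)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def near_any_centroid(lat: float, lon: float,
                      centroids: Iterable[Tuple[float, float]],
                      buffer_m: float = 5000.0) -> bool:
    """True if (lat, lon) lies within buffer_m of at least one (lat, lon) centroid."""
    return any(haversine_m(lat, lon, clat, clon) <= buffer_m
               for clat, clon in centroids)


# Hypothetical stand-in for the AR rows of the source authority.
AR_CENTROIDS = [(-35.38804, -65.154964)]

print(near_any_centroid(-35.38804, -65.154964, AR_CENTROIDS))   # True
print(near_any_centroid(-34.184199, -65.509403, AR_CENTROIDS))  # False
```

Passing a list of centroids rather than a single point covers the case where the source authority provides more than one center per country code.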

Propose changing from:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:country, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:country as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.

to:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:country, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY or if dwc:geodeticDatum is not EPSG:4326; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) of the supplied dwc:countryCode as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.

Remove country as an information element, just use dwc:countryCode as consulted.

Alternately:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:country, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) of the supplied dwc:countryCode as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.

with a slightly larger spatial buffer to add in uncertainty from potential differences in the datum.

ArthurChapman commented 3 months ago

Expected Response modified to cater for the possibility of more than one centroid, Specification Last Updated added, and Notes modified. Test made CORE rather than Supplementary, as we don't need an API and can use the file prepared by @jhnwllr.

chicoreus commented 3 months ago

Per @tucotuco and @ymgan, we need to incorporate coordinateUncertaintyInMeters, as a point at the centroid with a coordinate uncertainty equivalent to the size of the country is reasonable and doesn't need to be identified as a potential issue.

chicoreus commented 3 months ago

Expected response doesn't quite read right in the bits about multiple possible centers.

Also needs to allow points centered on the country with a coordinateUncertaintyInMeters approximating the size of the country; perhaps change from:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center (or one of the centers), of the bdq:sourceAuthority provides more than one per country code of the supplied dwc:countryCode as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.

To:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if (1) the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:countryCode as represented in the bdq:sourceAuthority (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) and (2) the dwc:coordinateUncertaintyInMeters is less than half the square root of the area of the country; otherwise NOT_ISSUE.

Adding coordinateUncertaintyInMeters as an information element consulted.

We could be more general about the coordinateUncertaintyInMeters being large, e.g. "large relative to the size of the country", and put the half the square root of the area in the notes. Square root of the area of the country is available in the default source authority, so using it wouldn't force us to add a spatial source authority for country boundaries. (We could add one, and phrase the condition as a coordinate uncertainty in meters that is less than the radius of a circle that the country fits into, which could be precalculated from country shape data.) Half the square root of the area is a simple, pragmatic way to estimate a large uncertainty relative to the size of the country; it would make the behavior of the test consistent across implementations, and is provided in the default source authority.

Tasilee commented 3 months ago

I don't think that works @chicoreus. If dwc:coordinateUncertaintyInMeters is EMPTY then NOT_ISSUE as you have (1) and (2) for POTENTIAL_ISSUE?

chicoreus commented 3 months ago

@Tasilee good catch, needs explicit handling of empty for coordinateUncertaintyInMeters. How about:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if (1) the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:countryCode as represented in the bdq:sourceAuthority (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) and (2) the dwc:coordinateUncertaintyInMeters is EMPTY or less than half the square root of the area of the country; otherwise NOT_ISSUE.
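To make the handling of an empty coordinateUncertaintyInMeters concrete, here is a rough Python sketch of the decision order in the wording above. It is not a reference implementation. All names are hypothetical; `is_empty` is a simplified stand-in for bdq:Empty; the distance to the nearest center and the country area are assumed to have been looked up from the source authority by the caller; and a non-numeric uncertainty or a country code missing from the authority is not handled, because the proposed text does not say what should happen in those cases. The Chile area is the figure quoted later in this thread.

```python
import math
from typing import Optional


def is_empty(value: Optional[str]) -> bool:
    """Simplified stand-in for bdq:Empty: None or whitespace-only."""
    return value is None or value.strip() == ""


def large_uncertainty_threshold_m(area_km2: float) -> float:
    """Half the square root of the country area, converted from km to meters."""
    return 0.5 * math.sqrt(area_km2) * 1000.0


def center_of_country_result(authority_available: bool,
                             country_code: Optional[str],
                             decimal_latitude: Optional[str],
                             decimal_longitude: Optional[str],
                             uncertainty_m: Optional[str],
                             nearest_center_m: float,
                             area_km2: float,
                             buffer_m: float = 5000.0) -> str:
    if not authority_available:
        return "EXTERNAL_PREREQUISITES_NOT_MET"
    if any(is_empty(v) for v in (country_code, decimal_latitude, decimal_longitude)):
        return "INTERNAL_PREREQUISITES_NOT_MET"
    # Condition (1): the coordinates are within the buffer of a center.
    near_center = nearest_center_m <= buffer_m
    # Condition (2): uncertainty is EMPTY, or small relative to the country.
    small_or_empty = (is_empty(uncertainty_m)
                      or float(uncertainty_m) < large_uncertainty_threshold_m(area_km2))
    return "POTENTIAL_ISSUE" if near_center and small_or_empty else "NOT_ISSUE"


CHILE_AREA_KM2 = 736593.0  # value quoted in this thread for Chile

# A point at the centroid with EMPTY uncertainty is flagged ...
print(center_of_country_result(True, "CL", "-35.0", "-71.0", None, 0.0, CHILE_AREA_KM2))
# ... but the same point with a country-sized uncertainty is not.
print(center_of_country_result(True, "CL", "-35.0", "-71.0", "500000", 0.0, CHILE_AREA_KM2))
```

The first call prints POTENTIAL_ISSUE and the second prints NOT_ISSUE; the coordinate strings in the calls are placeholders, since the distance is passed in separately.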

ArthurChapman commented 3 months ago

I think that works @chicoreus - but then I have just flown halfway around the world and may have brain fog! Just thinking of the cases where one has a country (e.g. Australia or Nova Hollandia - quite common) and a center of the country is given. In that case, looking at half the square root of the area of the country: the square root of the area of Australia is ~2,782 km - that is greater than the distance from the center to any part of mainland Australia - so it works. Chile, I'm not so sure about though, being long and thin!

chicoreus commented 2 months ago

@ArthurChapman in the PCLI country centroid data set, Chile has an area of about 736593 km². This would give a radius of 429 km, and the conclusion that a coordinate uncertainty in meters larger than 429000 would be large relative to the country. That isn't being precise and asserting what coordinate uncertainty in meters would produce a circle that entirely encloses the country (for Chile, much of the country would be outside that circle), but it does feel like a good pragmatic estimator of uncertainties that are relatively large in comparison to the country. The alternative is to include another source authority for country shapes, and obtain from there the radius of a circle that would contain the entire country. But then there would be uncertainties in how people who represented an uncertainty containing a country did so, and using half the square root of the area seems like a reasonable conservative estimator for large uncertainty relative to country size, which is, in essence, what we are trying to exclude from being flagged as potentially problematic here.
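For reference, the arithmetic behind the Chile figure, using the area quoted above. The symbols $A$ (country area) and $r$ (the "large uncertainty" threshold) are introduced here only for illustration and are not part of the specification.

```latex
r = \tfrac{1}{2}\sqrt{A}
  = \tfrac{1}{2}\sqrt{736593\ \mathrm{km}^2}
  \approx \tfrac{1}{2}\,(858.25\ \mathrm{km})
  \approx 429\ \mathrm{km} \quad (\approx 429000\ \mathrm{m})
```

Under condition (2) of the proposed Expected Response, a record at the Chile centroid would then only be flagged if its dwc:coordinateUncertaintyInMeters is EMPTY or below roughly 429000.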

Tasilee commented 2 months ago

It seems reasonable to improve the Expected Response from

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center (or one of the centers), of the bdq:sourceAuthority provides more than one per country code of the supplied dwc:countryCode as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.

to

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if (1) the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:countryCode as represented in the bdq:sourceAuthority (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) and (2) the dwc:coordinateUncertaintyInMeters is EMPTY or less than half the square root of the area of the country; otherwise NOT_ISSUE.

NEEDS WORK??

ArthurChapman commented 2 months ago

I am happy with that. It will be interesting to see how it works in practice. Perhaps another, more complicated, way is to look at dwc:locality if it only contains a country name, but that would be difficult to make work in practice. For example, if the dwc:locality only said "Australia" or "Chile", you'd then need to find all the synonyms ("Nova Hollandia", etc.) and the country names at the time of the event, and then use the centroids of those historical countries over time, which we don't have. It may be possible, but I think it would be extremely difficult to do well.

I am happy to use the @tasilee suggestion and see what feedback one gets over time.

chicoreus commented 2 months ago

I've added dwc:coordinateUncertaintyInMeters as an information element consulted for the new specification. I think we can take the needs work off.