ArthurChapman opened 9 months ago
TestField | Value |
---|---|
GUID | 256e51b3-1e08-4349-bb7e-5186631c3f8e |
Label | ISSUE_COORDINATES_CENTEROFCOUNTRY |
Description | Are the supplied geographic coordinates within a defined buffer of the center of the country? |
TestType | Issue |
Darwin Core Class | dcterms:Location |
Information Elements ActedUpon | dwc:countryCode<br>dwc:decimalLatitude<br>dwc:decimalLongitude |
Information Elements Consulted | dwc:coordinateUncertaintyInMeters |
Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are bdq:Empty; POTENTIAL_ISSUE if (1) the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:countryCode as represented in the bdq:sourceAuthority (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) and (2) the dwc:coordinateUncertaintyInMeters is bdq:Empty or less than half the square root of the area of the country; otherwise NOT_ISSUE. |
Data Quality Dimension | Conformance |
Term-Actions | COORDINATES_CENTEROFCOUNTRY |
Parameter(s) | bdq:spatialBufferInMeters<br>bdq:sourceAuthority |
Source Authority | bdq:spatialBufferInMeters default = "5000"<br>bdq:sourceAuthority default = "GBIF Catalogue of Country Centroides" {[https://raw.githubusercontent.com/jhnwllr/catalogue-of-centroids/master/PCLI.tsv]} |
Specification Last Updated | 2024-08-28 |
Examples | [dwc:decimalLatitude="-35.38804", dwc:decimalLongitude="-65.154964", dwc:countryCode="AR": Response.status=RUN_HAS_RESULT, Response.result=POTENTIAL_ISSUE, Response.comment="coordinates fall within buffered distance in the bdq:sourceAuthority for dwc:countryCode"]<br>[dwc:decimalLatitude="-34.184199", dwc:decimalLongitude="-65.509403", dwc:countryCode="AR": Response.status=RUN_HAS_RESULT, Response.result=NOT_ISSUE, Response.comment="coordinates fall outside buffered distance in the bdq:sourceAuthority for dwc:countryCode"] |
Source | GBIF |
References | |
Example Implementations (Mechanisms) | |
Link to Specification Source Code | |
Notes | We have increased the buffer to 5000 meters to cater for differences that may have arisen due to the difference in geodetic datums |
@jhnwllr Could you check this TEST please? Is there an API that we can link to?
Is the spatial buffer dependent on the size of the country? Is the spatial buffer dependent on a combination of the size of the country and the resolution of the country shape spatial data?
The spatial buffer is set as a default under Parameterized; people can supply a different value if they wish. 3000 meters was thought to be a good value, given work carried out by John Waller.
@jhnwllr replied separately:
"I have now separated out PCLI and ADM1 types into separate files.
I use PCLI as a politically neutral name for "countries".
So see this file I just generated for "countries". https://raw.githubusercontent.com/jhnwllr/catalogue-of-centroids/master/PCLI.tsv
There isn't yet an API endpoint which just lists the centroids GBIF is using, but you can use occurrence search to get a "list of the centroids with occurrences" so to speak. https://www.gbif.org/occurrence/search?advanced=1&occurrence_status=present&distance_from_centroid_in_meters=0,0 https://www.gbif.org/api/occurrence/search?advanced=1&occurrence_status=present&distance_from_centroid_in_meters=0,0"
Source Authority and Notes updated following advice from @jhnwllr above.
Is this now an Immature/Incomplete or something else? If the former, we need to start adding relevant Notes.
I think this is Supplementary, given that we do have a good SourceAuthority. Although there is not an API at the moment, the link that @jhnwllr provided is an alternative that should work.
It should be straightforward to implement without an API, given https://raw.githubusercontent.com/jhnwllr/catalogue-of-centroids/master/PCLI.tsv: ask whether the coordinate is near one of the points given for the country code in that file.
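A minimal Python sketch of that approach, under stated assumptions: group the rows of the catalogue by country code, so that a country with more than one centroid keeps them all, then ask whether a coordinate lies within the buffer of any of them. The column names (`iso2`, `decimal_latitude`, `decimal_longitude`) are hypothetical placeholders, not confirmed headers of PCLI.tsv. The spherical haversine distance is also an assumed choice, since the specification does not say how distance is computed.

```python
import csv
import io
import math
from collections import defaultdict

# HYPOTHETICAL column names: check them against the real header row of PCLI.tsv.
CODE_COL = "iso2"
LAT_COL = "decimal_latitude"
LON_COL = "decimal_longitude"

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation (assumption)


def load_centroids(tsv_text):
    """Map each upper-cased country code to a list of (lat, lon) centroids."""
    points = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        code = (row.get(CODE_COL) or "").strip().upper()
        if code:  # skip rows without a country code
            points[code].append((float(row[LAT_COL]), float(row[LON_COL])))
    return dict(points)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def near_any_centroid(lat, lon, country_code, centroids, buffer_m=5000.0):
    """True if (lat, lon) is within buffer_m of any centroid for country_code."""
    return any(haversine_m(lat, lon, clat, clon) <= buffer_m
               for clat, clon in centroids.get(country_code.strip().upper(), []))


# Illustrative use with made-up codes and coordinates (not real catalogue rows):
sample = "iso2\tdecimal_latitude\tdecimal_longitude\nXX\t0.0\t0.0\nXX\t10.0\t10.0\n"
centroids = load_centroids(sample)
print(near_any_centroid(10.0, 10.03, "xx", centroids))  # True
print(near_any_centroid(5.0, 5.0, "XX", centroids))     # False
```

Checking every centroid listed for the code, rather than only the first, is what makes the "(or one of the centers)" wording in the Expected Response cheap to support.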
Propose changing from:
EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:country, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:country as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.
to:
EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:country, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY or if dwc:geodeticDatum is not EPSG:4326; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) of the supplied dwc:countryCode as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.
Remove dwc:country as an information element; just use dwc:countryCode as consulted.
Alternately:
EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:country, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) of the supplied dwc:countryCode as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.
with a slightly larger spatial buffer to add in uncertainty from potential differences in the datum.
Expected Response modified to cater for the possibility of more than one centroid, Specification Last Updated added, and Notes modified. Test made CORE rather than Supplementary, as we don't need an API: we can use the file prepared by @jhnwllr.
Per @tucotuco and @ymgan, we need to incorporate coordinateUncertaintyInMeters, as a point at the centroid with a coordinate uncertainty equivalent to the size of the country is reasonable and doesn't need to be identified as a potential issue.
Expected response doesn't quite read right in the bits about multiple possible centers.
It also needs to allow points centered on the country with a coordinateUncertaintyInMeters approximating the size of the country; perhaps change from:
EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center (or one of the centers), of the bdq:sourceAuthority provides more than one per country code of the supplied dwc:countryCode as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.
To:
EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if (1) the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:countryCode as represented in the bdq:sourceAuthority (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) and (2) the dwc:coordinateUncertaintyInMeters is less than half the square root of the area of the country; otherwise NOT_ISSUE.
Adding coordinateUncertaintyInMeters as an information element consulted.
We could be more general about the coordinateUncertaintyInMeters being large, e.g. "large relative to the size of the country", and put the half square root of the area in the Notes. The square root of the area of the country is available in the default source authority, so this wouldn't force us to add a spatial source authority for country boundaries. (We could do that, and phrase it as a coordinate uncertainty in meters that is less than the radius of a circle that the country fits into, which could be precalculated from country shape data.) The square root of the area is a simple, pragmatic way to estimate an uncertainty that is large relative to the size of the country; it would make the behavior of the test consistent across implementations, and it is provided in the default source authority.
I don't think that works @chicoreus. If dwc:coordinateUncertaintyInMeters is EMPTY then NOT_ISSUE as you have (1) and (2) for POTENTIAL_ISSUE?
@Tasilee good catch, needs explicit handling of empty for coordinateUncertaintyInMeters. How about:
EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if (1) the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:countryCode as represented in the bdq:sourceAuthority (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) and (2) the dwc:coordinateUncertaintyInMeters is EMPTY or less than half the square root of the area of the country; otherwise NOT_ISSUE.
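To make the EMPTY handling concrete, here is a minimal Python sketch of this revised Expected Response. It assumes the centroid catalogue has already been loaded into a dict keyed by country code, holding the centroid points and the country area in km². The function and field names, the spherical haversine distance, the simple stand-in for bdq:Empty, and returning NOT_ISSUE for a country code absent from the source authority (a literal reading of "otherwise NOT_ISSUE") are illustrative assumptions, not part of the specification.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def is_empty(value):
    """Rough stand-in for bdq:Empty: None or a whitespace-only string (assumption)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def center_of_country(country_code, lat, lon, uncertainty_m, centroids,
                      buffer_m=5000.0):
    """Return (status, result) for the proposed Expected Response.

    centroids: dict of code -> {"points": [(lat, lon), ...], "area_sqkm": float},
    or None when the source authority is unavailable. area_sqkm is assumed present.
    """
    if centroids is None:
        return ("EXTERNAL_PREREQUISITES_NOT_MET", None)
    if any(is_empty(v) for v in (country_code, lat, lon)):
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    entry = centroids.get(str(country_code).strip().upper())
    if entry is None:
        # Code not in the source authority: condition (1) cannot hold (assumption).
        return ("RUN_HAS_RESULT", "NOT_ISSUE")
    lat, lon = float(lat), float(lon)
    # Condition (1): within the buffer of the center, or of one of the centers.
    near = any(haversine_m(lat, lon, clat, clon) <= buffer_m
               for clat, clon in entry["points"])
    if not near:
        return ("RUN_HAS_RESULT", "NOT_ISSUE")
    # Condition (2): uncertainty EMPTY, or less than half sqrt(area) in meters.
    if is_empty(uncertainty_m):
        return ("RUN_HAS_RESULT", "POTENTIAL_ISSUE")
    threshold_m = 0.5 * math.sqrt(entry["area_sqkm"]) * 1000.0
    if float(uncertainty_m) < threshold_m:
        return ("RUN_HAS_RESULT", "POTENTIAL_ISSUE")
    return ("RUN_HAS_RESULT", "NOT_ISSUE")


catalogue = {"XX": {"points": [(0.0, 0.0)], "area_sqkm": 736593.0}}  # made-up entry
print(center_of_country("XX", 0.0, 0.01, None, catalogue))  # ('RUN_HAS_RESULT', 'POTENTIAL_ISSUE')
```

With EMPTY handled as its own branch before the area comparison, a point at the centroid with no stated uncertainty is flagged rather than silently passing as NOT_ISSUE.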
I think that works, @chicoreus, but then I have just flown halfway around the world and may have brain fog! I was thinking of cases where one has only a country (e.g. Australia, or Nova Hollandia, which is quite common) and a center of the country is given. In that case, consider half the square root of the area of the country: the square root of the area of Australia is ~2,782 km, which is greater than the distance from the center to any part of mainland Australia, so it works. I'm not so sure about Chile, though, being long and thin!
@ArthurChapman In the PCLI country centroid data set, Chile has an area of about 736593 km². This gives a radius (half the square root of the area) of 429 km, and the conclusion that a coordinate uncertainty in meters larger than 429000 would be large relative to the country. This isn't precise: it doesn't assert which coordinate uncertainty in meters would produce a circle that entirely encloses the country (for Chile, much of the country would lie outside that circle). It does, however, feel like a good pragmatic estimator of uncertainties that are large in comparison to the country. The alternative is to include another source authority for country shapes and obtain from it the radius of a circle that would contain the entire country, but then there will be uncertainties in how people who represented uncertainties containing a country did so. Using half the square root of the area seems like a reasonable, conservative estimator of an uncertainty that is large relative to country size, which is, in essence, what we are trying to exclude from being flagged as potentially problematic here.
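The arithmetic behind the Chile figure, as a short Python check. The helper names are hypothetical. The threshold is the half-square-root-of-area rule proposed above, with the area taken in km² and the result converted to meters so it can be compared with dwc:coordinateUncertaintyInMeters.

```python
import math


def large_uncertainty_threshold_m(area_sqkm):
    """Half the square root of a country's area (km²), returned in meters."""
    return 0.5 * math.sqrt(area_sqkm) * 1000.0


def uncertainty_is_large(uncertainty_m, area_sqkm):
    """True when the uncertainty is at least the threshold, i.e. not flagged."""
    return uncertainty_m >= large_uncertainty_threshold_m(area_sqkm)


# Chile, using the area quoted above from the PCLI data set
chile_threshold = large_uncertainty_threshold_m(736593)
print(round(chile_threshold / 1000))          # 429 (km)
print(uncertainty_is_large(500000, 736593))   # True
print(uncertainty_is_large(100000, 736593))   # False
```

The square root of 736593 is about 858.25 km, so the threshold is about 429125 m. A record at Chile's centroid with an uncertainty of 500 km would therefore not be flagged, while one with 100 km would.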
It seems reasonable to improve the Expected Response from
EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center (or one of the centers), of the bdq:sourceAuthority provides more than one per country code of the supplied dwc:countryCode as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.
to
EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if (1) the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:countryCode as represented in the bdq:sourceAuthority (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) and (2) the dwc:coordinateUncertaintyInMeters is EMPTY or less than half the square root of the area of the country; otherwise NOT_ISSUE.
NEEDS WORK??
I am happy with that. It will be interesting to see how it works in practice. Another, more complicated, approach would be to look at dwc:locality to see whether it only contains a country name, but that would be difficult to make work in practice. For example, if dwc:locality only said "Australia" or "Chile", you would need to find all the synonyms ("Nova Hollandia", etc.) and the country names at the time of the event, and then use the centroids of those historical countries, which we don't have. It may be possible, but I think it would be extremely difficult to do well.
I am happy to use the @tasilee suggestion and see what feedback one gets over time.
I've added dwc:coordinateUncertaintyInMeters as an information element consulted for the new specification. I think we can take the needs work off.