tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-AMENDMENT_COUNTRYCODE_FROM_COORDINATES #73

Open iDigBioBot opened 6 years ago

iDigBioBot commented 6 years ago
TestField Value
GUID 8c5fe9c9-4ba9-49ef-b15a-9ccd0424e6ae
Label AMENDMENT_COUNTRYCODE_FROM_COORDINATES
Description Proposes an amendment to the value of dwc:countryCode if dwc:decimalLatitude and dwc:decimalLongitude fall within a boundary from the bdq:countryShapes that is attributable to a single valid country code.
TestType Amendment
Darwin Core Class dcterms:Location
Information Elements ActedUpon dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude
Information Elements Consulted
Expected Response EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if either dwc:decimalLatitude or dwc:decimalLongitude is bdq:Empty, or if dwc:countryCode is bdq:NotEmpty; FILLED_IN dwc:countryCode if dwc:decimalLatitude and dwc:decimalLongitude fall within a boundary from the bdq:countryShapes that is attributable to a single valid country code; otherwise NOT_AMENDED.
Data Quality Dimension Completeness
Term-Actions COUNTRYCODE_FROM_COORDINATES
Parameter(s) bdq:sourceAuthority
Source Authority bdq:sourceAuthority default = "10m-admin-1 boundaries UNION with Exclusive Economic Zones" {[https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-1-states-provinces/] spatial UNION [https://www.marineregions.org/downloads.php#marbound]}
Specification Last Updated 2024-08-18
Examples [dwc:decimalLatitude="-25.23", dwc:decimalLongitude="135.43", dwc:countryCode="": Response.status=FILLED_IN, Response.result=dwc:countryCode="AU", Response.comment="dwc:decimalLatitude and dwc:decimalLongitude contain interpretable values"]
[dwc:decimalLatitude="-38.280937", dwc:decimalLongitude="72.047790", dwc:countryCode="": Response.status=NOT_AMENDED, Response.result="", Response.comment="Coordinates do not fall in the boundary of any country"]
Source ALA, GBIF, iDigBio
References
Example Implementations (Mechanisms)
Link to Specification Source Code
Notes This amendment simply fills dwc:countryCode from a lookup of dwc:decimalLatitude and dwc:decimalLongitude. dwc:coordinateUncertaintyInMeters and dwc:coordinatePrecision (if present) imply a buffer around the provided coordinates. Likewise, country polygons cannot be 100% accurate at all scales (Dooley 2005), so a spatial buffer of the country boundaries is also justified. Taking spatial buffers into account does, however, greatly complicate the logic and the implementation of this and related tests. In this test, detection of multiple country codes by sampling within the buffer, while possible, is not considered.
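
The Expected Response above can be read as a short decision sequence. The following is a minimal sketch of that sequence, not a reference implementation. The names `amend_countrycode_from_coordinates`, `is_empty`, `CountryLookup` and `toy_lookup` are hypothetical, the lookup stands in for bdq:countryShapes, and passing `None` for the lookup is an assumed way of signalling that the source authority is unavailable.

```python
from typing import Callable, Optional, Set

# Hypothetical lookup type: (latitude, longitude) -> set of ISO country codes
# whose shapes contain the point. It stands in for bdq:countryShapes.
CountryLookup = Callable[[float, float], Set[str]]


def is_empty(value: Optional[str]) -> bool:
    """Treat None and whitespace-only strings as bdq:Empty (an assumption)."""
    return value is None or value.strip() == ""


def amend_countrycode_from_coordinates(
    country_code: Optional[str],
    decimal_latitude: Optional[str],
    decimal_longitude: Optional[str],
    lookup: Optional[CountryLookup],
) -> dict:
    """Return a dict with 'status' and 'result' following the Expected Response."""
    if lookup is None:
        # Assumption: None signals that the source authority is unavailable.
        return {"status": "EXTERNAL_PREREQUISITES_NOT_MET", "result": None}
    if (
        is_empty(decimal_latitude)
        or is_empty(decimal_longitude)
        or not is_empty(country_code)
    ):
        return {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": None}
    try:
        lat, lon = float(decimal_latitude), float(decimal_longitude)
    except ValueError:
        # Assumption: uninterpretable coordinates fall under "otherwise".
        return {"status": "NOT_AMENDED", "result": None}
    codes = lookup(lat, lon)
    if len(codes) == 1:
        # Exactly one country code is attributable to the point.
        code = next(iter(codes))
        return {"status": "FILLED_IN", "result": {"dwc:countryCode": code}}
    return {"status": "NOT_AMENDED", "result": None}


def toy_lookup(lat: float, lon: float) -> Set[str]:
    """Crude bounding box for mainland Australia, for illustration only."""
    return {"AU"} if -44.0 <= lat <= -10.0 and 112.0 <= lon <= 154.0 else set()
```

With the toy lookup, the two Examples above behave as specified: the first pair of coordinates yields FILLED_IN with "AU", and the second yields NOT_AMENDED because no shape contains the point.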
iDigBioBot commented 6 years ago

Comment by Paula Zermoglio (@pzermoglio) migrated from spreadsheet: Only useful if performed AFTER decimalLat and decimalLong interpretation.

iDigBioBot commented 6 years ago

Comment by Paul Morris (@chicoreus) migrated from spreadsheet: @PZ amendment sequence is indeed important. Tianhong Song has a paper on this in the context of prerequisites for workflow actors.

iDigBioBot commented 6 years ago

Comment by Paul Morris (@chicoreus) migrated from spreadsheet: Implementation requires guidance on how to handle marine material inside a country's exclusive economic zone.

iDigBioBot commented 6 years ago

Comment by Paul Morris (@chicoreus) migrated from spreadsheet: Should include dwc:coordinateUncertaintyInMeters. Also see Lee's note in Principles about some geographic tests needing buffers.

ArthurChapman commented 6 years ago

Change Country to Country Codes - name and elsewhere

tucotuco commented 6 years ago

Agreed at TDWG 2018 DQIG meeting that the name TG2-AMENDMENT_COUNTRYCODE_FROM_COORDINATES is satisfactory.

tucotuco commented 4 years ago

Similar to the problem raised in Issue #185, this test mentions a source authority that cannot deliver the AMENDMENT. It is also silent on the authority for the geometries for the country codes. To me, the bdq:sourceAuthority should be the GBIF reverse geocoding API (https://github.com/gbif/geocode), coming in 2020. It will be based on Natural Earth, GADM, OpenStreetMap, EEZones and more. The documentation says it will be for internal GBIF use, but @timrobertson100 says that he expects the API to be exposed.

timrobertson100 commented 4 years ago

The geocode service is available today, and provides a lookup based on the given coordinate.

For example latitude 51.0 and longitude 1.0 yields this response:

[
  {
    "id": "80",
    "type": "Political",
    "source": "http://www.naturalearthdata.com",
    "title": "United Kingdom",
    "isoCountryCode2Digit": "GB"
  },
  {
    "id": "212",
    "type": "EEZ",
    "source": "http://vliz.be/vmdcdata/marbound/",
    "title": "United Kingdom",
    "isoCountryCode2Digit": "GB"
  }
]

Because boundaries don't align (resolution of the polygons), we buffer the search which is why multiple results can be returned.
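
Since a buffered search can return several features, a consumer has to decide whether they agree on one country code. The sketch below parses the response shown above and returns a code only when every feature reports the same one. The function name `single_country_code` is hypothetical, and treating disagreement as "no answer" is an assumption, not documented GBIF behaviour.

```python
import json
from typing import List, Optional

# The response shown above for latitude 51.0, longitude 1.0.
RESPONSE = """
[
  {"id": "80", "type": "Political", "source": "http://www.naturalearthdata.com",
   "title": "United Kingdom", "isoCountryCode2Digit": "GB"},
  {"id": "212", "type": "EEZ", "source": "http://vliz.be/vmdcdata/marbound/",
   "title": "United Kingdom", "isoCountryCode2Digit": "GB"}
]
"""


def single_country_code(features: List[dict]) -> Optional[str]:
    """Return the country code if all features agree on one, else None."""
    codes = {
        feature["isoCountryCode2Digit"]
        for feature in features
        if feature.get("isoCountryCode2Digit")
    }
    return next(iter(codes)) if len(codes) == 1 else None


features = json.loads(RESPONSE)
print(single_country_code(features))  # GB
```

Here the Political and EEZ features both report "GB", so the result is unambiguous. Near a border, the same call could return features for two countries and the function would return `None`.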

To reduce WS traffic, we also encode the database into an image with dictionary encoded colors (i.e. by seeing a non-black or white colour, you can refer to the dictionary to know the country).

Today the service only has EEZ and NaturalEarth files, but can (will) be extended.

When a record has a stated country and coordinates, we verify that the combination seems reasonable and, if not, flip the coordinates around and negate them, "hunting" for a match. This is because in many cases the negative sign is omitted or the coordinates are swapped. All of this happens after a reprojection to WGS84 if necessary.
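
The "hunting" step described above can be sketched as a search over sign-flipped and swapped variants of the coordinates. This is an illustration only, not GBIF's code: `candidate_coordinates`, `hunt_for_match` and `toy_lookup` are hypothetical names, and the order in which variants are tried is an assumption.

```python
from typing import Callable, Iterator, Optional, Set, Tuple

Candidate = Tuple[str, float, float]  # (description, latitude, longitude)


def candidate_coordinates(lat: float, lon: float) -> Iterator[Candidate]:
    """Yield the pair as given, then negated and swapped variants that are valid."""
    for swapped, (a, b) in ((False, (lat, lon)), (True, (lon, lat))):
        for lat_sign in (1, -1):
            for lon_sign in (1, -1):
                cand_lat, cand_lon = lat_sign * a, lon_sign * b
                if not (-90 <= cand_lat <= 90 and -180 <= cand_lon <= 180):
                    continue  # skip variants that are not valid coordinates
                parts = []
                if swapped:
                    parts.append("swapped")
                if lat_sign == -1:
                    parts.append("negated latitude")
                if lon_sign == -1:
                    parts.append("negated longitude")
                yield ", ".join(parts) or "as given", cand_lat, cand_lon


def hunt_for_match(
    lat: float,
    lon: float,
    stated_code: str,
    lookup: Callable[[float, float], Set[str]],
) -> Optional[Candidate]:
    """Return the first variant whose lookup result contains the stated country."""
    for label, cand_lat, cand_lon in candidate_coordinates(lat, lon):
        if stated_code in lookup(cand_lat, cand_lon):
            return label, cand_lat, cand_lon
    return None


def toy_lookup(lat: float, lon: float) -> Set[str]:
    """Crude bounding box for mainland Australia, for illustration only."""
    return {"AU"} if -44.0 <= lat <= -10.0 and 112.0 <= lon <= 154.0 else set()
```

For a record stating "AU" with latitude 25.23 and longitude 135.43, the search finds a match once the latitude is negated, which is the omitted-sign case mentioned above.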

tucotuco commented 4 years ago

@timrobertson100 This is ideal. May we cite it as our bdq:sourceAuthority default?

chicoreus commented 4 years ago

We are conflating authority with service with thesaurus in bdq:sourceAuthority. The authority for country codes is the ISO two-letter country code list. The GBIF service is a service that wraps Natural Earth plus (source?) EEZ layers (and will change over time as layers are added (with versioning?)); the thesaurus is the Natural Earth and EEZ layers. For implementation, I'd much rather use a local GIS data store containing Natural Earth data and some appropriate EEZ layer than consult a remote service - I want to use the same thesaurus, but not the service.

tucotuco commented 4 years ago

OK, what is the consensus then on what to cite and how? I think it is more useful to point people to the thesaurus. If that can be done in a general way instead of to a particular service built around it, so much the better. For example, is https://github.com/gbif/geocode the right thing to cite for the source authority for this test? What does not seem useful is to just give the controlled vocabulary that the amendment should comply with as the source authority, and to not have the thesaurus in the References. The controlled vocabulary is incomplete as an authority for amendments of this type.

Tasilee commented 4 years ago

Thanks @timrobertson100 , @tucotuco and @chicoreus. I am the least able person in the group to provide wisdom but figure I would put my thoughts down, anyway.

As we have discussed, GBIF is likely, for the near future, to be a service location that aggregates thesauri and what we currently call 'bdq:sourceAuthority' (authorities).

Are the thesauri themselves 'source authorities'? If we are dependent on them as the reference, then in our context, I'd say yes. I agree with @chicoreus that the services associated with any source authority are a separate issue. That's why we use " if the bdq:sourceAuthority service ..."

To address @tucotuco 's question about how we reference, we either reference to the GBIF namespace end point (?) which references the relevant external source authority (e.g., ISO) in a standard way or we reference both the GBIF and the 'external' source authority. The former would be nice. Then there is a separate reference to the service associated with the thesaurus.

In the Expected response, we have agreed to use bdq:sourceAuthority [and service.] where there are implementation-dependent options or directly use the name of the source authority where there is no choice.

What we place in References and Notes, I defer to more appropriate authorities.

ArthurChapman commented 4 years ago

My view is the same as @tucotuco's, as in his question to @timrobertson100: "This is ideal. May we cite it as our bdq:sourceAuthority default?"

If we can just add the GBIF Geocode Authority as our bdq:sourceAuthority, as we have done elsewhere, then I think this is our best option.

chicoreus commented 4 years ago

The more I think about this, the less I like the idea of specifying a query endpoint as a sourceAuthority. This leaves the resolution of the shape files, buffering near borders, and changes over time as opaque to the consumer, and also forces implementations to use a service for a test that should not be implemented with remote service calls but with a local spatial data store.

We should specify three parameters: (1) A shape file for country boundaries. (2) A shape file for exclusive economic zones. (3) An explicit buffer for points that fall near country boundaries that takes into account the resolution of the shapefiles. In addition, we should explicitly state in the specification how points that fall into the buffer are to be handled (preferably by not asserting an amendment, probably with INTERNAL_PREREQUISITES_NOT_MET, point falls too near boundary of shape to determine placement). In addition, we need to consider coordinatePrecision, and how that as an uncertainty on the coordinate intersects with countries or buffers.
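
To make the buffer handling concrete, here is a minimal planar sketch of classifying a point against one shape with an explicit buffer. It assumes coordinates already projected to meters and a simple polygon given as a vertex list. A real implementation would work on the actual country and EEZ shapefiles with a GIS library. The names `point_in_polygon`, `distance_to_boundary` and `classify` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # planar (x, y) in meters, an assumption


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count edge crossings of a ray going right from p."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest planar distance from p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection of p onto the segment to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_boundary(p: Point, polygon: List[Point]) -> float:
    """Shortest distance from p to any edge of the polygon."""
    n = len(polygon)
    return min(
        distance_to_segment(p, polygon[i], polygon[(i + 1) % n]) for i in range(n)
    )


def classify(p: Point, polygon: List[Point], buffer_m: float) -> str:
    """Return NEAR_BOUNDARY inside the buffer, else INSIDE or OUTSIDE."""
    if distance_to_boundary(p, polygon) < buffer_m:
        return "NEAR_BOUNDARY"
    return "INSIDE" if point_in_polygon(p, polygon) else "OUTSIDE"


# A 100 km square standing in for a country shape.
SQUARE = [(0.0, 0.0), (100000.0, 0.0), (100000.0, 100000.0), (0.0, 100000.0)]
print(classify((50000.0, 50000.0), SQUARE, 3000.0))  # INSIDE
print(classify((1000.0, 50000.0), SQUARE, 3000.0))   # NEAR_BOUNDARY
```

Under the handling suggested above, a NEAR_BOUNDARY result would map to not asserting an amendment, and only INSIDE would allow a country code to be proposed.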

We need to be much, much more explicit in how edge cases are to be handled in any test that involves GIS data.

This amendment also needs to cover the case of FILLED_IN, AMENDED would only cover an existing case of countryCode being altered based on the coordinates. We should consider if this test should ever assert AMENDED, or should restrict itself to FILLED_IN. I would generally be much more comfortable with asserting only FILLED_IN, as AMENDED could fix the wrong value (the coordinates, or their precision, or their error radius could be in error, but the country and countryCode be correct). This is particularly true near boundaries, where the error may lie in the resolution of the shape rather than either the coordinate or the countryCode.

chicoreus commented 4 years ago

I would suggest:

EXTERNAL_PREREQUISITES_NOT_MET if an external source authority service or local spatial data store was not available; INTERNAL_PREREQUISITES_NOT_MET if the fields dwc:decimalLatitude, dwc:decimalLongitude are EMPTY or the dwc:decimalLatitude and dwc:decimalLongitude cannot be converted to the SRS used for queries to the spatial service or data store, or if the area represented by the dwc:decimalLatitude, dwc:decimalLongitude, and dwc:coordinatePrecision overlaps with a 3km buffer zone on any country or EEZ shape in the service or spatial data store; FILLED_IN if the value of dwc:countryCode was EMPTY and was unambiguously inferred from supplied dwc:decimalLatitude, dwc:decimalLongitude and dwc:coordinatePrecision falling within a single boundary defined by the combination of terrestrial and exclusive economic zone and not overlapping a 3km buffer around such boundary; otherwise NOT_CHANGED

tucotuco commented 4 years ago

You would need dwc:coordinateUncertaintyInMeters in place of dwc:coordinatePrecision.

Tasilee commented 4 years ago

Thanks @chicoreus . Seems reasonable to me, but I'd defer to Arthur and John on this.

You have raised the use of a 'local store' previously, but wouldn't the concept be an implementation option for many external source authorities? If so, then we could include the option in an overall strategy? Is it something we would encourage?

Tasilee commented 4 years ago

From @tucotuco : (so we capture the issues here)-

"EXTERNAL_PREREQUISITES_NOT_MET if the data from the source authority was not available; INTERNAL_PREREQUISITES_NOT_MET if the fields dwc:decimalLatitude, dwc:decimalLongitude are EMPTY or the dwc:decimalLatitude and dwc:decimalLongitude cannot be converted to the SRS used for source authority, or if the area represented by the dwc:decimalLatitude, dwc:decimalLongitude, and dwc:coordinateUncertaintyInMeters overlaps with a 3km buffer zone on any country or EEZ shape in the authority; FILLED_IN if the value of dwc:countryCode was EMPTY and was unambiguously inferred from supplied dwc:decimalLatitude, dwc:decimalLongitude and dwc:coordinatePrecision falling within a single boundary defined by the combination of terrestrial and exclusive economic zone and not overlapping a 3km buffer around such boundary; otherwise NOT_CHANGED

This is a simplification that leaves off mentioning services and local data stores and leaves the availability of the source authority data completely up to implementation, which is a decoupling I would like to see. Note that I also made the substitution of dwc:coordinateUncertaintyInMeters in place of dwc:coordinatePrecision."

Tasilee commented 4 years ago

Thanks @tucotuco . I agree with the decoupling and request comment from the team before considering generalizing to the other tests.

The Expected response has been amended to your wording.

Why 3km buffer (and not some other value)?

tucotuco commented 4 years ago

I don't know why a 3km buffer. I would not be able to justify such a choice. Realistic buffers are extremely complicated and some are country dependent.

chicoreus commented 4 years ago

From experience, 3km is a reasonable (but not strongly supportable) value to use given the point-resolution of the Natural Earth country data set. Buffer size isn't so much country dependent as dependent on the resolution of the data set. It accounts for a point near a boundary actually being within one country, but the shape of the boundary at the resolution of the data set not representing the boundary accurately enough. The general problem is the fractal "how long is the coast of Britain" problem: at higher and higher resolutions (more points, and a larger file) in the shape file of countries, points closer and closer to the boundary will consistently place on the actual correct side of the boundary on the ground. Working out a supportable buffer distance for a given data set would be a good thing. 3km is there just as a stake in the ground for Natural Earth countries.

Related question - what happens when some (or even most) of the uncertainty falls outside the country bounds, but the point is within, or vice versa, when the point is outside but most of the uncertainty is within?.....

tucotuco commented 4 years ago

@chicoreus I see what you mean. That would only be a concern for the land-locked boundaries, not the marine ones. I was thinking of marine shores where various levels of jurisdiction come into play at varying distances offshore or over continental shelves. So the justification for a number would be that anything else is too complex. I can live with that, but maybe we should explain it in notes.

ArthurChapman commented 4 years ago

Agree with @chicoreus and @tucotuco. From memory (it is a long time since I did work on this), a 3km buffer is an arbitrary number. Scale of coastlines and country is a huge issue and variation can be large. With coastlines, for example "does the map use the highwater mark or mean sea level, and have neap, spring, or king tides been taken into account? When it crosses the mouth of a river, does it take a direct line across, or does it follow the river upstream for a distance?" (Chapman et al. 2005). Some of the more commonly used boundary layers for political boundaries on a global scale are 1:3M or 1:1M (see http://www.fao.org/3/a0118e/a0118e05.htm). Also, that FAO paper discusses problems with country boundaries offshore where there are disputes. In our Georeferencing Best Practices, we cite the horizontal accuracy of a 1:1M map as between 500 and 850 m. That would make a 1:3M map around 1500-2500 m, so based on that, I guess 3km could be reasonable, depending on the scale of the layer being used.
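
The arithmetic behind that estimate can be written out as below, assuming horizontal error grows linearly with the scale denominator. That linearity is an assumption used only to reproduce the rough figures quoted above. The function name `scaled_accuracy_m` is hypothetical.

```python
from typing import Tuple


def scaled_accuracy_m(
    accuracy_at_1_to_1m: Tuple[int, int], scale_millions: int
) -> Tuple[int, int]:
    """Scale a (low, high) accuracy range from 1:1M to 1:<scale_millions>M."""
    low, high = accuracy_at_1_to_1m
    return (low * scale_millions, high * scale_millions)


# 500-850 m at 1:1M, scaled to 1:3M.
print(scaled_accuracy_m((500, 850), 3))  # (1500, 2550)
```

The upper bound of about 2.5 km at 1:3M sits just under the proposed 3 km buffer, which is the sense in which 3 km "could be reasonable" for a layer at that scale.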

Tasilee commented 4 years ago

Thanks all for the comments. We should capture the key issues in the Notes (Arthur?), but the concept of a spatial buffer will also apply to #50, #51 and #56. Is this another case where we need to document some principles to be associated with the tests?

ArthurChapman commented 4 years ago

@Tasilee #56 - yes. #50 has no buffering - it is a lookup of Country versus Country Code. #51 currently is a lookup to the WoRMS database - so if WoRMS regards the species as Marine, etc., then we use that (that is how OBIS does it), with no buffer.

ArthurChapman commented 4 years ago

Suggested wording for Notes, which I have added here and in #56, along with a reference for Dooley (2005): "The level of buffering may be related to the scale of the underlying GIS layer being used. At a global scale, typical map scales used for borders and coastal areas are either 1:3M or 1:1M (Dooley 2005, Chapter 4). Horizontal accuracy at those scales is around 1.5-2.5km and 0.5-0.85 km respectively (Chapman & Wieczorek 2020)."

Tasilee commented 4 years ago

@ArthurChapman : I totally disagree about #50 and #51. Most unlike me. Both depend on dwc:decimalLatitude and dwc:decimalLongitude.

Your Notes text looks good to me.

ArthurChapman commented 4 years ago

I have tidied up the wording in the Notes a little, and added GADM and Marineregions.org into the References.

Tasilee commented 4 years ago

Existing Expected response:

EXTERNAL_PREREQUISITES_NOT_MET if the data from any of the sources used as parameters bdq:sourceAuthority, was not available; INTERNAL_PREREQUISITES_NOT_MET if the fields dwc:decimalLatitude, dwc:decimalLongitude are EMPTY or the dwc:decimalLatitude and dwc:decimalLongitude cannot be converted to the SRS used for bdq:sourceAuthority, or if the area represented by the dwc:decimalLatitude, dwc:decimalLongitude, and dwc:coordinateUncertaintyInMeters overlaps with a buffer zone, defined by the parameter bdq:spatialBufferMeters, on any country or EEZ shape in bdq:sourceAuthority; FILLED_IN if the value of dwc:countryCode was EMPTY and was unambiguously inferred from supplied dwc:decimalLatitude, dwc:decimalLongitude and dwc:coordinatePrecision falling within a single boundary defined by the combination of terrestrial and exclusive economic zone and not overlapping a buffer around such boundary, defined by the parameter bdq:spatialBufferMeters; otherwise NOT_CHANGED

This is what I would prefer:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority service was not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:decimalLatitude, dwc:decimalLongitude are EMPTY or if the area represented by the dwc:decimalLatitude, dwc:decimalLongitude, and dwc:coordinateUncertaintyInMeters does not intersect country or EEZ areas in bdq:sourceAuthority with bdq:spatialBufferMeters; FILLED_IN if the value of dwc:countryCode was EMPTY and was unambiguously inferred from the values of dwc:decimalLatitude, dwc:decimalLongitude and dwc:coordinateUncertaintyInMeters was within a country or EEZ defined by bdq:sourceAuthority with bdq:spatialBufferMeters; otherwise NOT_CHANGED.

I may be wrong but the logic of the existing ER seemed wrong for internal prerequisites. And the "unambiguously" obviates need for "single country".

Tasilee commented 4 years ago

Notes: I suggest (but again unsure of the links we should use - to reference or service or data?)

[bdq:sourceAuthority default = {country shapes = https://gadm.org; eezShapes = Marine Regions (marineregions.org)}]; [bdq:spatialBufferMeters default = 3000 meters].

dwc:coordinateUncertaintyInMeters and dwc:coordinatePrecision if present, may obfuscate an unambiguous result. dwc:coordinatePrecision may be empty, but may be inferred from dwc:decimalLatitude and dwc:decimalLongitude by using the techniques published in the Georeferencing Best Practices (Chapman & Wieczorek 2020).

The level of buffering is related to the scale of the spatial data available. At a global scale, typical map scales used for borders and coastal areas are either 1:3M or 1:1M (Dooley 2005, Chapter 4). Horizontal accuracy at those scales is around 1,500-2,500 m or 500-850 m respectively (Chapman & Wieczorek 2020). We have recommended a conservative 3,000 m buffer.

tucotuco commented 4 years ago

@Tasilee said,

This is what I would prefer:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority service was not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:decimalLatitude, dwc:decimalLongitude are EMPTY or if the area represented by the dwc:decimalLatitude, dwc:decimalLongitude, and dwc:coordinateUncertaintyInMeters does not intersect country or EEZ areas in bdq:sourceAuthority with bdq:spatialBufferMeters; FILLED_IN if the value of dwc:countryCode was EMPTY and was unambiguously inferred from the values of dwc:decimalLatitude, dwc:decimalLongitude and dwc:coordinateUncertaintyInMeters was within a country or EEZ defined by bdq:sourceAuthority with bdq:spatialBufferMeters; otherwise NOT_CHANGED.

Something is not quite right in there with, "...inferred from the values of dwc:decimalLatitude, dwc:decimalLongitude and dwc:coordinateUncertaintyInMeters was within a country..."

Unfortunately, I think it is a lot more complicated than we are treating it. For example, the case when a dwc:geodeticDatum is not given or cannot be unambiguously interpreted.

Taking a stab at it...

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority service was not available; INTERNAL_PREREQUISITES_NOT_MET if a) dwc:decimalLatitude or dwc:decimalLongitude is EMPTY, or b) the dwc:decimalLatitude and dwc:decimalLongitude can not be unambiguously transformed to the coordinate reference system of the bdq:sourceAuthority and the location given by dwc:decimalLatitude and dwc:decimalLongitude in the coordinate reference system of the bdq:sourceAuthority lies within the boundaries of a country code or EEZ feature, but within a distance to the nearest border less than the maximum possible datum shift between any geodetic coordinate reference system and the coordinate reference system of the bdq:sourceAuthority, or c) the location given by dwc:decimalLatitude, dwc:decimalLongitude, and dwc:geodeticDatum lies further outside the boundaries of any country code or EEZ feature than the dwc:coordinateUncertaintyInMeters plus the bdq:spatialBufferInMeters, or d) the location given by dwc:decimalLatitude, dwc:decimalLongitude, and dwc:geodeticDatum lies outside the boundaries of any country code or EEZ feature, but equally close to more than one country code feature or to more than one EEZ feature; FILLED_IN if the value of dwc:countryCode was EMPTY and was unambiguously inferred from the values of dwc:decimalLatitude, dwc:decimalLongitude, dwc:geodeticDatum, dwc:coordinateUncertaintyInMeters, and bdq:spatialBufferInMeters against a country code or EEZ feature defined by bdq:sourceAuthority; otherwise NOT_CHANGED.

Like I said, it is complicated. Even more so when you have to try to make the description rigorously correct and without Figures. Note, I put bdq:spatialBufferInMeters rather than bdq:spatialBufferMeters, to be consistent with Darwin Core naming conventions.

Tasilee commented 4 years ago

I have a headache :)

tucotuco commented 4 years ago

Looking at the notes, I really don't think dwc:coordinatePrecision has anything to do with this test, rather, dwc:coordinateUncertaintyInMeters and dwc:geodeticDatum do, in ways I tried to express in the previous comment.

tucotuco commented 4 years ago

I have a headache :)

That wasn't my intention, but it does not come unexpected. ;-)

ArthurChapman commented 4 years ago

Good job (I think) @tucotuco. I agree with bdq:spatialBufferInMeters - I nearly changed this myself. I also agree with respect to dwc:coordinatePrecision.

I would change EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority service was not available; to EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority service(s) were not available;

I wasn't entirely sure about your c) until I drew several diagrams. As I interpret it, you are saying that if the coordinates plus the uncertainty still don't touch the buffered border, it fails, but if the coordinates are quite a way outside the boundary and the uncertainty circle just touches the buffered boundary, it is OK, and if it is unambiguous, an amendment can be made. I can agree with that - and it has to be unambiguous, because in many areas the uncertainty may cause it to touch more than one buffered country. Do we not cause a bit of a quandary here though - and cause fewer successful amendments where coordinates fall easily within one country, but where the very tip of the uncertainty just touches the buffered border of a second country - or can you say that it is unambiguous (but can you do that without some sort of human intervention)?

tucotuco commented 4 years ago

Let me give my own opposing opinion. Since there is always the possibility that you'll get the wrong answer, the test can't be 100% trusted, and users would need to understand why. If that's the case, why not put all of the burden on the data instead of the analysis, which would actually be computationally expensive? In other words, no buffers, no uncertainties, no datums. If the coordinates fall inside a feature, fill in with that feature, otherwise don't. But explain very well why it can go wrong so the buyer can beware.

chicoreus commented 4 years ago

I view the primary purpose of this set of tests of coordinates and country code as: to find and deal with (1) gross errors in coordinates (e.g. transposition, wrong signs, decimal point in the wrong place) and (2) errors in the text of the country code. For the validation, there will be many false positives if we don't exclude from the test points that fall near (to some value of near, including fuzziness of the coordinate and fuzziness of the spatial shapes) country boundaries. The fundamental purpose of the validation should not be entangled in issues of the accuracy and precision of georeferences near country boundaries. For that reason, I wouldn't want to apply @tucotuco's argument to the related validation. For this amendment, for there to be a change proposed to the data, the inconsistency isn't there to be checked, just an assertion to be made about country code. For specimens this indicates the country of origin of the material, and has legal implications. One approach would be @tucotuco's: assign a value whenever possible and let the consumer of the data beware. Another approach would be to propose the amendment only when a high degree of certainty exists in the assignment: the point lies more than the buffer distance within the bounds of the country + EEZ (we would need care in the phrasing so that a point within the buffer distance of the maritime boundary of a country isn't excluded), and the uncertainty lies entirely within the country + EEZ bounds (or, to be more conservative, the uncertainty doesn't overlap the buffer distance). This would prevent country code FILLED_IN assertions from being made based on points near a terrestrial country boundary, near an EEZ boundary, and, for the most conservative case, for points well within countries with a large uncertainty that extends beyond the country boundary.
A case for requiring the uncertainty to fall entirely within the country lies in C-shaped country edges, where the uncertainty lies mostly in the country on the (left) of the C, but the point ends up outside it in the country inside the C - cases where the point is definitely not the right location, though the uncertainty is a good representation of the location. I think I want to be fairly conservative on this one and say we should assert the country code only if the point lies within the shape of a country + EEZ minus the buffer distance, and the uncertainty lies entirely within the shape of the country + EEZ (but can extend into the buffer). This leaves cases where there is uncertainty as not filled in (marking them as unsuitable for certain uses, such as shipping a specimen across international boundaries without more research on the origin), but also leaves downstream consumers to make less conservative assumptions about possible country assignments.
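
The conservative rule in the last paragraph reduces to two comparisons once the distance from the point to the nearest edge of the combined country + EEZ shape is known. The sketch below assumes that distance is supplied by some GIS query, positive when the point is inside the shape and negative when outside. The function `can_fill_in` and its parameter names are hypothetical.

```python
def can_fill_in(
    distance_inside_m: float, uncertainty_m: float, buffer_m: float
) -> bool:
    """Decide whether a country code may be asserted under the conservative rule.

    distance_inside_m is the distance from the point to the nearest edge of the
    combined country + EEZ shape: positive inside, negative outside (assumed input).
    """
    # The point must lie inside the shape, at least the buffer distance from its edge.
    point_clear_of_buffer = distance_inside_m >= buffer_m
    # The uncertainty circle must lie entirely within the shape; it may still
    # extend into the buffer zone.
    uncertainty_within_shape = distance_inside_m >= uncertainty_m
    return point_clear_of_buffer and uncertainty_within_shape


print(can_fill_in(10000.0, 2000.0, 3000.0))  # True
print(can_fill_in(5000.0, 8000.0, 3000.0))   # False
```

Because the shape is the union of the terrestrial country and its EEZ, the coastline is not an edge of the shape, so points near the shore are not excluded by the buffer. The more conservative variant mentioned above, where the uncertainty must not overlap the buffer at all, would instead require `distance_inside_m >= uncertainty_m + buffer_m`.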

chicoreus commented 4 years ago

I also concur with s/Meters/InMeters/ and with not including coordinatePrecision. This test shouldn't be entangled with understanding if coordinateUncertaintyInMeters correctly included the precision of the coordinates.

ArthurChapman commented 4 years ago

I think I like John's simple approach - but are we not also missing the issue of the "country" that may be cited in the verbatim locality description? I can't remember, but we may have added that in prior to this test being run. I am getting totally confused. This may be another case where we want tests done in the order of a workflow?

Secondly - do Country Codes cover the EEZ? - if not, then all reference to the EEZ should probably be removed from this test. As far as I can see marineregions.org includes country codes into their EEZ layer, but I can't see on the ISO site that Country Codes cover more than "countries, dependent territories, and special areas of geographical interest" - there is no reference to the EEZ. So if we use this test, we are extending the definition of country codes to include the EEZ as in Marineregions.org.

Do we wish to include the EEZ in this test - and if so - should we include it in other related tests such as #118, #50, #48 and #21? I notice in most of those we don't mention the EEZ when determining the country code.

Tasilee commented 4 years ago

I missed @tucotuco 's last comment. I agree.

This morning, while running and thinking about this amendment, I was wondering if we should have it as CORE. If it is so complex to describe, then it fails on promoting widespread implementation. If we have to use spatial buffers (on country, EEZ and lat/long) then we are buying into some extremely complex logic and implementation issues. If Arthur (diagrams :) and I struggle with the logic, it is a worry.

Independently of @tucotuco - my thought was we simply use GADM and lat/long and note the potential (false positive) issues in NOTES. The only concern I have, given this is an amendment, is 'do no harm'.

ArthurChapman commented 4 years ago

I think it is CORE as it is a critical issue and one done by CRIA and others in the DQ. But I think perhaps we adopt John's simplified version with some notes - the notes don't have to be quite as sophisticated as they would in the Response - just saying that due to coordinateUncertainty and the scale of many of the country layers - if the coordinates fall within the buffered boundaries, then make the amendment.

That still doesn't answer the EEZ question though.

ArthurChapman commented 4 years ago

After discussion with @Tasilee this morning, I think I am nearly inclined to go the @tucotuco brief way (if the coordinates fall within the country boundary as described by GADM then accept the amendment - otherwise not - forget the buffer even (although I shudder at the thought) because you probably need a GIS to apply the 3km buffer to the GADM boundaries). This would mean one less parameter. Of course, you may end up with some false positives/negatives but only where records fall close to the country borders - or in the ocean near an island, etc.

tucotuco commented 4 years ago

The MARBOUND version 11 shape files include the ISO country codes in the attributes, so they can in principle be used in conjunction with GADM to determine country codes. See http://www.marineregions.org/news.php?p=show&id=8107

I am in favor of keeping this as core, and of using only the projected coordinates (so dwc:geodeticDatum still figures after amendment in a workflow) without buffers or uncertainties, and explanations of the possible false positives in Notes.
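
For illustration only, the buffer-free version could look something like the sketch below. It assumes each land (GADM) and maritime (MARBOUND) shape is available as an ISO code paired with a containment check. The `rectangle` helper and the bounding values are toy stand-ins for a real GIS point-in-polygon call, not real boundaries, and every name here is hypothetical. A code is returned only when all the shapes containing the point agree on a single code.

```python
from typing import Callable, Iterable, Optional, Tuple

# A shape is (ISO country code, containment predicate taking lon, lat).
# In practice the predicate would wrap a GIS point-in-polygon function.
Shape = Tuple[str, Callable[[float, float], bool]]

def rectangle(min_lon, min_lat, max_lon, max_lat):
    """Toy stand-in for a polygon: a lon/lat bounding box."""
    return lambda lon, lat: min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def country_code_from_point(lon: float, lat: float,
                            land: Iterable[Shape],
                            eez: Iterable[Shape]) -> Optional[str]:
    """Return the code only if every containing shape agrees on one code."""
    codes = {code for code, contains in list(land) + list(eez) if contains(lon, lat)}
    return codes.pop() if len(codes) == 1 else None

# Rough, made-up boxes; NOT the real extent of any country or EEZ.
land = [("AU", rectangle(113.0, -39.0, 154.0, -11.0))]
eez = [("AU", rectangle(110.0, -45.0, 160.0, -9.0))]

print(country_code_from_point(135.43, -25.23, land, eez))        # AU
print(country_code_from_point(72.047790, -38.280937, land, eez)) # None
```

The two calls use the coordinates from the Examples in the specification. A point that falls in shapes carrying different codes returns None, which maps to NOT_AMENDED.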

Tasilee commented 4 years ago

OK, do we have consensus? I don't think I am capable of doing the Expected Response and Notes but maybe I should have a go at it as it needs to be simple :)

Tasilee commented 4 years ago

Please review what I have done to the Expected Response and Notes. I have probably pruned it to a minimalist position.

chicoreus commented 4 years ago

Buffer is really needed here. This one falls in the category of an error that can put someone in jail. A false placement into a country due to insufficient precision of the country border could have significant implications. Much better to leave these edge cases with a blank country code; this marks them as needing more careful examination than this test can provide.

chicoreus commented 4 years ago

For the purpose of shipment of biological material across international boundaries, material collected from within the EEZ of a country has that country as the country of origin, so yes, country boundary plus EEZ determines the country code.

ArthurChapman commented 4 years ago

@Tasilee I think in INTERNAL_PREREQUISITES_NOT_MET - you mean CountryCode is not EMPTY. It has to be empty for this to work. @chicoreus - I am trying to think how can you include the buffering without having to resort to a GIS - it can't be done just using GADM can it? I would support including it if it can be easily done - otherwise ...

Tasilee commented 4 years ago

@ArthurChapman : Indeed, you are right. Amended.

I also agree with @ArthurChapman regarding reliance on a 'GIS', unless the functionality was built into the bdq:sourceAuthority
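
As a rough sketch of how the corrected prerequisite reads in practice (dwc:countryCode must be empty for the amendment to run), the Expected Response could be ordered as below. The `lookup` argument is a hypothetical callable standing in for whatever the bdq:sourceAuthority provides, with the spatial work hidden inside it. The function name, the dictionary layout of the response and the handling of non-numeric coordinates as NOT_AMENDED are my assumptions, not part of the specification.

```python
def is_empty(value):
    """Treat None and whitespace-only strings as bdq:Empty."""
    return value is None or str(value).strip() == ""

def amend_country_code(country_code, decimal_latitude, decimal_longitude, lookup):
    """Return a response dict following the order of the Expected Response.

    lookup(lat, lon) is assumed to return a single country code or None.
    """
    # Source authority unavailable.
    if lookup is None:
        return {"status": "EXTERNAL_PREREQUISITES_NOT_MET", "result": None}
    # Coordinates must be present and countryCode must be empty.
    if (is_empty(decimal_latitude) or is_empty(decimal_longitude)
            or not is_empty(country_code)):
        return {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": None}
    try:
        lat, lon = float(decimal_latitude), float(decimal_longitude)
    except (TypeError, ValueError):
        # Assumption: uninterpretable coordinates fall under "otherwise NOT_AMENDED".
        return {"status": "NOT_AMENDED", "result": None}
    code = lookup(lat, lon)
    if code:
        return {"status": "FILLED_IN", "result": {"dwc:countryCode": code}}
    return {"status": "NOT_AMENDED", "result": None}

# Hypothetical lookup that only knows one point, for demonstration.
def demo_lookup(lat, lon):
    return "AU" if (lat, lon) == (-25.23, 135.43) else None

print(amend_country_code("", "-25.23", "135.43", demo_lookup)["status"])
print(amend_country_code("AU", "-25.23", "135.43", demo_lookup)["status"])
```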

tucotuco commented 4 years ago

The point in polygon calculation is best left to a GIS function in any case. I don't think that is an argument one way or the other about a buffer. Nor do I think the legal argument holds, since the buffer doesn't guarantee fidelity either. The original georeference could be at fault. Thus, any assertion under this test should have disclaimers. If potential legal ramifications of the misuse of the amended data are to be taken into account, we really should not be promoting this amendment at all.
