tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq
43 stars 7 forks source link

TG2-AMENDMENT_MINDEPTHMAXDEPTH_FROM_VERBATIM #55

Open iDigBioBot opened 6 years ago

iDigBioBot commented 6 years ago
TestField Value
GUID c5658b83-4471-4f57-9d94-bf7d0a96900c
Label AMENDMENT_MINDEPTHMAXDEPTH_FROM_VERBATIM
Description Proposes amendments of the values of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they can be interpreted from dwc:verbatimDepth.
TestType Amendment
Darwin Core Class dcterms:Location
Information Elements ActedUpon dwc:minimumDepthInMeters
dwc:maximumDepthInMeters
Information Elements Consulted dwc:verbatimDepth
Expected Response INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters or dwc:maximumDepthInMeters are bdq:NotEmpty or dwc:verbatimDepth is bdq:Empty; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they can be unambiguously interpreted from dwc:verbatimDepth; otherwise NOT_AMENDED.
Data Quality Dimension Completeness
Term-Actions MINDEPTHMAXDEPTH_FROM_VERBATIM
Parameter(s)
Source Authority
Specification Last Updated 2024-08-30
Examples [dwc:minimumDepthInMeters="", dwc:maximumDepthInMeters="", dwc:verbatimDepth="10 feet": Response.status=FILLED_IN, Response.result=dwc:minimumDepthInMeters="3.048", dwc:maximumDepthInMeters="3.048", Response.comment="dwc:verbatimDepth contains interpretable values"]
[ dwc:minimumDepthInMeters="", dwc:maximumDepthInMeters="", dwc:verbatimDepth="x": Response.status=NOT_AMENDED, Response.result=, Response.comment="dwc:verbatimDepth does not contain an interpretable value"]
Source
References
Example Implementations (Mechanisms)
Link to Specification Source Code
Notes If dwc:verbatimDepth has a single value rather than a range, the minimum and maximum values should be amended with the same value. When transforming units, the transformation should be reversible, not adjusting the number of significant digits or adjusting the rounding. For example, transform fathoms to meters by multiplying by 1.8288 and retaining added significant digits (verbatim depth of 10 fathoms to minimum and maximum depths in meters of 18.288). Implementations should be capable of interpreting verbatim data in at least meters, fathoms, and feet, in the form of either a single value or a range. The units must be specified in the verbatim data to be interpretable.
chicoreus commented 6 years ago

Established general principle in discussion: When doing an amendment that performs a transformation, do so in a manner which is reversible (thus transform fathoms to meters with a different number of significant digits in the example).

ArthurChapman commented 6 years ago

Shouldn’t we add as a Prerequisite that dwc:minimumDepthInMeters and dwc:maximumDepthInMeters are both EMPTY?

Tasilee commented 2 years ago

Changed "AMENDED" to "FILLED_IN" in accordance with discussions April 16.

Tasilee commented 1 year ago

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted". Also changed "Field" to "TestField" and "Output Type" to "TestType".

chicoreus commented 3 months ago

The form of this specification doesn't match more recent ones.

Propose reworing the dwc:minimumDepthInMeters and dwc:maximumDepthInMeters are not EMPTY clause to produce NOT_AMENDED if either are populated (consistent with our more recent handling of FILLED_IN amendments).

Also, the Notes indicate that both min and max values should be set together, so make and/or just and. Thus changing from:

INTERNAL_PREREQUISITES_NOT_MET if dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable or dwc:minimumDepthInMeters and dwc:maximumDepthInMeters are not EMPTY; FILLED_IN the value of dwc:minimumDepthInMeters and/or dwc:maximumDepthInMeters if they could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED

to

INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters and dwc:maximumDepthInMeters are EMPTY and either dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they are not EMPTY and could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED.

The comment https://github.com/tdwg/bdq/issues/55#issuecomment-358125924 should be reflected in the notes.

Something like:

If the dwc:verbatimDepth has a single value rather than a range, the minimum and maximum values should be amended with the same value. When transforming units, the transformation should be reversable, not adjusting the number of significant digits or adjusting the rounding. For example, transform fathoms to meters by multiplying by 1.8288 and retaining added significant digits (verbatim depth of 10 fathoms to minimum and maximum depths in meters of 1̅8.288).

If we can include the combining overline symbol (U+0305), then we can provide both reversibility and an indication of the number of significant digits (which will take calculation). This won't quite correctly represent the introduced error (10 meters being 10 meters +/- 0.5 meter, 18 fathoms being 18 +/- 0.5 fathom). Adding this character will probably add complexity for downstream consumers, but as we are moving in and out of darwin core as strings, it is a possibility.

ArthurChapman commented 3 months ago

@chicoreus - this doesn't look right to me. Your new wording is suggesting filling in dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they are not EMPTY - BUT if they are not EMPTY - you don't want to replace what is there.

I think it should be

INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters and dwc:maximumDepthInMeters are not EMPTY and either dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they are EMPTY and could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED.

ArthurChapman commented 3 months ago

With this one and #68 - this issue of "and" or "and/or" in the first line. If the verbatimDepth only has a value for depth - you don't want anything filled in if either dwc:minimumDepthInMeters and dwc:maximumDepthInMeters have a value

This issue arises where the verbatimDepth says either "minimumDepth" has a value or "maximumDepth" has a value - then you could fill in the dwc:minimumDepthInMeters and dwc:maximumDepthInMeters respectively as long as the relevant didn't already have a value. I guess that could be covered by the wording "unambiguously interpretable"

chicoreus commented 3 months ago

On Tue, 30 Jul 2024 15:43:57 -0700 Arthur Chapman @.***> wrote:

INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters and dwc:maximumDepthInMeters are not EMPTY

This is not desirable, or consistent with other tests. If either minimum or maximum depth in meters contain a value, then the response.status should be NOT_AMENDED.

chicoreus commented 3 months ago

We should also note, when converting from fathoms to meters, use meters = fathoms 1.8288 rather than meters = fathoms 1.828804. The 1.828804 is for one fathom = 6 US State Plane Feet. This will introduce a trivial error on US chart data older than the most recent charts, but for biodiversity purposes, not relevant.

chicoreus commented 3 months ago

@ArthurChapman good catch. There is an extraneous not in there. It should read:

INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters and dwc:maximumDepthInMeters are EMPTY and either dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they are EMPTY and could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED.

chicoreus commented 3 months ago

And for the notes, another suggestion to augment:

If the dwc:verbatimDepth has a single value rather than a range, the minimum and maximum values should be amended with the same value. When transforming units, the transformation should be reversible, not adjusting the number of significant digits or adjusting the rounding. For example, transform fathoms to meters by multiplying by 1.8288 and retaining added significant digits (verbatim depth of 10 fathoms to minimum and maximum depths in meters of 1̅8.288). Implementations should be capable of interpreting verbatim data in at least meters, fathoms, and feet, in the form of either a single value or a range. The units must be specified in the verbatim data to be interpretable.

ArthurChapman commented 3 months ago

Expected Response and Notes updated following discussion above, and Specification Last Updated - updated.

chicoreus commented 3 months ago

Discussion of handling of validation cases in the form verbatmDepth="Min depth = 10m" and data observed in the wild in the form verbatimDepth="<10m" and verbatimDepth=">10m" leads to the suggestion that we change the specification to handle cases where only one bound is specified differently:

Thus interpreting verbatimDepth="<10m" as maximumDepthInMeters=10m, with no value in minimumDepthInMeters and verbatimDepth>10m as minimumDepthInMeters=10m with no value in maximumDepthInMeters

Current interpretation of the geo_ref_qc implementation is that verbatimDepth="<10m" is interpreted as minimum DepthInMeters=0, maximumDepthInMeters=10, while verbatimDepth=">10m" is treated as uninterpretable as it does not provide for an interpretable maximum bound.

tucotuco commented 3 months ago

I think it is useful, and reasonable, to provide and amendment to one of the two values while leaving the other not amended.

Tasilee commented 3 months ago

Would the following change to the Expected Response simplify and allow for evolution?

INTERNAL_PREREQUISITES_NOT_MET if dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they are EMPTY and could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED.

ArthurChapman commented 3 months ago

@Tasilee - this looks reasonable to me and simpler to what I wrote. Does this satisfy all your issues @chicoreus?

Tasilee commented 3 months ago

No comments on my suggested Expected Response? I think it will work with scenarios that have been mentioned.

My only query would be if, for example dwc:verbatimDepth="100m", do we AMEND dwc:minimumDepthInMeters, or dwc:maximumDepthInMeters, or neither?

I should add this example as an edge case in the Test Data.

The same argument will apply to #68

tucotuco commented 3 months ago

Amend both, no?

Tasilee commented 3 months ago

Thanks @tucotuco , hmm, yes. Why not.

My Expected Response will still work I guess, so changing

INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters and dwc:maximumDepthInMeters are EMPTY and either dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they are EMPTY and could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED.

to

INTERNAL_PREREQUISITES_NOT_MET if dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they are EMPTY and could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED.

and updated Specification Last Updated.

Tasilee commented 3 months ago

@chicoreus - your amended Notes crashed my Python dump program: "UnicodeEncodeError: 'charmap' codec can't encode character '\u0305'" so edited out, unless there was some hidden meaning?

chicoreus commented 3 months ago

Text @Tasilee replaced was:

If dwc:verbatimDepth has a single value rather than a range, the minimum and maximum values should be amended with the same value. When transforming units, the transformation should be reversible, not adjusting the number of significant digits or adjusting the rounding. For example, transform fathoms to meters by multiplying by 1.8288 and retaining added significant digits (verbatim depth of 10 fathoms to minimum and maximum depths in meters of 1̅8.288). Implementations should be capable of interpreting verbatim data in at least meters, fathoms, and feet, in the form of either a single value or a range. The units must be specified in the verbatim data to be interpretable.

The u0305 is the combining overline (overhead bar) character, indicating when over a digit not at the right most end of a number the number of significant digits (or if over the rightmost digit, indicating that digit is repeating to infinity (e.g 1.33̅3)). Thus the example is suggesting converting 10 fathoms to 18.288 meters, as a reversible transformation, but with the number of significant digits (two) retained by including the overhead bar, thus: 1̅8.288 See the notation in: https://en.wikipedia.org/wiki/Significant_figures#Multiplication_and_division

This needs discussion. One alternative is to convert 10 fathoms to 18 meters, retaining the two significant digits, but not being reversible. Another alternative is to convert 10 fathoms to 18.288 meters, reversible, but implying additional precision. Another alternative would be to convert 10 fathoms to the range 9.5 to 10.5 fathoms and from there to minDepthInMeters=17.3736 maxDepthInMeters=19.2024. The removed text is making the proposal that if we allow for the overhead bar (standard notation for significant digits), then we could have things both ways, a reversible transformation and retention of the number of significant digits.

As with @Tasilee 's conversion script, adding the overhead bar unicode character is likely to cause problems for downstream consumers of the amended data, as this isn't a typical convention in biodiversity informatics data. Each of the possibilities has tradeoffs, and we should have a good rationale for what decision we make. At this point, we've asserted that reversibility is an important principle for amendments that make numeric transformations.

chicoreus commented 3 months ago

On Thu, 01 Aug 2024 15:56:50 -0700 Lee Belbin @.***> wrote:

INTERNAL_PREREQUISITES_NOT_MET if dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable;

We should make sure that we are consistent in how amendments handle uninterpretable or ambiguous non-empty values. I think most of the time these fall into NOT_AMENDED rather than internal prerequsiites not met.

tucotuco commented 3 months ago

From the moment I first encountered significant digits I did not believe in them. I still don't. They are a great way to propagate and augment uncertainty. If that is what one is after, by all means... By the way, 10 has one significant digit, so the significant digit-based amendment would be to 20, not to 18. That is outside the upper bound of the conservative range method (9.5 - 10.5). Horrible.

In the absence of uncertainty measures for depth I am in favor of recording the exact result of the transformation, because it is reversible, and because it is supported by the original verbatim value, which would hopefully accompany the amended values.

chicoreus commented 3 months ago

On Sat, 03 Aug 2024 17:40:56 -0700 John Wieczorek @.***> wrote:

In the absence of uncertainty measures for depth I am in favor of recording the exact result of the transformation, because it is reversible, and because it is supported by the original verbatim value, which would hopefully accompany the amended values

I like this position. Key element is that we have no means for representing uncertanty in the minimum and maximum depth values.

ArthurChapman commented 2 months ago

Changed INTERNAL_PREREQUISITES_NOT_MET if dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they are EMPTY and could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED.

To INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters or dwc:maximumDepthInMeters are not EMPTY or dwc:verbatimDepth is EMPTY; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED.

and updated Specification Last Updated

ArthurChapman commented 2 months ago

Changed "or" to "and" in the INTERNAL_PREREQUISITES_NOT_MET to allow for either latitude or longitude to be filled in independantly

updated Specification Last Updated

chicoreus commented 2 months ago

Rationalle for @ArthurChapman's choice of "and" instead of "or" is that if one of multiple targets to be filled in contains a value, but others do not, then a test proposing the amendment needs to evaluate the consistency of the conclusion with the value that exists in the target with the source value that is being considered. For example if minimumDepthInMeters=6.4, maximumDepthInMeters="", and verbatimDepth="3.5-10 fathoms", the test must assess the consistency of 3.5 fathoms with 6.4 meters as well as filling in 10 fathoms as the maximum. This mixes concerns of validation of internal consistency with concerns about amending to fill in values. This was more obvious to us when examining filling in coordinates from verbatim values.

Tasilee commented 2 months ago

I agree with "INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters or dwc:maximumDepthInMeters are NOT EMPTY". This simplifies the implementation, no mixing and matching.

As has been reiterated, it is the "and" in "FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they can be unambiguously interpreted.." that raises issues with some dwc:verbatim text. If we have a single candidate value for dwc:minimumDepthInMeters or dwc:maximumDepthInMeters for example, dwc:verbatim="min depth=100m" or "dwc:verbatim="max depth=100m" do we fill the single candidate. Currently this scenario will produce NOT_AMENDED.

chicoreus commented 2 months ago

From an email thread on this issue:

NOT_AMENDED is an appropriate response here (not necessarily the easiest to implement). We can't unambiguously interpret the minimum and maximum depths from just a maximum depth in verbatim depth.

From verbatimDepth="Maximum depth 100m":

We could interpret any of the following three possibilities:

(1) minimumDepth=0, maximumDepth=100 (2) minimumDepth=100, maximumDepth=100 (3) minimumDepth=?, maximumDepth=100

The presence of "and" in the expected response means we need to provide both values, thus we can't (without changing the specification) return just maximumDepth=100 without a minimumDepth.

Since there is more than one possible interpretation, we can't provide both unambiguously, and the NOT_AMENDED response would fit with the current specification.

chicoreus commented 2 months ago

Note that the explicit setting of a lower bound differs from verbatimDepth=100m, which would be interpreted as maximumDepthInMeters=100, minimumDetphInMeters=100.

chicoreus commented 2 months ago

Removing hyphen from name to make TERM_ACTON consistent.

Tasilee commented 1 month ago

Could I suggest a slightly more flexible Expected Response given the edge case on current DataID 321

dwc:minimumDepthInMeters="", dwc:maximumDepthInMeters="", dwc:verbatimDepth="Maximum depth 100m"

INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters or dwc:maximumDepthInMeters are bdq:NotEmpty or dwc:verbatimDepth is bdq:Empty; FILLED_IN the value of dwc:minimumDepthInMeters and/or dwc:maximumDepthInMeters if they can be unambiguously interpreted from dwc:verbatimDepth; otherwise NOT_AMENDED.

This would result in

dwc:minimumDepthInMeters="", dwc:maximumDepthInMeters="100" ?

ArthurChapman commented 1 month ago

Seems reasonable to me

chicoreus commented 1 month ago

On Mon, 23 Sep 2024 22:47:01 -0700 Lee Belbin @.***> wrote:

Could I suggest a slightly more flexible Expected Response given the edge case on current DataID 321

dwc:minimumDepthInMeters="", dwc:maximumDepthInMeters="", dwc:verbatimDepth="Maximum depth 100m"

INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters or dwc:maximumDepthInMeters are bdq:NotEmpty or dwc:verbatimDepth is bdq:Empty; FILLED_IN the value of dwc:minimumDepthInMeters and/or dwc:maximumDepthInMeters if they can be unambiguously interpreted from dwc:verbatimDepth; otherwise NOT_AMENDED.

I'd rather not. It is likely to be confusing given the guidance in the notes about how to handle a single value: "If dwc:verbatimDepth has a single value rather than a range, the minimum and maximum values should be amended with the same value." Verbatim data in the form "min depth=, max depth=10" is very edge case. It is rare to specify a depth range with one explicit unknown end point. Also, we would want to make the paralell amendment for elevation consistent. Feels like very little gain at the cost of substatntial confusion.

Tasilee commented 1 month ago

@chicoreus. The Expected Response/Specification (if clear and concise) must inform the Notes, not the reverse. That said, I will bow to your position to move this along, and edit the Test Data accordingly.

chicoreus commented 1 month ago

The notes help explain the rationale for the expected response, and explain why approaching it differently would cause confusion... Yes, the expected response has primacy, but changing it likely to cause confusion and only has utility for a very edge case.