iDigBioBot opened this issue 6 years ago
TestField | Value |
---|---|
GUID | c5658b83-4471-4f57-9d94-bf7d0a96900c |
Label | AMENDMENT_MINDEPTHMAXDEPTH_FROM_VERBATIM |
Description | Proposes amendments of the values of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they can be interpreted from dwc:verbatimDepth. |
TestType | Amendment |
Darwin Core Class | dcterms:Location |
Information Elements ActedUpon | dwc:minimumDepthInMeters, dwc:maximumDepthInMeters |
Information Elements Consulted | dwc:verbatimDepth |
Expected Response | INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters or dwc:maximumDepthInMeters are bdq:NotEmpty or dwc:verbatimDepth is bdq:Empty; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they can be unambiguously interpreted from dwc:verbatimDepth; otherwise NOT_AMENDED. |
Data Quality Dimension | Completeness |
Term-Actions | MINDEPTHMAXDEPTH_FROM_VERBATIM |
Parameter(s) | |
Source Authority | |
Specification Last Updated | 2024-08-30 |
Examples | [dwc:minimumDepthInMeters="", dwc:maximumDepthInMeters="", dwc:verbatimDepth="10 feet": Response.status=FILLED_IN, Response.result=dwc:minimumDepthInMeters="3.048", dwc:maximumDepthInMeters="3.048", Response.comment="dwc:verbatimDepth contains interpretable values"] [dwc:minimumDepthInMeters="", dwc:maximumDepthInMeters="", dwc:verbatimDepth="x": Response.status=NOT_AMENDED, Response.result=, Response.comment="dwc:verbatimDepth does not contain an interpretable value"] |
Source | |
References | |
Example Implementations (Mechanisms) | |
Link to Specification Source Code | |
Notes | If dwc:verbatimDepth has a single value rather than a range, the minimum and maximum values should be amended with the same value. When transforming units, the transformation should be reversible, not adjusting the number of significant digits or adjusting the rounding. For example, transform fathoms to meters by multiplying by 1.8288 and retaining added significant digits (verbatim depth of 10 fathoms to minimum and maximum depths in meters of 18.288). Implementations should be capable of interpreting verbatim data in at least meters, fathoms, and feet, in the form of either a single value or a range. The units must be specified in the verbatim data to be interpretable. |
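A minimal Python sketch of the Expected Response and Notes above. It assumes a deliberately small grammar: a single number or a range ("3.5-10" or "3.5 to 10") followed by a mandatory unit. The function name, the unit spellings in `TO_METERS`, and the comment strings are illustrative assumptions, not a reference implementation. `Decimal` keeps the conversion exact, so 10 fathoms becomes 18.288 and not a binary floating-point approximation. This is the reversibility the Notes ask for.

```python
import re
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict

# Hypothetical unit table; the foot (0.3048 m) and fathom (6 ft = 1.8288 m)
# factors are the exact international definitions.
TO_METERS: Dict[str, Decimal] = {
    "m": Decimal("1"), "meter": Decimal("1"), "meters": Decimal("1"),
    "metre": Decimal("1"), "metres": Decimal("1"),
    "ft": Decimal("0.3048"), "foot": Decimal("0.3048"), "feet": Decimal("0.3048"),
    "fathom": Decimal("1.8288"), "fathoms": Decimal("1.8288"),
}

NUM = r"(\d+(?:\.\d+)?)"
# A single value or a range, followed by a unit (units must be stated).
VERBATIM = re.compile(rf"^\s*{NUM}\s*(?:(?:-|to)\s*{NUM}\s*)?([A-Za-z]+)\.?\s*$")


@dataclass
class Response:
    status: str
    result: Dict[str, str] = field(default_factory=dict)
    comment: str = ""


def to_meters(number: str, unit: str) -> str:
    factor = TO_METERS[unit]
    if factor == 1:
        return number  # already in meters: keep the verbatim digits untouched
    # Exact product; strip trailing zeros introduced by the factor (10 ft -> "3.048").
    return format((Decimal(number) * factor).normalize(), "f")


def amendment_mindepthmaxdepth_from_verbatim(
    minimum_depth: str, maximum_depth: str, verbatim_depth: str
) -> Response:
    if minimum_depth.strip() or maximum_depth.strip() or not verbatim_depth.strip():
        return Response("INTERNAL_PREREQUISITES_NOT_MET",
                        comment="a target is bdq:NotEmpty or dwc:verbatimDepth is bdq:Empty")
    match = VERBATIM.match(verbatim_depth)
    unit = match.group(3).lower() if match else ""
    if unit not in TO_METERS:
        return Response("NOT_AMENDED",
                        comment="dwc:verbatimDepth does not contain an interpretable value")
    low = match.group(1)
    high = match.group(2) or low  # a single value fills both bounds
    if Decimal(low) > Decimal(high):
        return Response("NOT_AMENDED", comment="dwc:verbatimDepth range is inverted")
    return Response(
        "FILLED_IN",
        {"dwc:minimumDepthInMeters": to_meters(low, unit),
         "dwc:maximumDepthInMeters": to_meters(high, unit)},
        "dwc:verbatimDepth contains interpretable values",
    )
```

Under these assumptions, "10 feet" reproduces the first example (3.048 for both bounds). "x" and "Maximum depth 100m" fall through to NOT_AMENDED because they do not match the grammar.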
Established general principle in discussion: When doing an amendment that performs a transformation, do so in a manner which is reversible (thus transform fathoms to meters with a different number of significant digits in the example).
Shouldn’t we add as a Prerequisite that dwc:minimumDepthInMeters and dwc:maximumDepthInMeters are both EMPTY?
Changed "AMENDED" to "FILLED_IN" in accordance with discussions April 16.
Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted". Also changed "Field" to "TestField" and "Output Type" to "TestType".
The form of this specification doesn't match more recent ones.
I propose rewording the "dwc:minimumDepthInMeters and dwc:maximumDepthInMeters are not EMPTY" clause to produce NOT_AMENDED if either is populated (consistent with our more recent handling of FILLED_IN amendments).
Also, the Notes indicate that the minimum and maximum values should be set together, so change "and/or" to just "and". Thus changing from:
INTERNAL_PREREQUISITES_NOT_MET if dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable or dwc:minimumDepthInMeters and dwc:maximumDepthInMeters are not EMPTY; FILLED_IN the value of dwc:minimumDepthInMeters and/or dwc:maximumDepthInMeters if they could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED
to
INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters and dwc:maximumDepthInMeters are EMPTY and either dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they are not EMPTY and could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED.
The comment https://github.com/tdwg/bdq/issues/55#issuecomment-358125924 should be reflected in the notes.
Something like:
If the dwc:verbatimDepth has a single value rather than a range, the minimum and maximum values should be amended with the same value. When transforming units, the transformation should be reversible, not adjusting the number of significant digits or adjusting the rounding. For example, transform fathoms to meters by multiplying by 1.8288 and retaining added significant digits (verbatim depth of 10 fathoms to minimum and maximum depths in meters of 1̅8.288).
If we can include the combining overline symbol (U+0305), then we can provide both reversibility and an indication of the number of significant digits (which will require calculation). This won't quite correctly represent the introduced error (10 fathoms being 10 +/- 0.5 fathom, 18 meters being 18 +/- 0.5 meter). Adding this character will probably add complexity for downstream consumers, but as we are moving in and out of Darwin Core as strings, it is a possibility.
@chicoreus - this doesn't look right to me. Your new wording suggests filling in dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they are not EMPTY, but if they are not EMPTY, you don't want to replace what is there.
I think it should be
INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters and dwc:maximumDepthInMeters are not EMPTY and either dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they are EMPTY and could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED.
With this one and #68, there is the issue of "and" versus "and/or" in the first line. If dwc:verbatimDepth has only a single depth value, you don't want anything filled in if either dwc:minimumDepthInMeters or dwc:maximumDepthInMeters already has a value.
The issue arises where dwc:verbatimDepth specifies only a minimum depth or only a maximum depth. Then you could fill in dwc:minimumDepthInMeters or dwc:maximumDepthInMeters respectively, as long as the relevant term didn't already have a value. I guess that could be covered by the wording "unambiguously interpretable".
On Tue, 30 Jul 2024 15:43:57 -0700 Arthur Chapman @.***> wrote:
INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters and dwc:maximumDepthInMeters are not EMPTY
This is not desirable, or consistent with other tests. If either minimum or maximum depth in meters contain a value, then the response.status should be NOT_AMENDED.
We should also note that, when converting from fathoms to meters, we should use meters = fathoms * 1.8288 rather than meters = fathoms * 1.828804. The 1.828804 factor is for one fathom = 6 US State Plane feet. Using 1.8288 will introduce a trivial error on US chart data older than the most recent charts, but the error is not relevant for biodiversity purposes.
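A small sketch comparing the two fathom factors. It assumes the "US State Plane feet" figure refers to the US survey foot, which is defined as exactly 1200/3937 m. A six-foot fathom is then 7200/3937 m, which rounds to 1.828804 at six decimal places. The helper name `fathoms_to_meters` is illustrative.

```python
from decimal import Decimal

# International fathom: 6 international feet of exactly 0.3048 m.
INTERNATIONAL_FATHOM_M = 6 * Decimal("0.3048")   # exactly 1.8288
# US survey fathom: 6 US survey feet of exactly 1200/3937 m.
US_SURVEY_FATHOM_M = Decimal(7200) / Decimal(3937)


def fathoms_to_meters(fathoms: str) -> Decimal:
    """Exact, reversible conversion using the international fathom."""
    return Decimal(fathoms) * INTERNATIONAL_FATHOM_M


# Size of the "trivial error" from using the international factor on survey-fathom data.
difference = Decimal(100) * (US_SURVEY_FATHOM_M - INTERNATIONAL_FATHOM_M)
print(f"{round(difference * 1000, 2)} mm difference over 100 fathoms")
```

The gap is well under a millimetre even at 100 fathoms, which supports treating it as irrelevant for biodiversity data.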
@ArthurChapman good catch. There is an extraneous "not" in there. It should read:
INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters and dwc:maximumDepthInMeters are EMPTY and either dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they are EMPTY and could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED.
And for the notes, another suggestion to augment:
If the dwc:verbatimDepth has a single value rather than a range, the minimum and maximum values should be amended with the same value. When transforming units, the transformation should be reversible, not adjusting the number of significant digits or adjusting the rounding. For example, transform fathoms to meters by multiplying by 1.8288 and retaining added significant digits (verbatim depth of 10 fathoms to minimum and maximum depths in meters of 1̅8.288). Implementations should be capable of interpreting verbatim data in at least meters, fathoms, and feet, in the form of either a single value or a range. The units must be specified in the verbatim data to be interpretable.
Expected Response and Notes updated following discussion above, and Specification Last Updated - updated.
Discussion of how to handle validation cases in the form verbatimDepth="Min depth = 10m", and of data observed in the wild in the forms verbatimDepth="<10m" and verbatimDepth=">10m", leads to the suggestion that we change the specification to handle differently the cases where only one bound is specified:
Thus, interpret verbatimDepth="<10m" as maximumDepthInMeters=10, with no value in minimumDepthInMeters, and verbatimDepth=">10m" as minimumDepthInMeters=10, with no value in maximumDepthInMeters.
The current geo_ref_qc implementation interprets verbatimDepth="<10m" as minimumDepthInMeters=0, maximumDepthInMeters=10, while it treats verbatimDepth=">10m" as uninterpretable because it does not provide an interpretable maximum bound.
I think it is useful, and reasonable, to provide an amendment to one of the two values while leaving the other not amended.
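A sketch of the one-sided interpretation proposed in this comment, not of the geo_ref_qc behaviour. It assumes only meter values written as "<N m" or ">N m". The helper name `one_sided_bound` is hypothetical. `None` marks the bound that would be left not amended.

```python
import re
from typing import Optional, Tuple

# "<10m" or ">10m" (optional spaces, meters only) -- an assumed, narrow grammar.
ONE_SIDED = re.compile(r"^\s*([<>])\s*(\d+(?:\.\d+)?)\s*m\s*$", re.IGNORECASE)


def one_sided_bound(verbatim_depth: str) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Return (minimumDepthInMeters, maximumDepthInMeters), None for an unfilled bound,
    or None overall if the value is not a one-sided meter depth."""
    match = ONE_SIDED.match(verbatim_depth)
    if match is None:
        return None
    operator, value = match.groups()
    if operator == "<":
        return (None, value)   # shallower than value: only the maximum is known
    return (value, None)       # deeper than value: only the minimum is known
```

Under the geo_ref_qc reading described above, "<10m" would instead yield ("0", "10") and ">10m" would be uninterpretable.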
Would the following change to the Expected Response simplify and allow for evolution?
INTERNAL_PREREQUISITES_NOT_MET if dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they are EMPTY and could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED.
@Tasilee - this looks reasonable to me and simpler than what I wrote. Does this satisfy all your issues, @chicoreus?
No comments on my suggested Expected Response? I think it will work with scenarios that have been mentioned.
My only query would be if, for example dwc:verbatimDepth="100m", do we AMEND dwc:minimumDepthInMeters, or dwc:maximumDepthInMeters, or neither?
I should add this example as an edge case in the Test Data.
The same argument will apply to #68
Amend both, no?
Thanks @tucotuco , hmm, yes. Why not.
My Expected Response will still work I guess, so changing
INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters and dwc:maximumDepthInMeters are EMPTY and either dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they are EMPTY and could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED.
to
INTERNAL_PREREQUISITES_NOT_MET if dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they are EMPTY and could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED.
and updated Specification Last Updated.
@chicoreus - your amended Notes crashed my Python dump program: "UnicodeEncodeError: 'charmap' codec can't encode character '\u0305'" so edited out, unless there was some hidden meaning?
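A sketch reproducing the error. It assumes the dump program wrote output through a Windows code page such as cp1252. Python reports that codec as 'charmap', and cp1252 has no mapping for U+0305. Opening the output with an explicit `encoding="utf-8"` avoids the error. The file name is illustrative.

```python
import os
import tempfile

notes = "depths in meters of 1\u03058.288"

# cp1252 cannot represent the combining overline, so encoding fails.
try:
    notes.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError as err:
    cp1252_ok = False
    print(err)  # "'charmap' codec can't encode character '\u0305' ..."

# Writing and reading with UTF-8 preserves the character.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(notes)
    with open(path, encoding="utf-8") as fh:
        round_trip = fh.read()
```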
Text @Tasilee replaced was:
If dwc:verbatimDepth has a single value rather than a range, the minimum and maximum values should be amended with the same value. When transforming units, the transformation should be reversible, not adjusting the number of significant digits or adjusting the rounding. For example, transform fathoms to meters by multiplying by 1.8288 and retaining added significant digits (verbatim depth of 10 fathoms to minimum and maximum depths in meters of 1̅8.288). Implementations should be capable of interpreting verbatim data in at least meters, fathoms, and feet, in the form of either a single value or a range. The units must be specified in the verbatim data to be interpretable.
U+0305 is the combining overline (overbar) character. Placed over a digit other than the rightmost digit of a number, it marks the number of significant digits; placed over the rightmost digit, it indicates that the digit repeats indefinitely (e.g. 1.33̅3). Thus the example suggests converting 10 fathoms to 18.288 meters as a reversible transformation, while retaining the number of significant digits (two) by including the overbar: 1̅8.288. See the notation in: https://en.wikipedia.org/wiki/Significant_figures#Multiplication_and_division
This needs discussion. One alternative is to convert 10 fathoms to 18 meters, which retains the two significant digits but is not reversible. Another is to convert 10 fathoms to 18.288 meters, which is reversible but implies additional precision. A third would be to treat 10 fathoms as the range 9.5 to 10.5 fathoms, and from there convert to minimumDepthInMeters=17.3736, maximumDepthInMeters=19.2024. The removed text proposes that if we allow the overbar (standard notation for significant digits), we could have it both ways: a reversible transformation and retention of the number of significant digits.
As with @Tasilee's conversion script, adding the overbar Unicode character is likely to cause problems for downstream consumers of the amended data, as this isn't a typical convention in biodiversity informatics data. Each of the possibilities has tradeoffs, and we should have a good rationale for whatever decision we make. At this point, we've asserted that reversibility is an important principle for amendments that make numeric transformations.
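A sketch of the overbar notation. It assumes the convention in the linked article, where the bar sits over the last significant digit. A combining character attaches to the character that precedes it, so U+0305 must be inserted immediately after the digit it marks. The helper `with_overbar` is hypothetical.

```python
COMBINING_OVERLINE = "\u0305"


def with_overbar(number: str, significant: int) -> str:
    """Insert U+0305 right after the last significant digit of a decimal string."""
    out = []
    seen = 0
    for ch in number:
        out.append(ch)
        if ch.isdigit():
            if seen == 0 and ch == "0":
                continue  # leading zeros are not significant
            seen += 1
            if seen == significant:
                out.append(COMBINING_OVERLINE)
    return "".join(out)
```

With this placement, `with_overbar("18.288", 2)` puts the bar over the 8, and `with_overbar("18.288", 1)` puts it over the 1. Stripping the combining character recovers "18.288", which is what keeps the notation reversible.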
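The arithmetic behind the three alternatives, as a sketch using Python's `decimal` module so the products are exact. The variable names are illustrative.

```python
from decimal import Decimal

FATHOM_M = Decimal("1.8288")
verbatim_fathoms = Decimal("10")

# Alternative 1: round to the units place (two significant digits here); not reversible.
exact = verbatim_fathoms * FATHOM_M
rounded = exact.quantize(Decimal("1"))

# Alternative 2: keep the exact product; reversible, but implies extra precision.
recovered = exact / FATHOM_M

# Alternative 3: treat 10 fathoms as 9.5 to 10.5 fathoms and convert both ends.
half_unit = Decimal("0.5")
minimum_depth_m = (verbatim_fathoms - half_unit) * FATHOM_M
maximum_depth_m = (verbatim_fathoms + half_unit) * FATHOM_M
```

Dividing `rounded` by the factor gives about 9.84 fathoms, not 10. That is the loss of reversibility the first alternative trades for fewer digits.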
On Thu, 01 Aug 2024 15:56:50 -0700 Lee Belbin @.***> wrote:
INTERNAL_PREREQUISITES_NOT_MET if dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable;
We should make sure that we are consistent in how amendments handle uninterpretable or ambiguous non-empty values. I think most of the time these fall into NOT_AMENDED rather than INTERNAL_PREREQUISITES_NOT_MET.
From the moment I first encountered significant digits I did not believe in them. I still don't. They are a great way to propagate and augment uncertainty. If that is what one is after, by all means... By the way, 10 has one significant digit, so the significant digit-based amendment would be to 20, not to 18. That is outside the upper bound of the conservative range method (9.5 - 10.5). Horrible.
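A quick check of the arithmetic in this comment. The hypothetical helper `round_sig` rounds a `Decimal` to a given number of significant digits. Half-up rounding is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_sig(value: Decimal, sig: int) -> Decimal:
    """Round value to `sig` significant digits (half-up)."""
    if value == 0:
        return value
    exponent = value.adjusted() - sig + 1   # place value of the last kept digit
    return value.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_UP)


exact = Decimal("10") * Decimal("1.8288")              # 18.288 m
one_sig = round_sig(exact, 1)                          # 10 has one significant digit
upper_of_range = Decimal("10.5") * Decimal("1.8288")   # 19.2024 m
```

Rounding 18.288 to one significant digit gives 20, which exceeds the 19.2024 m upper end of the 9.5 to 10.5 fathom range.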
In the absence of uncertainty measures for depth I am in favor of recording the exact result of the transformation, because it is reversible, and because it is supported by the original verbatim value, which would hopefully accompany the amended values.
On Sat, 03 Aug 2024 17:40:56 -0700 John Wieczorek @.***> wrote:
In the absence of uncertainty measures for depth I am in favor of recording the exact result of the transformation, because it is reversible, and because it is supported by the original verbatim value, which would hopefully accompany the amended values
I like this position. The key element is that we have no means of representing uncertainty in the minimum and maximum depth values.
Changed INTERNAL_PREREQUISITES_NOT_MET if dwc:verbatimDepth is EMPTY or the value is not unambiguously interpretable; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they are EMPTY and could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED.
To INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters or dwc:maximumDepthInMeters are not EMPTY or dwc:verbatimDepth is EMPTY; FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they could be unambiguously determined from dwc:verbatimDepth; otherwise NOT_AMENDED.
and updated Specification Last Updated
Changed "or" to "and" in the INTERNAL_PREREQUISITES_NOT_MET to allow for either latitude or longitude to be filled in independently
updated Specification Last Updated
The rationale for @ArthurChapman's choice of "and" instead of "or" is this: if one of multiple targets to be filled in contains a value but the others do not, then a test proposing the amendment needs to evaluate whether the value already in the target is consistent with the source value being considered. For example, if minimumDepthInMeters="6.4", maximumDepthInMeters="", and verbatimDepth="3.5-10 fathoms", the test must assess the consistency of 3.5 fathoms with 6.4 meters, as well as fill in 10 fathoms as the maximum. This mixes concerns of validating internal consistency with concerns of amending to fill in values. This was more obvious to us when examining filling in coordinates from verbatim values.
I agree with "INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters or dwc:maximumDepthInMeters are NOT EMPTY". This simplifies the implementation, no mixing and matching.
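A sketch of the prerequisite gate as worded here, using the example from the previous comment. The helper name `prerequisites_met` is hypothetical. With "or", any populated target stops the test before interpretation. The test therefore never has to decide whether an existing 6.4 m "agrees" with the 6.4008 m implied by 3.5 fathoms, which is a validation question that needs a tolerance.

```python
from decimal import Decimal


def prerequisites_met(minimum_depth: str, maximum_depth: str, verbatim_depth: str) -> bool:
    """False if either target is populated ("or") or dwc:verbatimDepth is empty."""
    return not (minimum_depth.strip() or maximum_depth.strip()) and bool(verbatim_depth.strip())


# What a mixed-concern test would otherwise have to reconcile.
existing_minimum_m = Decimal("6.4")
interpreted_minimum_m = Decimal("3.5") * Decimal("1.8288")   # 6.4008
```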
As has been reiterated, it is the "and" in "FILLED_IN the value of dwc:minimumDepthInMeters and dwc:maximumDepthInMeters if they can be unambiguously interpreted..." that raises issues with some dwc:verbatimDepth text. If we have a single candidate value for dwc:minimumDepthInMeters or dwc:maximumDepthInMeters, for example dwc:verbatimDepth="min depth=100m" or dwc:verbatimDepth="max depth=100m", do we fill in the single candidate? Currently this scenario will produce NOT_AMENDED.
From an email thread on this issue:
NOT_AMENDED is an appropriate response here (not necessarily the easiest to implement). We can't unambiguously interpret the minimum and maximum depths from just a maximum depth in verbatim depth.
From verbatimDepth="Maximum depth 100m":
We could interpret any of the following three possibilities:
(1) minimumDepth=0, maximumDepth=100
(2) minimumDepth=100, maximumDepth=100
(3) minimumDepth=?, maximumDepth=100
The presence of "and" in the expected response means we need to provide both values, thus we can't (without changing the specification) return just maximumDepth=100 without a minimumDepth.
Since there is more than one possible interpretation, we can't provide both unambiguously, and the NOT_AMENDED response would fit with the current specification.
Note that the explicit setting of a lower bound differs from verbatimDepth="100m", which would be interpreted as maximumDepthInMeters=100, minimumDepthInMeters=100.
Removing hyphen from name to make TERM_ACTION consistent.
Could I suggest a slightly more flexible Expected Response given the edge case on current DataID 321
dwc:minimumDepthInMeters="", dwc:maximumDepthInMeters="", dwc:verbatimDepth="Maximum depth 100m"
INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters or dwc:maximumDepthInMeters are bdq:NotEmpty or dwc:verbatimDepth is bdq:Empty; FILLED_IN the value of dwc:minimumDepthInMeters and/or dwc:maximumDepthInMeters if they can be unambiguously interpreted from dwc:verbatimDepth; otherwise NOT_AMENDED.
This would result in
dwc:minimumDepthInMeters="", dwc:maximumDepthInMeters="100" ?
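A sketch of the "and/or" variant proposed in this comment. It assumes labelled meter values such as "Maximum depth 100m" or "min depth=100m". The helper `fill_labelled_bound` is hypothetical. It fills only the bound the label names and leaves the other unset.

```python
import re
from typing import Dict

# Assumed grammar: a min/max label, the word "depth", an optional "=" or ":", meters.
LABELLED = re.compile(
    r"^\s*(minimum|min|maximum|max)\s*depth\s*[=:]?\s*(\d+(?:\.\d+)?)\s*m\s*$",
    re.IGNORECASE,
)


def fill_labelled_bound(verbatim_depth: str) -> Dict[str, str]:
    """Return only the Darwin Core term named by the label, or {} if not interpretable."""
    match = LABELLED.match(verbatim_depth)
    if match is None:
        return {}
    label, value = match.group(1).lower(), match.group(2)
    term = "dwc:minimumDepthInMeters" if label.startswith("min") else "dwc:maximumDepthInMeters"
    return {term: value}
```

Under the "and" wording, the same input returns NOT_AMENDED, because both values cannot be filled unambiguously.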
Seems reasonable to me
On Mon, 23 Sep 2024 22:47:01 -0700 Lee Belbin @.***> wrote:
Could I suggest a slightly more flexible Expected Response given the edge case on current DataID 321
dwc:minimumDepthInMeters="", dwc:maximumDepthInMeters="", dwc:verbatimDepth="Maximum depth 100m"
INTERNAL_PREREQUISITES_NOT_MET if dwc:minimumDepthInMeters or dwc:maximumDepthInMeters are bdq:NotEmpty or dwc:verbatimDepth is bdq:Empty; FILLED_IN the value of dwc:minimumDepthInMeters and/or dwc:maximumDepthInMeters if they can be unambiguously interpreted from dwc:verbatimDepth; otherwise NOT_AMENDED.
I'd rather not. It is likely to be confusing given the guidance in the notes about how to handle a single value: "If dwc:verbatimDepth has a single value rather than a range, the minimum and maximum values should be amended with the same value." Verbatim data in the form "min depth=, max depth=10" is a very edge case. It is rare to specify a depth range with one explicitly unknown end point. Also, we would want to make the parallel amendment for elevation consistent. It feels like very little gain at the cost of substantial confusion.
@chicoreus. The Expected Response/Specification (if clear and concise) must inform the Notes, not the reverse. That said, I will bow to your position to move this along, and edit the Test Data accordingly.
The notes help explain the rationale for the expected response, and explain why approaching it differently would cause confusion... Yes, the expected response has primacy, but changing it is likely to cause confusion and only has utility for a very edge case.