Open Tasilee opened 5 years ago
Added terms: ISSUE, NO_ISSUE, POTENTIAL_ISSUE to table with no definitions - ready for export to Test Data V14 for discussion.
For EXTERNAL_PREREQUISITES_NOT_MET, we'll need some guidance for implementors around local caches for vocabularies. Perhaps the guidance could be that a lookup against an external vocabulary may be cached, and that at the start of a run of tests the implementation should check that the source is available and report EXTERNAL_PREREQUISITES_NOT_MET if it is not available at that time. Alternatively, the guidance could allow some period (relatively short, no more than a few days, or perhaps longer) during which a cache may be used for an external vocabulary for a run of tests.
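As a rough illustration of the guidance floated above, here is a minimal Python sketch that combines both options: check the source once at the start of a run, and fall back to a cache only while it is within a grace period. The names `VocabularyCache` and `precheck`, and the three-day default, are assumptions made for this example and are not part of the Vocabulary or any BDQ implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional

EXTERNAL_PREREQUISITES_NOT_MET = "EXTERNAL_PREREQUISITES_NOT_MET"


@dataclass
class VocabularyCache:
    """Hypothetical local cache for one external vocabulary."""
    fetched_at: Optional[datetime] = None   # when the cache was last filled
    max_age: timedelta = timedelta(days=3)  # assumed "few days" grace period

    def is_fresh(self, now: datetime) -> bool:
        # A cache that was never filled is never fresh.
        return self.fetched_at is not None and now - self.fetched_at <= self.max_age


def precheck(cache: VocabularyCache,
             source_is_available: Callable[[], bool],
             now: datetime) -> Optional[str]:
    """Run once at the start of a run of tests.

    Returns None when the vocabulary can be used (source reachable, or
    cache still within its grace period); otherwise returns the status.
    """
    if source_is_available():
        return None
    if cache.is_fresh(now):
        return None
    return EXTERNAL_PREREQUISITES_NOT_MET
```

Passing the availability check in as a callable keeps the sketch free of any particular network library; an implementation would supply its own check.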
Updated some definitions following ZOOM discussion on 2022-04-30. Added two new terms: bdq:spatialBufferInMeters and Test Criteria. Deleted a few redundant terms we are no longer using in the tests. To be continued following further ZOOM discussions.
Further updates following ZOOM meeting of 2022-04-09. Modified terms up to and including GUID (except for CORE which needs revisiting). Some terms deleted as no longer in use.
After many ZOOM meetings, we have agreed on the definitions in the Vocabulary and all have been updated as of today. There is more work to be done (especially in the third and fourth columns) and I am sure we will revisit a few definitions from time to time.
Modified "Data Quality Report" definition and added definition for "Assertion".
Added two new terms to the Vocabulary. They may need rewording.
| Roman Numerals | Roman numerals are interpreted for months (e.g. "X" as "10") in appropriate tests. They are not interpreted for days as "x", "X", etc. cannot be unambiguously interpreted as they may mean unknown. | Data |
| White space | 1) A field that only includes white space (blanks) is treated as EMPTY (q.v.). 2) In VALIDATION tests (q.v.) that require looking up a Source Authority (q.v.), leading and/or trailing white space will cause the test to fail as no preprocessing is carried out on the data. These leading and trailing white spaces may be stripped out in a subsequent AMENDMENT (q.v.), and the data may thus pass when the VALIDATION test is run again. | Data |
Thanks Arthur
Can I suggest....
| Roman Numerals | Roman numerals are interpreted as the equivalent integer for months (e.g. "X" as "10") in appropriate tests. Roman numerals may not be unambiguously interpreted for other Darwin Core terms such as dwc:day or in text fields as they may mean unknown or something else entirely. | Data |
Lee
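To make the suggested Roman Numerals wording concrete, here is a small Python sketch that interprets a Roman numeral as a month and nothing else. The function name `interpret_roman_month` and the choice to accept lower-case input are assumptions for illustration only; the tests themselves may handle case differently.

```python
from typing import Optional

# Only I..XII are meaningful as months.
ROMAN_MONTHS = {
    "I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6,
    "VII": 7, "VIII": 8, "IX": 9, "X": 10, "XI": 11, "XII": 12,
}


def interpret_roman_month(value: str) -> Optional[int]:
    """Return the month number for a Roman numeral, or None.

    Assumption: case is ignored. Nothing else is preprocessed, so any
    value that is not exactly a numeral from I to XII yields None.
    """
    return ROMAN_MONTHS.get(value.upper())
```

The sketch deliberately offers no equivalent for dwc:day, since the definition treats "x" or "X" there as ambiguous.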
I have included: | bdq:includeEventDate | Allows dwc:eventDate to be excluded in a parameterized test. The default is to include the event date in the test, but it may be excluded to allow an identification to be prior to the event date. | Parameter | Used in test #76 VALIDATION_DATEIDENTIFIED_INRANGE (dc8aae4b-134f-4d75-8a71-c4186239178e) |
Note - we need to decide how we wish to cross reference tests in the comments - the above is one suggestion.
We'll have to refer to the test GUIDs eventually rather than the issue numbers, no?
So, is there agreement on how we do this? Maybe as NAME:Guid so we have human and machine readable references? For example
VALIDATION_DATEIDENTIFIED_INRANGE:dc8aae4b-134f-4d75-8a71-c4186239178e
or using brackets etc?
Looks like Arthur has made an edit to refer to the test name and its GUID, "VALIDATION_DATEIDENTIFIED_INRANGE (dc8aae4b-134f-4d75-8a71-c4186239178e)". To me that (the parenthetical identifier) nicely conveys that there are two ways to refer to it.
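For anyone wanting to consume these cross references mechanically, the `NAME (GUID)` form can be split with a few lines of Python. This is only a sketch: `CrossReference` and `parse_test_reference` are hypothetical names, and the pattern assumes test names are upper case with underscores, as in the example above.

```python
import re
import uuid
from typing import NamedTuple, Optional


class CrossReference(NamedTuple):
    name: str        # human-readable test name
    guid: uuid.UUID  # machine-readable identifier


# NAME, optional spaces, then a 36-character GUID in parentheses.
_REFERENCE = re.compile(r"^\s*([A-Z][A-Z0-9_]*)\s*\(\s*([0-9a-fA-F-]{36})\s*\)\s*$")


def parse_test_reference(text: str) -> Optional[CrossReference]:
    """Split 'NAME (GUID)' into its two parts, or return None."""
    match = _REFERENCE.match(text)
    if match is None:
        return None
    try:
        guid = uuid.UUID(match.group(2))
    except ValueError:
        # Right length and characters, but not a well-formed GUID.
        return None
    return CrossReference(match.group(1), guid)
```

For example, `parse_test_reference("VALIDATION_DATEIDENTIFIED_INRANGE (dc8aae4b-134f-4d75-8a71-c4186239178e)")` yields both the name and a `uuid.UUID` value.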
I think that is the neatest and simplest @tucotuco
OK, @ArthurChapman, could you then edit the links in the Vocabulary accordingly?
Done
I have begun checking the Vocabulary. Many minor changes will be made. I will only comment on new terms added etc. One thing I am checking is the CONTEXT and the COMMENTS. A discussion will be needed on COMMENTS once I have done a run through.
Added term: Test Type
Changed the formatting to make Test Types 'Upper Case'
We appear to have a discrepancy between the definitions of EMPTY and of non-printing characters
Under EMPTY we say "Note: A field containing non-printing or other invalid characters or values (including serializations of NULL values) are NOT_EMPTY and may be separately detected."
However under non-printing characters we say "For the purposes of the tests they are treated as EMPTY."
Should not the latter definition say "For the purposes of the tests they are treated as NOT_EMPTY." ?
Nope, as I remember @chicoreus saying that they would be treated as EMPTY and all the test data is set up for that.
OK - then I will change the definition of EMPTY
The definition of EMPTY (especially the Note) has been changed to the following
"A field that is either not present or does not contain any characters or values other than white space. Note: A field containing invalid characters or values (including serializations of NULL values) are NOT_EMPTY and may be separately detected but fields containing only non-printing characters (q.v.) are treated as EMPTY."
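The revised definition can be read as a simple predicate. The Python sketch below is one possible reading, not the reference behaviour: `is_empty` is a hypothetical name, a missing field is modelled as `None`, and "non-printing characters" are approximated by Python's `str.isprintable()`, which may not match the Vocabulary's own definition exactly.

```python
from typing import Optional


def is_empty(value: Optional[str]) -> bool:
    """Return True when a field counts as EMPTY under the revised definition.

    EMPTY: the field is not present (None here), or it contains nothing
    other than white space and/or non-printing characters.
    """
    if value is None:
        return True
    # all() over an empty string is True, so "" is EMPTY as well.
    return all(ch.isspace() or not ch.isprintable() for ch in value)
```

Under this reading a serialization such as "NULL" is NOT_EMPTY, because it consists of ordinary printable characters, which matches the Note in the definition.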
Following discussion with @Tasilee on how we treat parameters such as bdq:minimumDepthInMeters: it has been suggested that we add just "bdq:minimumValidDepthInMeters" in the Parameter(s) and "bdq:minimumValidDepthInMeters default = "0"" in the Source Authority, to be consistent with what has been done in other parameterized tests where we use bdq:sourceAuthority. That is, we treat namespace terms such as "bdq:minimumValidDepthInMeters" as equivalent to "bdq:sourceAuthority" as Source Authorities.
Thus, I have changed the definition of Source Authority from:
A vocabulary or standard to use to look up a value in an Information Element (q.v.). See also bdq:sourceAuthority (q.v.).
to (wording added in BOLD):
A vocabulary or standard to use to look up a value **(or a supplied numerical value in a parameterized test)** in an Information Element (q.v.). See also bdq:sourceAuthority (q.v.).
Looks appropriate to me.
I have added two new values into the Vocabulary. I have given them the Context "Data" but I am not sure this is correct
epsg: | A pseudo-namespace referenced in dwc:datum to indicate the EPSG API where the numeric value following the colon is used as the search key. Example: epsg:4326. | Data |
gbif: | A pseudo-namespace referenced in dwc:taxonID to indicate the GBIF API where the numeric value following the colon is used as the search key. Example: gbif:8102122. | Data |
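To show how a value carrying one of these pseudo-namespaces might be taken apart, here is a short Python sketch. The function name `split_pseudo_namespace` is hypothetical, and treating the prefix case-insensitively is an assumption; the definitions above only say that the numeric value after the colon is the search key.

```python
from typing import Optional, Tuple

# The two pseudo-namespaces proposed above.
KNOWN_PSEUDO_NAMESPACES = {"epsg", "gbif"}


def split_pseudo_namespace(value: str) -> Optional[Tuple[str, int]]:
    """Split e.g. 'epsg:4326' into ('epsg', 4326), or return None.

    Assumption: the prefix is matched case-insensitively and the search
    key must consist only of ASCII digits.
    """
    prefix, separator, key = value.partition(":")
    if not separator:
        return None
    if prefix.lower() not in KNOWN_PSEUDO_NAMESPACES:
        return None
    if not (key.isascii() and key.isdigit()):
        return None
    return prefix.lower(), int(key)
```

The returned integer is what would be handed to the relevant lookup; how that lookup is performed is outside this sketch.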
I have added a new term into the Vocabulary
| bdq:defaultGeodeticDatum | Optionally establishes the default datum in a parameterized test (q.v.). A default datum is supplied in cases where a parameter (q.v.) is not set at the time the test is run. | Parameter | See test AMENDMENT_GEODETICDATUM_ASSUMEDDEFAULT (7498ca76-c4d4-42e2-8103-acacccbdffa7). |
I wouldn't use 'Optional'. Maybe...
If dwc:geodeticDatum is EMPTY, set the value to the default geodetic datum. | Parameter | See test AMENDMENT_GEODETICDATUM_ASSUMEDDEFAULT (7498ca76-c4d4-42e2-8103-acacccbdffa7). |
They all say Optionally. Your suggestion is covered in the Expected Response - the Vocabulary isn't where that goes.
Changed epsg: to EPSG: throughout document
Changed definition of EMPTY. See comment by @chicoreus https://github.com/tdwg/bdq/issues/111#issuecomment-1596304706
Added a new term in line with test #43
| bdq:targetCRS | The Coordinate Reference System (CRS) used as the output when converting coordinates from one CRS to another. The default is EPSG:4326. | Parameter | Used in the test AMENDMENT_COORDINATES_CONVERTED (620749b9-7d9c-4890-97d2-be3d1cde6da8) |
Added two new terms and deleted bdq:annotation (replaced by bdq:annotationAlertIf)
| | bdq:annotationAlertIf | Optionally establishes if an annotation exists within a bdq:annotationSystem (q.v.) by describing the criteria for relating annotations in the system to records in a Parameterized Test (q.v.). | Parameter | Used in test "ANNOTATION_ISSUE_NOTEMPTY" (fecaa8a3-bbd8-4c5a-a424-13c37c4bb7b1). |
| | bdq.annotationSystem | Optionally establishes a system for annotations within a Parameterized Test (q.v.) with the default being the w3c Annotations Data Model's "oa:annotation". | Parameter | Used in test "ANNOTATION_ISSUE_NOTEMPTY" (fecaa8a3-bbd8-4c5a-a424-13c37c4bb7b1). |
Typo bdq.annotationSystem to bdq:annotationSystem
I'm unsure about oa:target
oa:target is part of the W3C oa:annotation
@Tasilee good catch, that should be oa:hasTarget https://www.w3.org/TR/annotation-vocab/#hastarget
The relevant terms are oa:Annotation (http://www.w3.org/ns/oa#Annotation) and oa:hasTarget (http://www.w3.org/ns/oa#hasTarget). I think previous iterations had a Target class; the W3C web annotation data model does not. https://www.w3.org/TR/annotation-vocab/
Updated #29 in accord with @chicoreus comments above, and changed oa:annotation to oa:Annotation in definition of bdq:annotationSystem
Should we separate out the square bracket sourceAuthority examples into separate entities? They are currently included only under bdq:sourceAuthority.
Good pickup @ArthurChapman: I'd say "Yes"
I thought there were 5 - there are only three now - and they occur in only two tests
bdq:sourceAuthority[countryshapes] in #73
bdq:sourceAuthority[geospatialland] and bdq:sourceAuthority[taxonomyismarine] in #51
If we do change these - I would suggest
bdq:sourceAuthority[countryshapes] --> bdq:countryShapes
bdq:sourceAuthority[geospatialland] --> bdq:geospatialLand
bdq:sourceAuthority[taxonomyismarine] --> bdq:taxonomyIsMarine
I think your suggestion is in line with the way we have been thinking
Following ZOOM discussion of 2023-07-03/04
Updated comment for AMENDMENT to read
Formally in the Fitness for Use Framework (Veiga et al.), the description of a test that can propose a change is an Enhancement, while the corresponding report level concept is an Amendment. Tests tagged as Amendments are Enhancements at the data quality needs level in the Framework.
Following ZOOM discussion of 2023-07-03/04
@chicoreus to look at the wording of Assertion, MEASURE and VALIDATION
Following ZOOM discussion of 2023-07-03/04
Updated the comment for all the Data Quality Dimension terms (Completeness, Conformance, Consistency, Likeliness, Reliability, and Resolution) to
Definition from the Fitness for Use Framework: Data Quality Dimensions Document (Link needed to RDF document - https://github.com/tdwg/bdq/wiki/TG2-Data-Quality-Dimension)
NB we need to update these once we have a formal link to the RDF Document on Data Quality Dimensions.
Following ZOOM discussion of 2023-07-03/04
Comments added to Data Quality Dimension and Data Quality Report
"Link to OWL Document"
NB This link needs to be added once we have a final permanent link address @chicoreus
Following ZOOM discussion of 2023-07-03/04
Comments with reference to the Fitness for Use Framework deleted for all the Warning Types viz,
Ambiguous (q.v.), Incomplete (q.v.), Inconsistent (q.v.), Invalid (q.v.), Unlikely (q.v.)
Following ZOOM discussion of 2023-07-03/04
Deleted comments with reference to the FFU Framework for the following terms, as these were out of date:
IS_ISSUE, POTENTIAL_ISSUE
Following ZOOM discussion of 2023-07-03/04
For RUN_HAS_RESULT - changed comment from:
Applies to VALIDATIONS, MEASURES, and ISSUES, but not AMENDMENTS. See Fitness for Use Framework. Cf. INTERNAL_PREREQUISITES_NOT_MET (q.v.) and EXTERNAL_PREREQUISITES_NOT_MET (q.v.) |
to
Applies to VALIDATIONS, MEASURES, and ISSUES, but not AMENDMENTS. See Fitness for Use Framework definition in Need link to OWL Document. See also INTERNAL_PREREQUISITES_NOT_MET (q.v.) and EXTERNAL_PREREQUISITES_NOT_MET (q.v.) | See Fitness for Use Framework definition in Need link to OWL Document
@chicoreus.
Following ZOOM discussion of 2023-07-03/04
Added a new term
| Specification | A technical description of the performed test upon which an implementation could be made. | Response | |
modified Expected Response to read
| Expected Response | A term used in place of Specification (q.v.) in the markdown of the tests in the bdq GitHub. | Response | |
Deleted "Test Criteria", as it is a discontinued term that we used in the tests and that was later replaced by Expected Response.
Following ZOOM discussion of 2023-07-03/04
Changed the comment for "Single Record" to
All the current tests are run on a single record and are not designed to be run across multiple records.
Context for "single record" changed to "Resource Type"
Terms in the bdqffdq namespace are from the Fitness for Use Framework (Veiga et al. 2017). Use the reference to the Framework Definitions for more details and examples. The use of a vocabulary term in a test specification without a namespace prefix (sometimes represented in all UPPER CASE) implies that the bdq: or bdqffdq: namespace is applicable. Note that wherever "DQ" is used in a definition it implies "Data Quality" and wherever "FFU Framework" is used it refers to the "Fitness for Use Framework" (Veiga et al. 2017).
Supplement: GitHub Label Terms
These are terms that are outside the Standard but that have been used as either GitHub Labels or TestFields in the BDQ GitHub