TG2 - Test Specifications and definitions

Tasilee commented 6 years ago

Field	Value
GUID	A globally unique identifier that would resolve to related information about the test and assertion, e.g., e39098df-ef46-464c-9aef-bcdeee2a88cb
Label	A standardized name of the test-assertion based on the template OUTPUTTYPE_TERMS_RESPONSE, e.g., "VALIDATION_BASISOFRECORD_NOTSTANDARD". These names were considered helpful for human-human communication and to assist with code implementation, maintenance and searches.
Description	A concise description of the AMENDMENT or MEASURE, e.g., The value of dcterms:license was standardized using the bdq:sourceAuthority
Output Type	All tests are classified into four classes: VALIDATION (tests values in one or more Darwin Core terms and returns either “COMPLIANT” or “NOT_COMPLIANT”, e.g., “VALIDATION_BASISOFRECORD_NOTSTANDARD” would return “COMPLIANT” if dwc:basisOfRecord=“Preserved specimen”); AMENDMENT (flags that a change has been made to at least one Darwin Core term in the record, e.g., “AMENDMENT_COORDINATES_TRANSPOSED” where dwc:decimalLatitude and dwc:decimalLongitude values have been reversed); NOTIFICATION (flags where a term is “NOT EMPTY”, e.g., "NOTIFICATION_DATAGENERALIZATIONS_NOTEMPTY" when dwc:dataGeneralizations contains some value or text); MEASURE (returns the number of tests conforming to a criterion, e.g., “MEASURE_VALIDATIONTESTS_NOTCOMPLIANT” returns the number of tests of type VALIDATION that returned “NOT_COMPLIANT”)
Expected response	A concise description of the expected response: INTERNAL_PREREQUISITES_NOTMET if there is no value or if a specified source such as a vocabulary is not available; NOT_COMPLIANT if the test fails; otherwise COMPLIANT, e.g., VALIDATION_DAY_NOTSTANDARD: Expected response is INTERNAL_PREREQUISITES_NOTMET if there is no value for dwc:day; NOT_COMPLIANT if the value of dwc:day is not an integer between 1 and 31; otherwise COMPLIANT
Darwin Core Class	The Darwin Core Class that the test relates to, e.g., Taxon
Information Elements	The Darwin Core terms that the test relates to, e.g., dwc:taxonRank
Dimension	The focus of the Darwin Core terms used in the test, either NAME, SPACE, TIME or OTHER
Data Dimension	A test will focus on one of the following scenarios based on the Data Quality Framework: "Completeness" (the extent to which data elements are present and sufficient, e.g., "VALIDATION_TAXONID_EMPTY"); "Conformance" (conforms to a format, syntax, type, range, standard or to the own nature of the information element, e.g., "VALIDATION_YEAR_NOTSTANDARD"); "Consistency" (agreement among related information elements in the data, e.g., "VALIDATION_EVENTDATE_INCONSISTENT"); "Likeliness" (low probability that values are real, e.g., "VALIDATION_COORDINATES_ZERO"); "Resolution" (whether sufficient detail is present in the value/s - a measure of the granularity of the data, e.g., "VALIDATION_DATAGENERALISATIONS_NOTEMPTY")
Term-Actions	The part of the Label that specifies the focus Darwin Core term and the action applied to it
Warning Type	Warning assertion resulting from running a test, one of: Ambiguous, Amended, Incomplete, Inconsistent, Invalid, Notification, Report, Unlikely.
Parameter(s)	If there are options for bdq:sourceAuthority or values used in the test, they are specified here
Example	A concise example of the application of the test, e.g., dwc:taxonRank="sp." becomes dwc:taxonRank="Species"
Source	The origin of the concept of the test, e.g., TDWG2018
References	One or more publications that relate directly to the test, e.g., http://rs.gbif.org/vocabulary/gbif/rank.xml
Example Implementations (Mechanisms)	A link to one or more agencies that have an implementation of the test, e.g., #86: "Kurator:event_date_qc"
Link to Specification Source Code	A link to a reference code set that demonstrates the test, e.g., #86: https://github.com/FilteredPush/event_date_qc/blob/5f2e7b30f8a8076977b2a609e0318068db80599a/src/main/java/org/filteredpush/qc/date/DwCEventDQ.java#L169 . A minimum set of unit tests is at: https://github.com/FilteredPush/event_date_qc/blob/5f2e7b30f8a8076977b2a609e0318068db80599a/src/test/java/org/filteredpush/qc/date/DwcEventDQTest.java#L310 ; see also unit tests for the underlying implementation at https://github.com/FilteredPush/event_date_qc/blob/5f2e7b30f8a8076977b2a609e0318068db80599a/src/test/java/org/filteredpush/qc/date/DateUtilsTest.java#L460 and https://github.com/FilteredPush/event_date_qc/blob/5f2e7b30f8a8076977b2a609e0318068db80599a/src/test/java/org/filteredpush/qc/date/DateUtilsTest.java#L616
Notes	Additional comments that TG2 believed necessary for an accurate understanding of the test or issues that implementers needed to be aware of, e.g., The Taxonomic Rank GBIF Vocabulary has an extensive list of Ranks including synonyms in a number of languages.
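
To make the "Expected response" and MEASURE rows concrete, here is a minimal Python sketch of how the VALIDATION_DAY_NOTSTANDARD wording could be read as code. The function names are hypothetical and are not taken from any TG2 or FilteredPush implementation; treating a whitespace-only value as "no value" is also an assumption.

```python
import re
from typing import Iterable, Optional


def validation_day_notstandard(day: Optional[str]) -> str:
    """Hypothetical reading of the expected response for dwc:day."""
    # Assumption: None, empty, or whitespace-only all count as "no value".
    if day is None or day.strip() == "":
        return "INTERNAL_PREREQUISITES_NOTMET"
    text = day.strip()
    # Only plain ASCII digits are accepted as an integer here.
    if not re.fullmatch(r"[0-9]+", text):
        return "NOT_COMPLIANT"
    return "COMPLIANT" if 1 <= int(text) <= 31 else "NOT_COMPLIANT"


def measure_validationtests_notcompliant(responses: Iterable[str]) -> int:
    """Count the VALIDATION responses that came back NOT_COMPLIANT."""
    return sum(1 for response in responses if response == "NOT_COMPLIANT")


results = [validation_day_notstandard(v) for v in ["15", "32", "", "x"]]
not_compliant = measure_validationtests_notcompliant(results)
```

In this sketch, values that fail the prerequisites are not counted by the MEASURE, since only NOT_COMPLIANT responses are tallied.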

ArthurChapman commented 6 years ago

Lee - I have altered ACTION to RESPONSE as earlier discussed

ArthurChapman commented 6 years ago

Note @Tasilee - we have just "Description" in Amendments and Measures - not a "Pass Description" and a "Fail Description"

Tasilee commented 6 years ago

I realised that we didn't have specifics of the subset of terms used within the TG2 test issues. This is a placeholder. In doing this draft, inconsistencies in the labels and with the TG1 Framework became apparent.

For example, some but not all Darwin Core terms are hyphenated in the Term component of the labels; e.g., #55, #56, #68 and #107 have been edited for consistency. There are, however, also inconsistencies in those tests that have the (now called) "Response" value "FROM_XXXYYY", where "XXXYYY" can be one or more Darwin Core terms. I suggest that, for consistency, we use "FROM-XXX-YYY"?
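
One way to see why the hyphen convention matters is a small parser for the OUTPUTTYPE_TERMS_RESPONSE template. This is only a sketch under the assumption that underscores separate the three components and hyphens separate items within a component; `parse_label` and the "FROM-..." label used below are hypothetical illustrations, not actual TG2 labels.

```python
from typing import List, NamedTuple

OUTPUT_TYPES = {"VALIDATION", "AMENDMENT", "NOTIFICATION", "MEASURE"}


class Label(NamedTuple):
    output_type: str
    terms: List[str]
    response: List[str]


def parse_label(label: str) -> Label:
    """Split a label into its three components (hypothetical helper)."""
    parts = label.split("_")
    # A response such as FROM_XXXYYY yields four parts and is rejected,
    # which is the ambiguity the FROM-XXX-YYY form would avoid.
    if len(parts) != 3:
        raise ValueError(f"expected OUTPUTTYPE_TERMS_RESPONSE, got {label!r}")
    output_type, terms, response = parts
    if output_type not in OUTPUT_TYPES:
        raise ValueError(f"unknown output type {output_type!r}")
    return Label(output_type, terms.split("-"), response.split("-"))


simple = parse_label("VALIDATION_BASISOFRECORD_NOTSTANDARD")
# Hypothetical label written in the proposed hyphenated style.
hyphenated = parse_label("AMENDMENT_EVENTDATE_FROM-YEAR-MONTH")
```

With underscores reserved for the three components, a label always splits into exactly three parts, and the individual terms can be recovered without a lookup table.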

Regards TG1-TG2, there is no "Likeliness" in the Framework DQ Dimension and some of the definitions are more global than the TG2 context. Please edit the table above if you think there are better definitions and/or examples.

Tasilee commented 6 years ago

@ArthurChapman regards "Pass description"/"Fail description", yep, I know. This was just a draft that I needed to Save before it got too complex. I also need to add an example "Example implementation" and "Link to specification code": They were hard to find - so we may need a few more TAGs that designate "Done X"

chicoreus commented 6 years ago

@Tasilee great to have definitions of these.

Also see the (trivial, special purpose) code at https://github.com/kurator-org/bdq_issue_to_csv/blob/master/src/main/java/org/kurator/issueconverter/BDQConvert.java for manipulating these tables in the github issues into CSV suitable for alignment with the RDF representation of the framework (https://github.com/kurator-org/kurator-ffdq/blob/master/competencyquestions/rdf/ffdq.owl).
Note that any change to the fields listed here in the issues will require corresponding edits to the code, and that the alignment to the framework should also inform these definitions.
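
As a rough illustration of why field changes ripple into the converter, here is a minimal Python sketch that turns one Field/Value table into a CSV header and row. It is not the BDQConvert code (which is Java) and does not claim to mirror its behaviour; it assumes the table arrives as tab-separated "Field<TAB>Value" lines, and `table_to_csv` is a hypothetical name.

```python
import csv
import io


def table_to_csv(table_text: str) -> str:
    """Convert tab-separated Field/Value lines into a one-row CSV."""
    fields = {}
    for line in table_text.splitlines():
        if "\t" not in line:
            continue  # skip blank lines and anything that is not a table row
        name, value = line.split("\t", 1)
        if name.strip() == "Field":
            continue  # skip the table's own header row
        fields[name.strip()] = value.strip()
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields), lineterminator="\n")
    writer.writeheader()
    writer.writerow(fields)
    return buffer.getvalue()


sample = "Field\tValue\nLabel\tVALIDATION_DAY_NOTSTANDARD\nDimension\tTIME\n"
csv_text = table_to_csv(sample)
```

Because the CSV header is derived from the field names, renaming a field in the issues changes the column name downstream, which is the coupling noted above.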

Note that InformationElement is singular in the framework, even though an InformationElement may be comprised of a list of DarwinCore terms. DarwinCore class is in effect a category for the InformationElement.

It is probably worth separating the examples into a separate column, and adding a column to map these values to framework concepts (e.g. Information Elements = InformationElement, Data Quality Dimension = Dimension).
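
A minimal sketch of both suggestions, assuming definitions and examples are joined by ", e.g., " as in the table above. The mapping contains only the two pairs named in this comment; the helper names are hypothetical.

```python
from typing import Dict, Tuple

# Only the two mappings mentioned in the comment; others would need agreeing.
FIELD_TO_FRAMEWORK: Dict[str, str] = {
    "Information Elements": "InformationElement",
    "Data Quality Dimension": "Dimension",
}


def split_example(value: str) -> Tuple[str, str]:
    """Separate a definition from its trailing example, if it has one."""
    definition, _, example = value.partition(", e.g., ")
    return definition, example


def to_framework_name(field: str) -> str:
    """Map a table field name to a framework concept where one is known."""
    return FIELD_TO_FRAMEWORK.get(field, field)


definition, example = split_example(
    "The Darwin Core Class that the test relates to, e.g., Taxon"
)
```

Fields without an agreed framework concept pass through unchanged, so the mapping can be filled in incrementally.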

chicoreus commented 6 years ago

What we probably need, rather than (or in addition to) this, is the definitions of the column headers in the spreadsheet produced by the BDQConvert code (as that is bringing the tests into closer alignment with the framework).

ArthurChapman commented 6 years ago

@chicoreus - again we seem to have a problem with Framework Definitions and having multiple documents with conflicting information. The Glossary at https://tdwg.github.io/bdq/tg1/site/glossary.html, which I and others thought was the current up-to-date version, has "DQ Dimension", not just "Dimension", and "Information Element", not "InformationElement".

tucotuco commented 6 years ago

I am really glad to see this coming together. Reviewing it keeps us on our toes. It made me ponder a number of issues when I think about taking the information provided and turning it into code. I made an issue https://github.com/tdwg/bdq/issues/175 to discuss the merits of defining expected responses for the tests.

Tasilee commented 2 years ago

In discussion with @ArthurChapman this morning, I decided to update the specifications above to match the current implementation. The terms and definitions should also conform to the Vocabulary #152.

TG2 - Test Specifications and definitions #174