tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-Responses from tests #142

Closed: Tasilee closed this issue 2 weeks ago

Tasilee commented 6 years ago

We need to fully elucidate the options available for the allowed values and statuses returned by running each of the 'tests':

VALIDATION (COMPLIANT, NOT_COMPLIANT, INTERNAL_PREREQUISITES_NOT_MET, EXTERNAL_PREREQUISITES_NOT_MET, ...)

AMENDMENT (PREREQUISITES_NOT_MET, RUN, AMBIGUOUS, ...)

MEASURE (VALUE, ...)

NOTIFICATION (TEXT, ...)

chicoreus commented 6 years ago

Here's the current position based on discussions with Allan and the implementation in Kurator (see the FFDQ-API project: https://github.com/kurator-org/ffdq-api/tree/master/src/main/java/org/datakurator/ffdq/api).

All Tests return a Response consisting of Result, Status and Comment.

For all tests, Comment is a human readable explanation of the test response.

For all tests, possible values of Status are: NOT_RUN, AMBIGUOUS, INTERNAL_PREREQUISITES_NOT_MET, EXTERNAL_PREREQUISITES_NOT_MET, RUN_HAS_RESULT.

For AMENDMENTS, additional possible values of Status are: TRANSPOSED, CHANGED, FILLED_IN, NO_CHANGE.

VALIDATION Result values are only (COMPLIANT, NOT_COMPLIANT).

PROBLEM Result values are only (NOT_PROBLEM, PROBLEM).

MEASURE Result values are only ({numeric value}). (TG2 is proposing that no TG2 measures return COMPLETE or INCOMPLETE, which are allowed values under the framework.)

AMENDMENT Result is a data structure holding the terms for which changes are proposed and the proposed values for those changes, probably JSON key:value pairs where the key is the Darwin Core term and the value is the new replacement value proposed by the amendment.

NOTIFICATION is currently not considered in the framework and needs analysis by Allan.

Examples:

Validation Response { result:COMPLIANT, status:RUN_HAS_RESULT, comment:"provided value x is compliant with test foo" }

Validation Response { result:null, status:INTERNAL_PREREQUISITES_NOT_MET, comment:"No value provided for required element foo, unable to run test." }

Measurement Response { result:0.30, status:RUN_HAS_RESULT, comment:"30% of records comply with foo" }

Amendment Response { result:{"dwc:year":"1830"}, status:FILLED_IN, comment:"empty year filled in as 1830 from provided value '1830' in eventDate." }

Amendment Response { result:{"dwc:day":"2"}, status:CHANGED, comment:"provided value of day '2nd' standardized to '2'." }

Amendment Response { result:{"dwc:decimalLatitude":"20","dwc:decimalLongitude":"190"}, status:TRANSPOSED, comment:"provided value of decimalLatitude 190 out of range; transposition of decimalLatitude and decimalLongitude from 190,20 to 20,190 results in a valid location which falls inside buffered boundaries for the provided country and stateProvince." }

chicoreus commented 6 years ago

INTERNAL_PREREQUISITES_NOT_MET could be called DATA_PREREQUISITES_NOT_MET; it indicates that required elements in the data under test are not available to run the test (e.g. no value is present in dwc:decimalLatitude for a test which compares the coordinate with the textual higher geography terms). It is expected that repeated runs on unchanged data which result in a response.status of INTERNAL_PREREQUISITES_NOT_MET will always return that status.

The response status EXTERNAL_PREREQUISITES_NOT_MET signals that some system external to the test was unavailable at the time the test was run. For example, if a test evaluates some input data against an external service and that external service is down, the test cannot be run. It is expected that repeated runs on unchanged data which result in a response.status of EXTERNAL_PREREQUISITES_NOT_MET may return a different response in the future (e.g. when the external service is back up).
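
One practical consequence of this distinction, as a hypothetical sketch building on the Response sketch above (worthRetrying is a made-up helper, not part of any published API): software can inspect the status to decide whether re-running a test on unchanged data could ever produce a different outcome.

```java
// Hypothetical helper: should a test be re-run later on unchanged data?
class RetryPolicy {
    static boolean worthRetrying(Response.Status status) {
        switch (status) {
            case EXTERNAL_PREREQUISITES_NOT_MET: // external service may come back up
            case NOT_RUN:                        // never attempted
                return true;
            case INTERNAL_PREREQUISITES_NOT_MET: // the data themselves are lacking; stable
            default:
                return false;
        }
    }
}
```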

ArthurChapman commented 6 years ago

I think one of the things we are suggesting with Amendments is that you may wish to say an amendment was not made because the result would have been ambiguous. We may need to say we could have filled it in, but there was more than one option and thus an ambiguity, and therefore nothing was changed; e.g. filling in a date from the verbatim value where it said 02/03/1954.

Tasilee commented 6 years ago

Reviewing all the current 'tests', the following is a minimal set of possible responses:

| Test Type | Possible responses |
| --- | --- |
| VALIDATION | COMPLIANT, NOT_COMPLIANT, PREREQUISITES_NOT_MET |
| AMENDMENT | RUN/APPLIED, AMBIGUOUS, PREREQUISITES_NOT_MET |
| MEASURE | value |
| NOTIFICATION | COMPLIANT, value |

chicoreus commented 6 years ago

@Tasilee That's too minimal a list. (1) Allan has made it clear that values must be separate from status, so a response needs both a value and a status (and in FilteredPush and Kurator we've seen a human readable explanation as well). (2) PREREQUISITES_NOT_MET is not sufficient to distinguish between cases where some external service was not available and running the test again later will return a different result, and cases where the data needed to run the test are simply not present. This is a crucial distinction both for human interpretation of test results and for software to understand whether to repeat portions of an analysis later. (3) For markup of amendments, it is very valuable in rendering results for human consumption to be able to distinguish between an empty value having been FILLED_IN, an existing value having been CHANGED (TRANSPOSED may be a non-minimal variant of CHANGED), and the amendment having run successfully but not having proposed a change (which in most renderings will probably mean simply not showing that amendment). I think the list I give above is pretty close to the actual usable minimum. We've got a paper discussing this in much more detail in preparation for BISS from Kurator.

chicoreus commented 6 years ago

AMBIGUOUS may also be an orthogonal attribute (of any of the other tests), and could be an extension point for bringing in lists of several possible responses with their ordering or probabilities, but it could probably be modeled as a response status.
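
A hypothetical sketch of that extension point: an ambiguous result modeled as an ordered list of candidate values, each with an optional probability (all names here are invented for illustration).

```java
import java.util.List;

// Hypothetical sketch: an AMBIGUOUS result carrying several candidate
// values with their ordering and (optionally) probabilities.
class AmbiguousResult<T> {
    static class Candidate<V> {
        final V value;
        final double probability; // Double.NaN when only an ordering is known
        Candidate(V value, double probability) {
            this.value = value;
            this.probability = probability;
        }
    }

    private final List<Candidate<T>> candidates; // ordered most to least likely

    AmbiguousResult(List<Candidate<T>> candidates) {
        this.candidates = candidates;
    }

    List<Candidate<T>> getCandidates() { return candidates; }
}
```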

chicoreus commented 6 years ago

Tabulating and shortening a bit, I think this is the minimal list. (We also need PROBLEM/ISSUE, as most portals wanted to represent validations as Problems rather than Validations; back to that question of the positive (in the framework) versus negative (needs to be added to the framework) sense.)

| Test Type | Response.Result | Response.Status |
| --- | --- | --- |
| VALIDATION | COMPLIANT, NOT_COMPLIANT | INTERNAL_PREREQUISITES_NOT_MET, EXTERNAL_PREREQUISITES_NOT_MET, RUN_HAS_RESULT, NOT_RUN |
| ISSUE | HAS_PROBLEM, NOT_PROBLEM | INTERNAL_PREREQUISITES_NOT_MET, EXTERNAL_PREREQUISITES_NOT_MET, RUN_HAS_RESULT, NOT_RUN |
| AMENDMENT | key:value list of proposed changes | FILLED_IN, CHANGED, AMBIGUOUS, INTERNAL_PREREQUISITES_NOT_MET, EXTERNAL_PREREQUISITES_NOT_MET, NOT_RUN |
| MEASURE | value | INTERNAL_PREREQUISITES_NOT_MET, EXTERNAL_PREREQUISITES_NOT_MET, RUN_HAS_RESULT, NOT_RUN |
| NOTIFICATION | value | RUN_HAS_RESULT |
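
Translated into code, the status columns above could become per-test-type enums along these lines (a hypothetical sketch; the names simply follow the table):

```java
// Sketch of per-test-type status vocabularies, following the table above.
enum ValidationStatus {
    INTERNAL_PREREQUISITES_NOT_MET, EXTERNAL_PREREQUISITES_NOT_MET,
    RUN_HAS_RESULT, NOT_RUN
}

enum AmendmentStatus {
    FILLED_IN, CHANGED, AMBIGUOUS,
    INTERNAL_PREREQUISITES_NOT_MET, EXTERNAL_PREREQUISITES_NOT_MET, NOT_RUN
}
```
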
ArthurChapman commented 6 years ago

We keep coming back to the Negative versus Positive discussion. We really need to sort this out - hopefully we can find some time when everyone can be there in Dunedin.

As I see it: The CORE tests (run in the negative) are largely USER-INDEPENDENT tests and will be run by aggregators, data custodians, museums, and herbaria. That includes our 65 or so VALIDATIONS.

Ideally, USERS should not have to worry about these IF it is done correctly and all aggregators come to the party. Users can then run the SUPPLEMENTAL (and other) tests looking for data that fits their criteria - many of those tests will require other parameters as stated in the Framework (i.e. for my purpose, I only want records where dwc:country="Australia" or where dwc:decimalLongitude is "between -5 and 44", etc.). The CORE set of tests in the negative should never have additional parameters.

So following my own thinking - USER tests would include:

SUPPLEMENTAL tests not included in our CORE set
CORE tests run as POSITIVE with additional parameters.

So I guess a workflow would include:

1. run CORE VALIDATION tests as NEGATIVE
    Annotate
2. run CORE AMENDMENT tests
    Annotate
3. run CORE VALIDATION tests as NEGATIVE
    Annotate
4. run required subset of CORE VALIDATION tests as POSITIVE with additional parameters
5. run required subset of SUPPLEMENTAL tests as POSITIVE (SUPPLEMENTAL tests should not need the negative)

1-3 would be run by Aggregators and others looking for problems/errors with the data; 4-5 would be run by USERS who want to subset the data for their use (a sketch of this workflow follows below).

I may be wrong - but I don't see amendments being made in steps 4 and 5 - at least not to the original data, because they are being run with a particular use in mind.
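
A rough sketch of that five-step workflow in Java (all runner and annotation methods here are hypothetical stand-ins, not a real API; records are simplified to term-to-value maps):

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the five-step workflow above.
class CoreWorkflowSketch {
    void run(List<Map<String, String>> records, Map<String, String> userParams) {
        annotate(runCoreValidations(records, Map.of())); // 1. CORE VALIDATIONs, negative sense
        annotate(runCoreAmendments(records));            // 2. CORE AMENDMENTs
        annotate(runCoreValidations(records, Map.of())); // 3. re-run CORE VALIDATIONs
        runCoreValidations(records, userParams);         // 4. subset, positive sense, with parameters
        runSupplementalTests(records, userParams);       // 5. SUPPLEMENTAL subset, positive sense
    }

    List<String> runCoreValidations(List<Map<String, String>> r, Map<String, String> p) { return List.of(); }
    List<String> runCoreAmendments(List<Map<String, String>> r) { return List.of(); }
    List<String> runSupplementalTests(List<Map<String, String>> r, Map<String, String> p) { return List.of(); }
    void annotate(List<String> results) { /* write annotations against the records */ }
}
```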

chicoreus commented 6 years ago

@ArthurChapman we did specify that a few of the core tests (in particular, validation of scientific names against an authority) take parameters (e.g. the national/aggregator's authority list to test against).

As for positive/negative, I think we've got it sorted out; we just need to formalize it in the framework. All the tests we have been classifying as Validations are descriptions of a pair of tests: one Validation, which returns, per the framework, the positive statement that data has quality for some use (COMPLIANT/NOT_COMPLIANT), and one Problem, which returns, in some way yet to be fully specified in the framework, the negative statement that some problem exists in the data (with a response value in the form PROBLEM/NOT_PROBLEM). We think that all of the validations symmetrically describe both a Validation and a Problem, and that the positive and negative descriptions of the tests can just be transposed between the two. We think that an implementer could implement and return mixed positive/negative results consisting of only Measures, Problems, and Amendments, or could stay wholly within the positive sense of the original framework and return Measures, Validations, and Amendments. This position lets us think about just validations s.l., but we still need to provide the symmetrical formalisms to let implementers actually assert Problems. I don't think any one run of a workflow would assert some mixture of Validations and Problems; it would likely make all of its validation s.l. assertions framed as one or the other.
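
The Validation/Problem symmetry could be as mechanical as transposing result values, as in this hypothetical sketch:

```java
// Hypothetical sketch: a Problem assertion as the transposed sense of a
// Validation assertion.
class ValidationProblemSymmetry {
    enum ValidationResult { COMPLIANT, NOT_COMPLIANT }
    enum ProblemResult { NOT_PROBLEM, PROBLEM }

    static ProblemResult asProblem(ValidationResult v) {
        return v == ValidationResult.COMPLIANT
                ? ProblemResult.NOT_PROBLEM // data has quality: no problem exists
                : ProblemResult.PROBLEM;    // data lacks quality: a problem exists
    }
}
```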

Steps 4 and 5 running only after step 2 in your list above would not let a user see how much accepting the proposed amendments would improve the fitness of their data for their purpose (amendments might well affect the results of supplemental tests). The only way to see what difference the amendments make is to run all the measures and validations in a pre-amendment phase, repeat them in a post-amendment phase, and compare the results.
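
A hypothetical sketch of that pre-/post-amendment comparison (runMeasures is a stand-in for running all MEASUREs over the records):

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: run all measures before and after accepting the
// proposed amendments, then compare to see what difference they made.
class AmendmentComparisonSketch {
    Map<String, Double> runMeasures(List<Map<String, String>> records) {
        return Map.of(); // stand-in for running every MEASURE test
    }

    void compare(List<Map<String, String>> original, List<Map<String, String>> amended) {
        Map<String, Double> before = runMeasures(original);
        Map<String, Double> after = runMeasures(amended);
        before.forEach((measure, value) -> System.out.printf(
                "%s: %.2f -> %.2f%n", measure, value, after.getOrDefault(measure, value)));
    }
}
```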

I do like your idea of explicitly framing a workflow where users can optionally add in the supplemental tests for their data quality use cases, but aggregators are focused on the core. In terms of the framework, that's a Data Quality Profile: DQP(u) = {dqp | dqp = mp(u) ⋃ vp(u) ⋃ ip(u), mp ∈ MP, vp ∈ VP, ip ∈ IP ⋀ u ∈ U}

The framework does describe both Quality Control (looking for causes that make data not fit for purpose) and Quality Assurance (filtering to only records which have the desired quality). Interestingly, in looking this up I noticed that Quality Assurance (filtering/subsetting the data) is formally described as including only the validation tests (DQV), while Quality Control includes only validations and amendments (DQI); neither is formally described to include the measures (DQM):

Quality Control: QC(dr) = {dqv(dr) ⋃ dqi(dr) | dqv ∈ DQV, dqi ∈ DQI ⋀ dr ∈ DR}
Quality Assurance: QA(dr) = {dqv(dr) | dqv ∈ DQV ⋀ dr ∈ DR}

(This, to my mind, provides evidence that we are on the right track in proposing to limit the response values for measures to just a value, and not including the binary COMPLETE/INCOMPLETE.) In contrast, a Data Quality Profile is just what you've described, a combination of validations, measures, and amendments around some use case:

Data Quality Profile: DQP(u) = {dqp | dqp = mp(u) ⋃ vp(u) ⋃ ip(u), mp ∈ MP, vp ∈ VP, ip ∈ IP ⋀ u ∈ U}
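
Read as code, Quality Assurance is just a filter, as in this hypothetical sketch: keep only the records for which every validation in the profile is COMPLIANT.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch of QA(dr) as filtering: retain only records that
// pass every validation in the profile.
class QualityAssuranceSketch {
    static List<Map<String, String>> assure(
            List<Map<String, String>> records,
            List<Predicate<Map<String, String>>> validations) {
        return records.stream()
                .filter(record -> validations.stream().allMatch(v -> v.test(record)))
                .collect(Collectors.toList());
    }
}
```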

ArthurChapman commented 6 years ago

It'd be good to see how this may translate to an ANNOTATION using the W3C Annotations. Is that possible @chicoreus? I think that may help the discussion and make it clearer for those who will need to implement it.

lowery commented 6 years ago

Just to add to Paul's list, we've also talked about TRANSPOSED and NO_CHANGE for Amendments and UNABLE_CURATE for all assertion types.

| Test Type | Response.Result | Response.Status |
| --- | --- | --- |
| VALIDATION | *COMPLIANT, NOT_COMPLIANT* | INTERNAL_PREREQUISITES_NOT_MET, EXTERNAL_PREREQUISITES_NOT_MET, *UNABLE_CURATE*, RUN_HAS_RESULT, NOT_RUN |
| ISSUE | HAS_PROBLEM, NO_PROBLEM | INTERNAL_PREREQUISITES_NOT_MET, EXTERNAL_PREREQUISITES_NOT_MET, *UNABLE_CURATE*, RUN_HAS_RESULT, NOT_RUN |
| AMENDMENT | key:value list of proposed changes | CHANGED, FILLED_IN, TRANSPOSED, NO_CHANGE, AMBIGUOUS, INTERNAL_PREREQUISITES_NOT_MET, EXTERNAL_PREREQUISITES_NOT_MET, *UNABLE_CURATE*, NOT_RUN |
| MEASURE | value, *COMPLETE, NOT_COMPLETE* | INTERNAL_PREREQUISITES_NOT_MET, EXTERNAL_PREREQUISITES_NOT_MET, *UNABLE_CURATE*, RUN_HAS_RESULT, NOT_RUN |
| NOTIFICATION | value | RUN_HAS_RESULT |

chicoreus commented 6 years ago

I've put into italics in @lowery's table above the elements that are not expected to be implemented in the TG2 test suite. In particular, the expectation is that Issues will be implemented instead of Validations, and that measures in the TG2 tests will only return values.

ArthurChapman commented 6 years ago

Not sure what you mean by UNABLE_CURATE. And if I am not mistaken @chicoreus, this would not apply to any of the TG2 CORE tests?

allankv commented 6 years ago

The explanations from @chicoreus and the @lowery implementation in Kurator (https://github.com/kurator-org/ffdq-api/tree/master/src/main/java/org/datakurator/ffdq/api) look pretty neat to me and are compliant with the framework proposal.

Some comments:

  1. In relation to @lowery's table, I would include (or swap "NO_CHANGE" for) "RECOMMENDED" as a possible status of Amendment. Depending on the implementation and context, I may only want to inform the data owner or data user that there is an Amendment available, but only the owner can in fact apply it. Actually, I'm implementing a data quality tool for a project which is in this context (a sketch of such a response follows this list).
  2. The "Notification" field are welcome to the model. It can be useful to inform data users or owners about some tips, warnings or other messages regarding data quality. Is that purpose of it?
  3. In "Measure", I think the result should be just "value" (enabling insert a numerical or textual value), as mentioned by @chicoreus. But as "best practices", we could recommend the use of a vocabulary, always when it is applicable, based on usual data quality dimensions in the literature, such as [COMPLETE / PARTIAL_COMPLETE / NOT_COMPLETE], [CONSISTENT / PARTIAL_CONSISTENT / NOT_CONSISTENT], [CONFORM / PARTIAL_CONFORM / NOT_CONFORM], [ACCURATE / PARTIAL_ACCURATE / NOT_ACCURATE], [TIMELY / PARTIAL_TIMELY / NOT_TIMELY], [ACCESSIBLE / PARTIAL_ACCESSIBLE / NOT_ACCESSIBLE], etc (http://mitiq.mit.edu/Documents/Publications/TDQMpub/14_Beyond_Accuracy.pdf).
Tasilee commented 6 years ago

Apologies for the delay in commenting. I've been depressed by the complexity of the responses and statuses. I keep wondering how we make this easy to understand from a user's perspective.

Thanks @lowery for the table: it makes things easier to comprehend. That said,

  1. We have no class "ISSUE".
  2. "MEASURE" returns only "value" (as @chicoreus and @allankv note).
  3. "UNABLE_CURATE" doesn't make sense to me either (as @ArthurChapman and @chicoreus also note).
  4. The "AMENDMENT" response is in the main a "change", with a potential "proposed change" on a failure status.
  5. As @allankv notes, relatedly, we do need a clear and concise suite of definitions for each of the agreed response and status terms. Is there a relevant vocabulary or ontology we can tap?
  6. @allankv: "NOTIFICATION" is indeed intended as a warning or flag that should be noted by data producers and users.

Do we have consensus? Remaining issues? Dunedin strategy and priorities?

Tasilee commented 6 years ago

From @chicoreus, a minimalist foundation:

ArthurChapman commented 2 weeks ago

We have settled on a standardised way of doing this and it has been documented in Chapman AD, Belbin L, Zermoglio PF, Wieczorek J, Morris PJ, Nicholls M, Rees ER, Veiga AK, Thompson A, Saraiva AM, James SA, Gendreau C, Benson A, Schigel D (2020) Developing Standards for Improved Data Quality and for Selecting Fit for Use Biodiversity Data. Biodiversity Information Science and Standards 4: e50889. https://doi.org/10.3897/biss.4.50889. It will be further documented in the documents that will form part of the BDQ Core.

I am now closing this issue.