
Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

Generate MultiRecord Measures for QualityAssurance that return COMPLETE/NOT_COMPLETE for compliant #295

Open chicoreus opened 1 month ago

chicoreus commented 1 month ago

For a set of Validations, listed below, generate a Measure that operates on a MultiRecord and returns a Response.result of COMPLETE if all records in the MultiRecord have a Response.result of COMPLIANT for a particular Validation.

The template for these Measures is below. There is one Measure for each Validation, and the Validation is referred to as {Validation}. Generate them from the bdqffdq-compliant template instead, to match the TG2_tests.csv columns. We don't need to open an issue for each Measure, but we can track rationale management for this set of tests here.

These Measures expect the Validations to complete with Response.status=RunHasResult and Response.result=COMPLIANT for the data to have quality. See #297 for QA Measures where some of the prerequisite information elements may be empty and the data still have quality.
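
As a minimal sketch of the Expected Response, the Measure can be read as a single pass over every per-record Validation Response in the MultiRecord. The `Response` class and the name `multirecord_measure_qa` are hypothetical, and `RUN_HAS_RESULT` is an assumed spelling of RunHasResult; this is an illustration, not the TG2 reference implementation. As written, an empty MultiRecord returns COMPLETE, a case the template does not address.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Response:
    """Hypothetical minimal model of a single-record Validation Response."""
    status: str            # assumed spelling, e.g. "RUN_HAS_RESULT"
    result: Optional[str]  # e.g. "COMPLIANT", "NOT_COMPLIANT", or None when there is no result


def multirecord_measure_qa(responses: Iterable[Response]) -> str:
    """COMPLETE only if every Response ran and has a result of COMPLIANT."""
    for r in responses:
        # Any other status (e.g. INTERNAL_PREREQUISITES_NOT_MET) or any
        # result other than COMPLIANT makes the whole MultiRecord NOT_COMPLETE.
        if r.status != "RUN_HAS_RESULT" or r.result != "COMPLIANT":
            return "NOT_COMPLETE"
    return "COMPLETE"
```

Returning on the first non-COMPLIANT Response is enough, because a single failing record already decides NOT_COMPLETE.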

GUID: Generate for each.
Label: MULTIRECORD_MEASURE_QA_{Validation.Term-Actions}
Description: Measure if all {Validation} in a record set are COMPLIANT
TestType: Measure
Darwin Core Class: {Validation.Darwin Core Class}
Information Elements ActedUpon: {Validation}.Response
Information Elements Consulted:
Expected Response: COMPLETE if every {Validation} in the MultiRecord has Response.result=COMPLIANT, otherwise NOT_COMPLETE.
Data Quality Dimension: {Validation.Dimension}
Term-Actions: {Validation.Term-Actions}
Parameter(s):
Source Authority:
Specification Last Updated: Generate
Examples:
Source: TG2
References:
  • Veiga AK, Saraiva AM, Chapman AD, Morris PJ, Gendreau C, Schigel D, & Robertson TJ (2017) A conceptual framework for quality assessment and management of biodiversity data. PLOS ONE 12 (6): e0178731. https://doi.org/10.1371/journal.pone.0178731
Example Implementations (Mechanisms):
Link to Specification Source Code:
Notes: For Quality Assurance, filter the record set until this Measure is COMPLETE.
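
The Notes row describes a Quality Assurance workflow: remove records until the Measure is COMPLETE. A minimal sketch, assuming a Validation can be modelled as a hypothetical Python callable that returns only the Response.result string (status handling is left out for brevity). `countrycode_notempty` is an illustrative stand-in, not the TG2 specification for that Validation.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
# Hypothetical model: a Validation maps one record to its Response.result.
Validation = Callable[[Record], str]


def measure(records: List[Record], validate: Validation) -> str:
    """COMPLETE if every record is COMPLIANT for the Validation, else NOT_COMPLETE."""
    if all(validate(r) == "COMPLIANT" for r in records):
        return "COMPLETE"
    return "NOT_COMPLETE"


def filter_for_quality(records: List[Record], validate: Validation) -> Tuple[List[Record], str]:
    """Keep only COMPLIANT records, then re-run the Measure on what remains."""
    kept = [r for r in records if validate(r) == "COMPLIANT"]
    return kept, measure(kept, validate)


def countrycode_notempty(record: Record) -> str:
    """Illustrative stand-in for a single-record NOTEMPTY Validation."""
    return "COMPLIANT" if record.get("countryCode", "").strip() else "NOT_COMPLIANT"


records = [{"countryCode": "AU"}, {"countryCode": ""}, {}]
print(measure(records, countrycode_notempty))  # NOT_COMPLETE
kept, result = filter_for_quality(records, countrycode_notempty)
print(kept, result)  # [{'countryCode': 'AU'}] COMPLETE
```

Because each single-record Validation does not depend on the other records, one filtering pass is enough to make this particular Measure COMPLETE.
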
chicoreus commented 1 month ago

Added the label Template so that code which translates markdown tables in issues to CSV can exclude this issue and #296 by filtering on labels: CORE and Test and not Template.
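
The label filter can be sketched as a simple predicate over an issue's label names. This assumes the labels have already been fetched as plain strings with exactly the spellings given in the comment; it is not the code used by kurator-org/bdq_issue_to_csv. The label sets below are illustrative: #295's labels are assumed, and "example test issue" is hypothetical.

```python
from typing import Dict, Iterable, List


def is_core_test(labels: Iterable[str]) -> bool:
    """True when an issue is labelled CORE and Test but not Template."""
    names = set(labels)
    return {"CORE", "Test"} <= names and "Template" not in names


# Illustrative label sets (assumed, not fetched from GitHub).
issues: Dict[str, List[str]] = {
    "#295": ["CORE", "Test", "Template"],
    "example test issue": ["CORE", "Test"],
}
print([name for name, labels in issues.items() if is_core_test(labels)])
# ['example test issue']
```
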

chicoreus commented 1 month ago

CSV list of Core MultiRecord Measure tests, generated with kurator-org/bdq_issue_to_csv, including assignment of stable GUIDs. The list of test labels and GUIDs is in https://github.com/kurator-org/bdq_issue_to_csv/blob/master/src/main/resources/multirecord_measure_guids.csv; we can extend this file to mark tests that should accept INTERNAL_PREREQUISITES_NOT_MET as COMPLIANT.
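
Stable GUID assignment can be sketched as "reuse the recorded GUID for a label, mint one only for new labels". The function names are hypothetical, and the in-memory `known` mapping stands in for multirecord_measure_guids.csv, whose column layout is not described here. The actual bdq_issue_to_csv logic may differ.

```python
import uuid
from typing import Dict


def measure_label(term_actions: str) -> str:
    """Apply the label template MULTIRECORD_MEASURE_QA_{Validation.Term-Actions}."""
    return f"MULTIRECORD_MEASURE_QA_{term_actions}"


def assign_guid(label: str, known: Dict[str, str]) -> str:
    """Return the GUID already recorded for a label so regeneration is stable;
    mint and record a new random (version 4) UUID only for unseen labels."""
    if label not in known:
        known[label] = str(uuid.uuid4())
    return known[label]


# Stand-in for the contents of the GUID file (loading it is left out).
known = {"MULTIRECORD_MEASURE_QA_COUNTRYCODE_NOTEMPTY": "942f63bd-d19d-4214-bf8e-cec0055b8909"}
print(assign_guid(measure_label("COUNTRYCODE_NOTEMPTY"), known))
# 942f63bd-d19d-4214-bf8e-cec0055b8909
```

Persisting `known` back to the file after each run is what keeps newly minted GUIDs stable across regenerations.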

CSV file of tests:

https://github.com/tdwg/bdq/blob/master/tg2/core/TG2_multirecord_measure_tests.csv

Human readable markdown:

https://github.com/tdwg/bdq/blob/master/tg2/core/generation/docs/core_multirecord_measure_tests.md

ArthurChapman commented 1 month ago

As suggested by @Tasilee, perhaps we need a new issue for the exceptions where INTERNAL_PREREQUISITES_NOT_MET should be accepted; otherwise the Expected Response is not correct for those records.
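
One way to express those exceptions is a per-test flag, matching the idea of marking such tests in the GUID file. A minimal sketch with a hypothetical `accept_prereq_not_met` parameter and assumed status spellings; which tests get the flag is the subject of #297 and is not decided here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Response:
    """Hypothetical minimal model of a single-record Validation Response."""
    status: str
    result: Optional[str]


def multirecord_measure_qa(responses: Iterable[Response], accept_prereq_not_met: bool = False) -> str:
    """COMPLETE if every Response is COMPLIANT or, when the hypothetical flag
    is set, has status INTERNAL_PREREQUISITES_NOT_MET."""
    for r in responses:
        if r.status == "RUN_HAS_RESULT" and r.result == "COMPLIANT":
            continue
        if accept_prereq_not_met and r.status == "INTERNAL_PREREQUISITES_NOT_MET":
            continue  # e.g. empty prerequisite information elements tolerated
        return "NOT_COMPLETE"
    return "COMPLETE"
```

With the flag left at its default, this behaves like the strict Measure described in the issue body.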

ArthurChapman commented 1 month ago

I have corrected FFDQ in the preamble to bdqffdq for consistency with the Vocabulary (#152).

chicoreus commented 4 weeks ago

Per @Tasilee and @ArthurChapman, I am splitting off a separate issue, #297, for Measures that allow for empty values and are still COMPLETE.

chicoreus commented 4 weeks ago

Per @Tasilee, here is a list of these tests:

Label GUID
MULTIRECORD_MEASURE_QA_BASISOFRECORD_NOTEMPTY c8c61535-ab1a-4ec6-b4e9-f5f02541d7d8
MULTIRECORD_MEASURE_QA_BASISOFRECORD_STANDARD 241a279c-76d5-499b-ab49-a47ad7f8df50
MULTIRECORD_MEASURE_QA_CLASSIFICATION_CONSISTENT a2be4734-0a93-46dc-af4a-e2125b47dbd4
MULTIRECORD_MEASURE_QA_COORDINATES_NOTZERO 151b2d29-3460-4ba5-a226-86971dc8ad03
MULTIRECORD_MEASURE_QA_COUNTRY_FOUND 388e74b3-2e18-4d78-8112-3142d1177e25
MULTIRECORD_MEASURE_QA_COUNTRY_NOTEMPTY 9c8df974-8fba-4537-8c67-31466787f732
MULTIRECORD_MEASURE_QA_COUNTRYCODE_NOTEMPTY 942f63bd-d19d-4214-bf8e-cec0055b8909
MULTIRECORD_MEASURE_QA_COUNTRYSTATEPROVINCE_CONSISTENT b8063832-daa9-446b-a676-bca92cd0dd24
MULTIRECORD_MEASURE_QA_COUNTRYSTATEPROVINCE_UNAMBIGUOUS 23aced89-d613-479c-bc4c-837d74b73be0
MULTIRECORD_MEASURE_QA_DATEIDENTIFIED_INRANGE 6354376c-0cf2-435b-be40-850769c5a18a
MULTIRECORD_MEASURE_QA_DATEIDENTIFIED_STANDARD 563872eb-f544-45a0-8f91-8098d62768d4
MULTIRECORD_MEASURE_QA_DCTYPE_NOTEMPTY 4d999a65-a431-4a76-b591-e0d86dcf244b
MULTIRECORD_MEASURE_QA_DCTYPE_STANDARD d9493fa0-d90e-41db-95f6-d1c1d243540e
MULTIRECORD_MEASURE_QA_DECIMALLATITUDE_INRANGE 3c8bc478-f6b2-4533-b7ce-45bae5d186c2
MULTIRECORD_MEASURE_QA_DECIMALLATITUDE_NOTEMPTY a2535b23-4407-40bd-b23b-30c8185d72a2
MULTIRECORD_MEASURE_QA_DECIMALLONGITUDE_INRANGE 6f7a9b82-7d34-4111-a2a6-9efe5221fa44
MULTIRECORD_MEASURE_QA_DECIMALLONGITUDE_NOTEMPTY a94e986e-dbc8-4147-872d-5f2727945654
MULTIRECORD_MEASURE_QA_EVENT_CONSISTENT f375a3fd-4cf5-4ef4-955e-d71762ede2d8
MULTIRECORD_MEASURE_QA_EVENT_TEMPORAL_NOTEMPTY 215ea7b3-e52e-4c50-a5ac-86b8253c95cb
MULTIRECORD_MEASURE_QA_EVENTDATE_INRANGE d41a731b-2e2b-4442-9217-4c375ae92926
MULTIRECORD_MEASURE_QA_EVENTDATE_NOTEMPTY c23cd67d-1b5c-4e9f-a1ce-8cc6b3e9b365
MULTIRECORD_MEASURE_QA_EVENTDATE_STANDARD 14a1d51f-16ed-4148-9dc8-1e90157a9868
MULTIRECORD_MEASURE_QA_GEODETICDATUM_NOTEMPTY 488c1dff-21ec-4e68-a00a-7355505e180c
MULTIRECORD_MEASURE_QA_GEODETICDATUM_STANDARD cb88b6d9-85b2-4cd5-9bfa-c0d96f79552e
MULTIRECORD_MEASURE_QA_KINGDOM_FOUND 465d7ac1-d193-46c0-a302-56a9ef99215f
MULTIRECORD_MEASURE_QA_KINGDOM_NOTEMPTY 3bc9df8b-0f57-4157-9374-b56a99090b22
MULTIRECORD_MEASURE_QA_LICENSE_NOTEMPTY 4fccf163-9336-4f48-996c-57f5f66e72db
MULTIRECORD_MEASURE_QA_LICENSE_STANDARD acd8d43e-7a2a-4372-b887-fb53a9972dc9
MULTIRECORD_MEASURE_QA_LOCATION_NOTEMPTY 3b2e4791-1a5a-4087-9e8d-09c67cf8c816
MULTIRECORD_MEASURE_QA_OCCURRENCEID_NOTEMPTY 0028ef9a-6553-467b-a344-90327ed2babf
MULTIRECORD_MEASURE_QA_OCCURRENCESTATUS_NOTEMPTY d2922585-2070-4851-a033-15e51977f9dc
MULTIRECORD_MEASURE_QA_OCCURRENCESTATUS_STANDARD 2fea4571-92d0-48a5-a5ba-6caecd647862
MULTIRECORD_MEASURE_QA_SCIENTIFICNAME_FOUND a8aee02c-cf7c-4104-a601-d8afc4f9cbe2
MULTIRECORD_MEASURE_QA_SCIENTIFICNAME_NOTEMPTY b4d6a61c-64ff-4da0-974c-63a73fd20836
MULTIRECORD_MEASURE_QA_SCIENTIFICNAMEAUTHORSHIP_NOTEMPTY 6dd6fecf-6ba1-425c-afbe-6a9ed7b65ed7
MULTIRECORD_MEASURE_QA_SCIENTIFICNAMEID_COMPLETE a9529e71-5470-4cb1-b04d-aa483926f532
MULTIRECORD_MEASURE_QA_SCIENTIFICNAMEID_NOTEMPTY 4cf84216-c8a7-4865-a8e1-3ffd829d5a10
MULTIRECORD_MEASURE_QA_TAXON_NOTEMPTY 2a9d4cfd-815a-46e0-bb51-60724582b762
MULTIRECORD_MEASURE_QA_TAXON_UNAMBIGUOUS 0df03601-3768-4805-906a-bbd0a41b0fda
MULTIRECORD_MEASURE_QA_TAXONRANK_NOTEMPTY e0b8cff1-3322-40d2-b8b2-b99fc9ae130a
MULTIRECORD_MEASURE_QA_TAXONRANK_STANDARD f320ca83-8487-4011-b1ff-f4b1b4dd86ec
MULTIRECORD_MEASURE_QA_TYPESTATUS_STANDARD 1ca359ea-4df3-4dca-b92b-2bc8fa8e0c88
MULTIRECORD_MEASURE_QA_YEAR_INRANGE a0502c5f-608b-4e59-99da-d9490bb4d93b
MULTIRECORD_MEASURE_QA_YEAR_NOTEMPTY a8fef8a8-e7c7-4a2d-adaf-7da99c896c93
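
A list like this can be checked mechanically before it is used as the source of stable GUIDs. A minimal sketch, assuming whitespace-separated "Label GUID" rows as shown above; the function name is hypothetical. It rejects malformed GUIDs and duplicate labels or GUIDs.

```python
import uuid
from typing import Dict


def parse_label_guid_table(text: str) -> Dict[str, str]:
    """Parse 'Label GUID' rows into a label -> GUID mapping, skipping the header
    and raising ValueError on malformed rows, malformed GUIDs, or duplicates."""
    mapping: Dict[str, str] = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts or parts == ["Label", "GUID"]:
            continue
        label, guid = parts  # ValueError if a row does not have exactly two fields
        uuid.UUID(guid)      # ValueError if the GUID is not a well-formed UUID
        if label in mapping or guid in mapping.values():
            raise ValueError(f"duplicate entry: {line}")
        mapping[label] = guid
    return mapping


sample = """Label GUID
MULTIRECORD_MEASURE_QA_YEAR_INRANGE a0502c5f-608b-4e59-99da-d9490bb4d93b
MULTIRECORD_MEASURE_QA_YEAR_NOTEMPTY a8fef8a8-e7c7-4a2d-adaf-7da99c896c93"""
print(len(parse_label_guid_table(sample)))  # 2
```
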