tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

Generate MultiRecord Measures for QualityControl that count validations which are COMPLIANT #296

Open chicoreus opened 1 month ago

chicoreus commented 1 month ago

For each Validation, generate a Measure that operates on a MultiRecord and returns a Response.result counting the number of SingleRecords where a particular Validation has a Response.result of COMPLIANT.
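
A minimal sketch of what one such Measure computes, assuming each SingleRecord's Validation outcome is available as a dictionary with `status` and `result` keys. The `count_compliant` name and the dictionary shape are illustrative assumptions, not part of the specification:

```python
from typing import Iterable, Mapping

COMPLIANT = "COMPLIANT"


def count_compliant(responses: Iterable[Mapping[str, str]]) -> int:
    """Count the Validation responses whose Response.result is COMPLIANT.

    `responses` holds one Validation response per SingleRecord in the
    MultiRecord; the dict shape used here is a hypothetical stand-in.
    """
    return sum(1 for response in responses if response.get("result") == COMPLIANT)


# Hypothetical responses of one Validation over a three-record MultiRecord.
responses = [
    {"status": "RUN_HAS_RESULT", "result": "COMPLIANT"},
    {"status": "RUN_HAS_RESULT", "result": "NOT_COMPLIANT"},
    {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": ""},
]
print(count_compliant(responses))  # prints 1
```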

Run in a pre-amendment phase and again in a post-amendment phase, these Measures would quantify how much the quality of the MultiRecord would improve if all of the proposed Amendments were accepted.
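
The pre/post comparison could be sketched as follows, assuming the Response.result values of one Validation are collected once before and once after the proposed Amendments are applied. The `compliant_gain` helper and the sample data are hypothetical:

```python
from typing import Sequence, Tuple

COMPLIANT = "COMPLIANT"


def compliant_gain(pre: Sequence[str], post: Sequence[str]) -> Tuple[int, int, int]:
    """Return (pre-amendment count, post-amendment count, gain)."""
    pre_count = sum(1 for result in pre if result == COMPLIANT)
    post_count = sum(1 for result in post if result == COMPLIANT)
    return pre_count, post_count, post_count - pre_count


# Hypothetical Response.result values for the same four SingleRecords,
# before and after accepting all proposed Amendments.
pre_amendment = ["COMPLIANT", "NOT_COMPLIANT", "NOT_COMPLIANT", "COMPLIANT"]
post_amendment = ["COMPLIANT", "COMPLIANT", "NOT_COMPLIANT", "COMPLIANT"]
print(compliant_gain(pre_amendment, post_amendment))  # prints (2, 3, 1)
```

A positive gain indicates records that would become COMPLIANT for this Validation if the Amendments were accepted.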

The template for these would look like the one below, with one Measure for each Validation, specified as {Validation}. Generate them from the bdqffdq compliant template instead, to match the TG2_tests.csv columns. We don't need to add an issue for each, but can track rationale management for this set of tests here.

| TestField | Value |
|---|---|
| GUID | Generate for each. |
| Label | MULTIRECORD_MEASURE_COUNT_COMPLIANT_{Validation.Term-Actions} |
| Description | Count the number of {Validation} in a record set that are COMPLIANT |
| TestType | Measure |
| Darwin Core Class | {Validation.Darwin Core Class} |
| Information Elements ActedUpon | {Validation}.Response |
| Information Elements Consulted | |
| Expected Response | Count the number of {Validation} in the MultiRecord that have Response.result=COMPLIANT. |
| Data Quality Dimension | {Validation.Dimension} |
| Term-Actions | {Validation.Term-Actions} |
| Parameter(s) | |
| Source Authority | |
| Specification Last Updated | Generate |
| Examples | |
| Source | TG2 |
| References | Veiga AK, Saraiva AM, Chapman AD, Morris PJ, Gendreau C, Schigel D, & Robertson TJ (2017) A conceptual framework for quality assessment and management of biodiversity data. PLOS ONE 12 (6): e0178731. https://doi.org/10.1371/journal.pone.0178731 |
| Example Implementations (Mechanisms) | |
| Link to Specification Source Code | |
| Notes | For Quality Control, compare the Response.result of this measure with the total number of records to assess work needed on the record set. |

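
Filling in the template for a given Validation might look roughly like this. It is only a sketch: the `measure_from_validation` function, the input dictionary keys, and the sample Validation values are assumptions for illustration, the label pattern follows the generated test list (MULTIRECORD_MEASURE_COUNT_COMPLIANT_ followed by the Term-Actions), and a fresh `uuid4` stands in for the stable GUIDs that are assigned separately:

```python
import uuid


def measure_from_validation(validation: dict) -> dict:
    """Fill the MultiRecord Measure template from one Validation's metadata."""
    label = validation["Label"]
    term_actions = validation["Term-Actions"]
    return {
        # Placeholder GUID; the real tests use stable, pre-assigned GUIDs.
        "GUID": str(uuid.uuid4()),
        "Label": f"MULTIRECORD_MEASURE_COUNT_COMPLIANT_{term_actions}",
        "Description": f"Count the number of {label} in a record set that are COMPLIANT",
        "TestType": "Measure",
        "Darwin Core Class": validation["Darwin Core Class"],
        "Information Elements ActedUpon": f"{label}.Response",
        "Expected Response": (
            f"Count the number of {label} in the MultiRecord "
            "that have Response.result=COMPLIANT."
        ),
        "Data Quality Dimension": validation["Dimension"],
        "Term-Actions": term_actions,
        "Source": "TG2",
    }


# Illustrative Validation metadata; the class and dimension values are assumed.
validation = {
    "Label": "VALIDATION_COUNTRY_FOUND",
    "Term-Actions": "COUNTRY_FOUND",
    "Darwin Core Class": "dcterms:Location",
    "Dimension": "Conformance",
}
measure = measure_from_validation(validation)
print(measure["Label"])  # prints MULTIRECORD_MEASURE_COUNT_COMPLIANT_COUNTRY_FOUND
```
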
chicoreus commented 1 month ago

Per @ArthurChapman, corrected the Description. Also put a placeholder for the Validation into the Expected Response.

chicoreus commented 1 month ago

CSV list of Core MultiRecord Measure tests generated with kurator-org/bdq_issue_to_csv, including assignment of stable GUIDs (the list of test labels and GUIDs is in https://github.com/kurator-org/bdq_issue_to_csv/blob/master/src/main/resources/multirecord_measure_guids.csv). We can extend this file to mark tests that should accept INTERNAL_PREREQUISITES_NOT_MET as COMPLIANT.
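
One way such a marking could work is sketched below. The extra column name `prereq_not_met_is_compliant`, the CSV header, and the choice of which test is flagged are all assumptions made for illustration; the actual layout of multirecord_measure_guids.csv may differ:

```python
import csv
import io

# Hypothetical extended GUID list: header and third column are assumptions,
# and which test is flagged "true" is arbitrary here.
GUIDS_CSV = """label,guid,prereq_not_met_is_compliant
MULTIRECORD_MEASURE_COUNT_COMPLIANT_COUNTRY_FOUND,f15c38c3-d96d-4e9c-982d-410fb71cf2bc,false
MULTIRECORD_MEASURE_COUNT_COMPLIANT_MINDEPTH_LESSTHAN_MAXDEPTH,b21256c2-4bb7-4deb-852d-a9eaa05345e7,true
"""


def load_flags(text: str) -> dict:
    """Map each measure label to whether it accepts prerequisites-not-met."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["label"]: row["prereq_not_met_is_compliant"] == "true" for row in reader}


def count_compliant(responses: list, accept_prereq_not_met: bool) -> int:
    """Count COMPLIANT responses, optionally also counting
    INTERNAL_PREREQUISITES_NOT_MET responses as COMPLIANT."""
    count = 0
    for response in responses:
        if response["result"] == "COMPLIANT":
            count += 1
        elif accept_prereq_not_met and response["status"] == "INTERNAL_PREREQUISITES_NOT_MET":
            count += 1
    return count


flags = load_flags(GUIDS_CSV)
# Hypothetical Validation responses for a three-record MultiRecord.
responses = [
    {"status": "RUN_HAS_RESULT", "result": "COMPLIANT"},
    {"status": "RUN_HAS_RESULT", "result": "NOT_COMPLIANT"},
    {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": ""},
]
for label, accept in flags.items():
    print(label, count_compliant(responses, accept))
```

With these sample responses, the unflagged measure counts 1 record and the flagged measure counts 2.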

CSV file of tests:

https://github.com/tdwg/bdq/blob/master/tg2/core/TG2_multirecord_measure_tests.csv

Human readable markdown:

https://github.com/tdwg/bdq/blob/master/tg2/core/generation/docs/core_multirecord_measure_tests.md

ArthurChapman commented 1 month ago

I have corrected FFDQ in the preamble to bdqffdq for consistency with Vocabulary (#152)

chicoreus commented 4 weeks ago

List of these tests:

Label GUID
MULTIRECORD_MEASURE_COUNT_COMPLIANT_BASISOFRECORD_NOTEMPTY b60c8c58-0137-4b6a-97e9-57d8ca5cf248
MULTIRECORD_MEASURE_COUNT_COMPLIANT_BASISOFRECORD_STANDARD f5dd74bd-6a22-4792-b675-c7ccf2a2c103
MULTIRECORD_MEASURE_COUNT_COMPLIANT_CLASS_FOUND 7270a362-5f2e-41f0-955a-d7a8eaaf0f17
MULTIRECORD_MEASURE_COUNT_COMPLIANT_CLASSIFICATION_CONSISTENT a56fb342-c8ee-4611-8aab-e6c6357e8875
MULTIRECORD_MEASURE_COUNT_COMPLIANT_COORDINATES_COUNTRYCODE_CONSISTENT c44ce101-fb76-4948-a4f3-14c6dc5fee4a
MULTIRECORD_MEASURE_COUNT_COMPLIANT_COORDINATES_NOTZERO 0e239a55-0f19-4c68-bdbf-20580f27a647
MULTIRECORD_MEASURE_COUNT_COMPLIANT_COORDINATES_STATEPROVINCE_CONSISTENT 47d83e78-20fa-4da1-a867-4e93c7161f0d
MULTIRECORD_MEASURE_COUNT_COMPLIANT_COORDINATES_TERRESTRIALMARINE 10c84d1f-69b9-4321-a5a8-58a582e71fbc
MULTIRECORD_MEASURE_COUNT_COMPLIANT_COORDINATEUNCERTAINTY_INRANGE 2d90d94b-d384-4bac-aa68-c6800d765882
MULTIRECORD_MEASURE_COUNT_COMPLIANT_COUNTRY_COUNTRYCODE_CONSISTENT d197716f-6556-4010-822c-252479b17c1a
MULTIRECORD_MEASURE_COUNT_COMPLIANT_COUNTRY_FOUND f15c38c3-d96d-4e9c-982d-410fb71cf2bc
MULTIRECORD_MEASURE_COUNT_COMPLIANT_COUNTRY_NOTEMPTY 6887c881-dc52-409b-8979-9c2f05e55569
MULTIRECORD_MEASURE_COUNT_COMPLIANT_COUNTRYCODE_NOTEMPTY d71be8d4-1a04-4816-90c5-49808c823651
MULTIRECORD_MEASURE_COUNT_COMPLIANT_COUNTRYCODE_STANDARD 38966850-3737-4a67-953c-c231469e0489
MULTIRECORD_MEASURE_COUNT_COMPLIANT_COUNTRYSTATEPROVINCE_CONSISTENT c31f6820-ef6b-427b-84b7-68545ea64324
MULTIRECORD_MEASURE_COUNT_COMPLIANT_COUNTRYSTATEPROVINCE_UNAMBIGUOUS 8b73f37d-89ed-479a-8389-9e71ad2ac84d
MULTIRECORD_MEASURE_COUNT_COMPLIANT_DATEIDENTIFIED_INRANGE c72fda2d-16e1-4ded-91a5-a7094339d603
MULTIRECORD_MEASURE_COUNT_COMPLIANT_DATEIDENTIFIED_STANDARD 49b787eb-7dce-4ace-97f5-7cbb47cd8cb9
MULTIRECORD_MEASURE_COUNT_COMPLIANT_DAY_INRANGE 780480ff-8c4a-4276-aaca-cbd1248de9fa
MULTIRECORD_MEASURE_COUNT_COMPLIANT_DAY_STANDARD c3e0100f-dfc3-4379-a855-df878eef295e
MULTIRECORD_MEASURE_COUNT_COMPLIANT_DCTYPE_NOTEMPTY f041ab17-d834-4586-aa6b-090de2e571a8
MULTIRECORD_MEASURE_COUNT_COMPLIANT_DCTYPE_STANDARD fbe47441-500f-44c7-a1bd-1e872edc5266
MULTIRECORD_MEASURE_COUNT_COMPLIANT_DECIMALLATITUDE_INRANGE f0fb1c79-9e3d-4d6c-a5a9-087cf57ebd26
MULTIRECORD_MEASURE_COUNT_COMPLIANT_DECIMALLATITUDE_NOTEMPTY bceae35a-52ab-4968-846f-069ace766513
MULTIRECORD_MEASURE_COUNT_COMPLIANT_DECIMALLONGITUDE_INRANGE c70c4950-2246-4acc-a59d-81eaa47edf2b
MULTIRECORD_MEASURE_COUNT_COMPLIANT_DECIMALLONGITUDE_NOTEMPTY f948a30e-8084-48d5-b1ca-d77c476f181f
MULTIRECORD_MEASURE_COUNT_COMPLIANT_DEGREEOFESTABLISHMENT_STANDARD 1b8ae68e-63f1-41c0-9025-fbe64db38d64
MULTIRECORD_MEASURE_COUNT_COMPLIANT_ENDDAYOFYEAR_INRANGE 7775309b-5331-4a65-b839-cbe959948d33
MULTIRECORD_MEASURE_COUNT_COMPLIANT_ESTABLISHMENTMEANS_STANDARD 130bb875-6b7c-4965-b864-d53f9d05b2cd
MULTIRECORD_MEASURE_COUNT_COMPLIANT_EVENT_CONSISTENT 1919f212-b7db-4b6e-9697-41a715001bd6
MULTIRECORD_MEASURE_COUNT_COMPLIANT_EVENT_TEMPORAL_NOTEMPTY 0adce26e-996b-4ee6-b3df-1368103462b3
MULTIRECORD_MEASURE_COUNT_COMPLIANT_EVENTDATE_INRANGE c8250600-de61-4047-9d7c-6e06a38c7994
MULTIRECORD_MEASURE_COUNT_COMPLIANT_EVENTDATE_NOTEMPTY 3f62eaa2-747f-456b-85e6-1a6e74086a18
MULTIRECORD_MEASURE_COUNT_COMPLIANT_EVENTDATE_STANDARD bffd7499-aca3-423f-bb43-f7bdc9260f4f
MULTIRECORD_MEASURE_COUNT_COMPLIANT_FAMILY_FOUND 97928242-11a9-4ab0-9dd7-3f0465834824
MULTIRECORD_MEASURE_COUNT_COMPLIANT_GENUS_FOUND 977f7e75-a88e-4e29-a7b1-e8cdd35aa107
MULTIRECORD_MEASURE_COUNT_COMPLIANT_GEODETICDATUM_NOTEMPTY 63fbaf3c-3d41-4ab6-bfc0-904f1b26835f
MULTIRECORD_MEASURE_COUNT_COMPLIANT_GEODETICDATUM_STANDARD 8d8aba5c-f58a-49c9-a08d-1afb5ff1aa63
MULTIRECORD_MEASURE_COUNT_COMPLIANT_KINGDOM_FOUND 012eade5-fc64-458a-a13a-a614491bf4e0
MULTIRECORD_MEASURE_COUNT_COMPLIANT_KINGDOM_NOTEMPTY 342bd81c-e2b7-41d8-b32b-013992d19f99
MULTIRECORD_MEASURE_COUNT_COMPLIANT_LICENSE_NOTEMPTY 47ee20d9-5087-4f76-a494-6fea05e50b8b
MULTIRECORD_MEASURE_COUNT_COMPLIANT_LICENSE_STANDARD 9d5be694-f5da-465d-8c7e-27e6dac69f9f
MULTIRECORD_MEASURE_COUNT_COMPLIANT_LOCATION_NOTEMPTY bac852b5-1ba6-427b-aa8e-02018e99279c
MULTIRECORD_MEASURE_COUNT_COMPLIANT_MAXDEPTH_INRANGE 3de8af03-05d4-4fd8-8e6d-ba886dc5446f
MULTIRECORD_MEASURE_COUNT_COMPLIANT_MAXELEVATION_INRANGE 6a3baf78-5ec3-4a84-8c6b-6876b3a4e3b5
MULTIRECORD_MEASURE_COUNT_COMPLIANT_MINDEPTH_INRANGE 9c66c116-6644-45b4-b4c7-db7a4ee7c500
MULTIRECORD_MEASURE_COUNT_COMPLIANT_MINDEPTH_LESSTHAN_MAXDEPTH b21256c2-4bb7-4deb-852d-a9eaa05345e7
MULTIRECORD_MEASURE_COUNT_COMPLIANT_MINELEVATION_INRANGE 071267a0-d993-4961-a3f7-d8210810d167
MULTIRECORD_MEASURE_COUNT_COMPLIANT_MINELEVATION_LESSTHAN_MAXELEVATION be2eb717-b390-47d1-b7ba-965a1101e215
MULTIRECORD_MEASURE_COUNT_COMPLIANT_MONTH_STANDARD c3b4cd93-a37f-4a0a-89dd-7ce53669f1f3
MULTIRECORD_MEASURE_COUNT_COMPLIANT_NAMEPUBLISHEDINYEAR_NOTEMPTY 36ea0a78-a079-4014-aca3-2f2b3b11e822
MULTIRECORD_MEASURE_COUNT_COMPLIANT_OCCURRENCEID_NOTEMPTY 0c9a139e-5d23-44de-a495-14ec08c61a1c
MULTIRECORD_MEASURE_COUNT_COMPLIANT_OCCURRENCESTATUS_NOTEMPTY 298db0c9-a85a-41ee-b111-d622fd969d71
MULTIRECORD_MEASURE_COUNT_COMPLIANT_OCCURRENCESTATUS_STANDARD faca6558-dbff-4d26-a5cb-e11cdf632fe7
MULTIRECORD_MEASURE_COUNT_COMPLIANT_ORDER_FOUND f4fa449c-4b74-4dcf-b4cf-0b73e1496375
MULTIRECORD_MEASURE_COUNT_COMPLIANT_PATHWAY_STANDARD 15e0da1d-1a43-4075-8454-b2e618cdd25b
MULTIRECORD_MEASURE_COUNT_COMPLIANT_PHYLUM_FOUND 65e66ca0-e9d1-4411-ad26-bb9c43f32afc
MULTIRECORD_MEASURE_COUNT_COMPLIANT_POLYNOMIAL_CONSISTENT 7da5428e-87b2-4ec2-be82-05b9398b7577
MULTIRECORD_MEASURE_COUNT_COMPLIANT_SCIENTIFICNAME_FOUND 4e70b0e4-3fd2-4899-802c-439671374eeb
MULTIRECORD_MEASURE_COUNT_COMPLIANT_SCIENTIFICNAME_NOTEMPTY 0f8b30e2-59dc-46ba-8b91-62049cd1a4e2
MULTIRECORD_MEASURE_COUNT_COMPLIANT_SCIENTIFICNAMEAUTHORSHIP_NOTEMPTY dbf3cece-1d83-426e-a5e0-8158fcf9c0cd
MULTIRECORD_MEASURE_COUNT_COMPLIANT_SCIENTIFICNAMEID_COMPLETE f174ad13-3c67-49f9-8d8b-aba4e933d6f6
MULTIRECORD_MEASURE_COUNT_COMPLIANT_SCIENTIFICNAMEID_NOTEMPTY a9962d33-ad08-453a-8938-2972425034c2
MULTIRECORD_MEASURE_COUNT_COMPLIANT_SEX_STANDARD e4d35063-2366-4dda-8eaa-326340361da3
MULTIRECORD_MEASURE_COUNT_COMPLIANT_STARTDAYOFYEAR_INRANGE 76008c6b-c56a-4233-84e3-8584950037ec
MULTIRECORD_MEASURE_COUNT_COMPLIANT_STATEPROVINCE_FOUND 58fdd5c1-6201-49a1-a7cd-f49c210dc0b6
MULTIRECORD_MEASURE_COUNT_COMPLIANT_TAXON_NOTEMPTY 54d290e8-ac48-4f31-8af3-676363573217
MULTIRECORD_MEASURE_COUNT_COMPLIANT_TAXON_UNAMBIGUOUS 782773c9-7b37-483d-8ce2-c6683ba81882
MULTIRECORD_MEASURE_COUNT_COMPLIANT_TAXONRANK_NOTEMPTY de661615-b338-4557-af5b-d44a89de40fa
MULTIRECORD_MEASURE_COUNT_COMPLIANT_TAXONRANK_STANDARD 602bc457-6b1d-4f24-adef-d0d31bcdf2f0
MULTIRECORD_MEASURE_COUNT_COMPLIANT_TYPESTATUS_STANDARD b5a14d76-292e-499b-b80f-9546243311a0
MULTIRECORD_MEASURE_COUNT_COMPLIANT_YEAR_INRANGE aee65eb8-8d1e-4b8f-bd37-5822e29f5734
MULTIRECORD_MEASURE_COUNT_COMPLIANT_YEAR_NOTEMPTY 687d3ad1-93a3-4a1f-b69f-da5a1eb761a5