tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-VALIDATION_TYPESTATUS_NOTEMPTY #246

Closed by Tasilee 9 months ago

Tasilee commented 9 months ago

| TestField | Value |
| --- | --- |
| GUID | cd7cae15-f255-41a3-b002-c9620c40f620 |
| Label | VALIDATION_TYPESTATUS_NOTEMPTY |
| Description | Is there a value in dwc:typeStatus? |
| TestType | Validation |
| Darwin Core Class | Identification |
| Information Elements ActedUpon | dwc:typeStatus |
| Information Elements Consulted | |
| Expected Response | COMPLIANT if dwc:typeStatus is bdq:NotEmpty; otherwise NOT_COMPLIANT |
| Data Quality Dimension | Completeness |
| Term-Actions | TYPESTATUS_NOTEMPTY |
| Parameter(s) | |
| Source Authority | |
| Specification Last Updated | 2024-02-04 |
| Examples | [dwc:typeStatus="holotype of Pinus radiata": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:typeStatus is bdq:NotEmpty"] [dwc:typeStatus="": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:typeStatus is bdq:Empty"] |
| Source | TG2 |
| References | |
| Example Implementations (Mechanisms) | Kurator/FilteredPush sci_name_qc Library |
| Link to Specification Source Code | https://github.com/FilteredPush/sci_name_qc/blob/v1.1.2/src/main/java/org/filteredpush/qc/sciname/DwCSciNameDQ.java#L3168 |
| Notes | The vast majority of current biodiversity data will be expected to not have a value in dwc:typeStatus. This test could have value in determining 'quality' for a specific set of data quality needs/use cases. |

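
For illustration only, the Expected Response above could be sketched in Python roughly as follows. This is not the linked sci_name_qc code (which is Java). The `Response` class, the function names, and the reading of bdq:Empty as "missing or whitespace-only" are assumptions made for this sketch; the status, result, and comment strings are taken from the two Examples in the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Response:
    """Hypothetical container mirroring Response.status/result/comment."""
    status: str
    result: str
    comment: str


def is_empty(value: Optional[str]) -> bool:
    # Assumption: bdq:Empty is treated here as a missing value or a
    # string containing only whitespace.
    return value is None or value.strip() == ""


def validation_typestatus_notempty(type_status: Optional[str]) -> Response:
    # COMPLIANT if dwc:typeStatus is bdq:NotEmpty; otherwise NOT_COMPLIANT.
    if is_empty(type_status):
        return Response("RUN_HAS_RESULT", "NOT_COMPLIANT",
                        "dwc:typeStatus is bdq:Empty")
    return Response("RUN_HAS_RESULT", "COMPLIANT",
                    "dwc:typeStatus is bdq:NotEmpty")
```

Calling `validation_typestatus_notempty("holotype of Pinus radiata")` and `validation_typestatus_notempty("")` reproduces the two example responses listed in the specification.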
chicoreus commented 9 months ago

Unless we are hypersplitting and describing every new individual as a new taxon, complete with a type specimen, the value of dwc:typeStatus would be expected to be empty for almost all records in most data sets.

For most purposes, this would be better represented as a MultiRecord measure, counting the portion of records in a data set that have some value in dwc:typeStatus.
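
A minimal sketch of what such a MultiRecord measure might look like, assuming each record is a plain dict keyed by Darwin Core term names and that "empty" means missing or whitespace-only. The function name and record shape are hypothetical and are not part of any BDQ specification.

```python
from typing import Iterable, Mapping, Optional


def proportion_with_typestatus(
    records: Iterable[Mapping[str, Optional[str]]],
) -> Optional[float]:
    """Return the fraction of records with a non-empty dwc:typeStatus.

    Returns None for an empty data set, where the proportion is undefined.
    """
    total = 0
    with_value = 0
    for record in records:
        total += 1
        value = record.get("dwc:typeStatus")
        # Assumption: missing or whitespace-only values count as empty.
        if value is not None and value.strip() != "":
            with_value += 1
    if total == 0:
        return None
    return with_value / total


sample = [
    {"dwc:typeStatus": "holotype of Pinus radiata"},
    {"dwc:typeStatus": ""},
    {"dwc:typeStatus": None},
    {},  # term absent from the record entirely
]
print(proportion_with_typestatus(sample))  # 0.25
```

Reporting a single proportion per data set avoids flagging nearly every individual record as NOT_COMPLIANT.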

Tasilee commented 9 months ago

@chicoreus: Agreed. The Supplementary criterion "likely to return a high percentage of either bdq:COMPLIANT or bdq:NOT_COMPLIANT results (cf bdq:Response.result)" seems justified, as only 0.7% of GBIF records have a value in dwc:typeStatus.