tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2_VALIDATION_DISPOSITION_NOTEMPTY #225

Closed Tasilee closed 9 months ago

Tasilee commented 10 months ago
TestField | Value
GUID | b4c17611-2703-474f-b46a-93b08ecfee16
Label | VALIDATION_DISPOSITION_NOTEMPTY
Description | Is there a value in dwc:disposition?
TestType | Validation
Darwin Core Class | dwc:MaterialEntity
Information Elements ActedUpon | dwc:disposition
Information Elements Consulted |
Expected Response | COMPLIANT if dwc:disposition is bdq:NotEmpty; otherwise NOT_COMPLIANT
Data Quality Dimension | Completeness
Term-Actions | DISPOSITION_NOTEMPTY
Parameter(s) |
Source Authority |
Specification Last Updated | 2024-01-29
Examples | [dwc:disposition="Missing": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:disposition is bdq:NotEmpty"]
 | [dwc:disposition="": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:disposition is bdq:Empty"]
Source | TG2
References |
Example Implementations (Mechanisms) |
Link to Specification Source Code |
Notes | This bdq:Supplementary test is not regarded as CORE (cf. bdq:CORE) because of one or more of the reasons: not being widely applicable; not informative; not straightforward to implement or likely to return a high percentage of either bdq:COMPLIANT or bdq:NOT_COMPLIANT results (cf bdq:Response.result). A Supplementary test may be implemented as CORE when a suitable use case exists.
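
The Expected Response and Examples rows above could be sketched roughly as follows. This is only an illustration, not a reference implementation: the function names and the dictionary-shaped response are hypothetical, and the treatment of whitespace-only values as bdq:Empty is an assumption about the bdq:Empty definition that the specification table itself does not spell out.

```python
from typing import Optional


def is_empty(value: Optional[str]) -> bool:
    """Assumed reading of bdq:Empty: absent, or holding only characters
    in the range U+0000 to U+0020 (an assumption, see lead-in)."""
    return value is None or all(ch <= "\u0020" for ch in value)


def validation_disposition_notempty(disposition: Optional[str]) -> dict:
    """Hypothetical helper returning a response shaped like the Examples row."""
    if is_empty(disposition):
        return {
            "status": "RUN_HAS_RESULT",
            "result": "NOT_COMPLIANT",
            "comment": "dwc:disposition is bdq:Empty",
        }
    return {
        "status": "RUN_HAS_RESULT",
        "result": "COMPLIANT",
        "comment": "dwc:disposition is bdq:NotEmpty",
    }
```

Calling `validation_disposition_notempty("Missing")` and `validation_disposition_notempty("")` reproduces the two examples listed in the table.
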
chicoreus commented 10 months ago

We should probably take a position on the appropriate value for an unknown disposition, most likely that it should be the string literal "unknown".

ArthurChapman commented 10 months ago

@chicoreus - Would that not be a different test, not just a NOTEMPTY test?

chicoreus commented 10 months ago

@ArthurChapman yes, except that we need to state the expectations for unknown values. If disposition is unknown and dwc:disposition is expected to be reported as an empty value, then this test would have very limited value: it can't tell the difference between values that were not supplied and values that are known to be unknown by the collection. If the expectation is that unknown disposition is reported with a value (unknown, unchecked, etc.), then this test has utility for identifying where this data has not been supplied. So to frame this test, we should take a position on the expectation for handling unknown values of disposition, since this will affect how the test can be used.
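
A small sketch of the point above, assuming the convention that a known-unknown disposition is reported as the literal "unknown". The records, catalog numbers, and the `has_disposition` helper are all made up for illustration, and whitespace-only values are assumed to count as empty.

```python
from collections import Counter

# Hypothetical records; under the assumed convention, "unknown" is a supplied value.
records = [
    {"catalogNumber": "A1", "disposition": "in collection"},
    {"catalogNumber": "A2", "disposition": "unknown"},  # known to be unknown
    {"catalogNumber": "A3", "disposition": ""},         # supplied as empty
    {"catalogNumber": "A4"},                            # term not supplied at all
]


def has_disposition(record: dict) -> bool:
    """True when the record carries a non-blank disposition value."""
    value = record.get("disposition")
    return value is not None and value.strip() != ""


# Tally the outcome per record and list the records the test would flag.
summary = Counter(
    "COMPLIANT" if has_disposition(r) else "NOT_COMPLIANT" for r in records
)
flagged = [r["catalogNumber"] for r in records if not has_disposition(r)]
```

Here only A3 and A4 are flagged, so the result separates "not supplied" from "known to be unknown". If unknown dispositions were instead expected to be left empty, A2 would be flagged as well and the two cases could no longer be told apart.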

ymgan commented 9 months ago

Maybe the following could be useful?

From the standards document:

We considered, and explicitly rejected, treating common string serializations of null such as \N and NULL as empty values. If "\N" is present in a data set, the tests will explicitly treat that value as NOTEMPTY, and then try to evaluate it against whatever other criteria apply. This definition is not applicable to a discussion of what value to include in a controlled vocabulary to indicate that no meaningful value is present, so no suggestion is made that "EMPTY" should be used as a data value to represent some form of "Null", "Unknown", "Not Recorded", etc. Choices there would fall into the semantics for some set of controlled vocabularies. The relevance to such a discussion is that this definition would treat an empty string as an empty value, with no semantics attached as to why the value is empty.
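
As a rough illustration of the quoted passage, the sketch below classifies a few candidate values. The `is_empty` helper is hypothetical and covers only the cases the passage mentions (an absent value or an empty string); it is not the full bdq:Empty definition.

```python
def is_empty(value):
    """Empty only when absent or the empty string; no null-like strings are special-cased."""
    return value is None or value == ""


# "\\N" in Python source is the two-character string backslash + N.
candidates = ["", "\\N", "NULL", "unknown"]
results = {v: ("EMPTY" if is_empty(v) else "NOTEMPTY") for v in candidates}
```

Only the empty string is classified as EMPTY; "\N", "NULL", and "unknown" all come out as NOTEMPTY, leaving their meaning to whatever controlled vocabulary applies.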

Tasilee commented 9 months ago

This one does seem anomalous as a potential "Supplementary", mainly because of its tight association with dwc:MaterialEntity and dwc:basisOfRecord. I'm thinking leaving it as Supplementary or 'Do not implement' may be a red herring. Thoughts?

Oddly, the only one who voted for it back when the worksheet of all considered tests was done was me, and I wouldn't repeat that now.

tucotuco commented 9 months ago

This one does seem anomalous as a potential "Supplementary", mainly because of its tight association with dwc:MaterialEntity and dwc:basisOfRecord. I'm thinking leaving it as Supplementary or 'Do not implement' may be a red herring. Thoughts?

I don't understand why leaving it as supplementary would be a red herring. I think that is the correct label.

I think the question of "unknown" is a global one for terms recommending a controlled vocabulary. It is not specific in any way to disposition and does not require a specific treatment for this test. In other words, the expectation is as for any term recommended to use a controlled vocabulary: if there is a value, it SHOULD be a value from the controlled vocabulary, which is not what this test is about.

ymgan commented 3 months ago

Darwin Core class should be MaterialEntity, without a space, to be consistent with #279 and #280

ArthurChapman commented 3 months ago

Thanks @ymgan - fixed to dwc:MaterialEntity