tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-VALIDATION_MODIFIED_NOTEMPTY #224

Closed: Tasilee closed this issue 9 months ago

Tasilee commented 10 months ago
GUID: e17918fc-25ca-4a3a-828b-4502432b98c4
Label: VALIDATION_MODIFIED_NOTEMPTY
Description: Is there a value in dcterms:modified?
TestType: Validation
Darwin Core Class: dcterms
Information Elements ActedUpon: dcterms:modified
Information Elements Consulted:
Expected Response: COMPLIANT if dcterms:modified is bdq:NotEmpty; otherwise NOT_COMPLIANT
Data Quality Dimension: Completeness
Term-Actions: DCTERMSMODIFIED_NOTEMPTY
Parameter(s):
Source Authority:
Specification Last Updated: 2024-01-29
Examples: [dcterms:modified="2022-01-02": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dcterms:modified is bdq:NotEmpty"]
[dcterms:modified="[null]": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dcterms:modified is bdq:Empty"]
Source: TG2
References:
Example Implementations (Mechanisms):
Link to Specification Source Code:
Notes: This bdq:Supplementary test is not regarded as CORE (cf. bdq:CORE) because of one or more of the following reasons: not being widely applicable; not informative; not straightforward to implement; or likely to return a high percentage of either bdq:COMPLIANT or bdq:NOT_COMPLIANT results (cf. bdq:Response.result). A Supplementary test may be implemented as CORE when a suitable use case exists. See Issue comments below.
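The specification above is simple enough to sketch directly. The Python below is a minimal, hypothetical illustration, not an existing BDQ implementation (the Example Implementations field is empty). It assumes records are plain dictionaries keyed by term name, represents the "[null]" example as `None`, and treats a value as bdq:Empty when it is missing or consists only of whitespace (a simplification of the vocabulary definition). The `Response` class and function names are made up for this sketch.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Response:
    """Hypothetical stand-in for a bdq Response (status, result, comment)."""
    status: str
    result: Optional[str]
    comment: str


def is_bdq_empty(value: Optional[str]) -> bool:
    # Assumption: bdq:Empty means the value is missing (None) or
    # contains only whitespace characters.
    return value is None or value.strip() == ""


def validation_modified_notempty(record: Mapping[str, Optional[str]]) -> Response:
    """Sketch of VALIDATION_MODIFIED_NOTEMPTY: COMPLIANT if dcterms:modified
    is bdq:NotEmpty, otherwise NOT_COMPLIANT."""
    value = record.get("dcterms:modified")
    if is_bdq_empty(value):
        return Response("RUN_HAS_RESULT", "NOT_COMPLIANT",
                        "dcterms:modified is bdq:Empty")
    return Response("RUN_HAS_RESULT", "COMPLIANT",
                    "dcterms:modified is bdq:NotEmpty")


# The two examples from the specification:
print(validation_modified_notempty({"dcterms:modified": "2022-01-02"}))
print(validation_modified_notempty({"dcterms:modified": None}))
```

A record that lacks the key entirely is handled the same way as an explicit null, because `dict.get` returns `None` for a missing key.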
chicoreus commented 10 months ago

This feels like it should be core, and we should take a position that dcterms:modified should always contain a value.

ArthurChapman commented 10 months ago

Interesting - are people using it? It should be automatically generated. Is this one, that if CORE, would have a near 100% failure?

chicoreus commented 10 months ago

@ArthurChapman I would expect this is a value that aggregators really want to have populated. It lets them tell that they need to update their aggregated records from changed data in the source without having to compare all the provided values against their stored values: if modified is newer, then update; if modified is not newer, then the aggregator may either trust that assertion or compare the records. For complex chains where data is being combined from more than one aggregation source, modified is an indication of which record to use when the same record is provided from more than one path...
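The update rule described in this comment can be sketched as follows. This is a minimal illustration, not the logic of any real aggregator. It assumes dcterms:modified holds ISO 8601 values that Python's `datetime.fromisoformat` can read (before Python 3.11 this excludes forms such as a trailing `Z`), treats timestamps without a time zone as UTC, and falls back to comparing records when a value is missing or unreadable. The function names and return labels are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence


def parse_modified(value: Optional[str]) -> Optional[datetime]:
    """Parse a dcterms:modified value; return None if empty or unparseable.

    Assumption: values are ISO 8601 strings accepted by datetime.fromisoformat;
    naive values are treated as UTC so they can be compared with aware ones.
    """
    if value is None or not value.strip():
        return None
    try:
        parsed = datetime.fromisoformat(value.strip())
    except ValueError:
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def needs_update(source_modified: Optional[str], stored_modified: Optional[str]) -> str:
    """Return "update" if the source copy is newer, "trust_or_compare" if it
    is not newer, and "compare" if either timestamp is missing or unreadable."""
    source = parse_modified(source_modified)
    stored = parse_modified(stored_modified)
    if source is None or stored is None:
        return "compare"  # no usable assertion: fall back to comparing values
    if source > stored:
        return "update"
    return "trust_or_compare"


def pick_latest(candidates: Sequence[dict]) -> Optional[dict]:
    """Among copies of one record arriving by different paths, return the copy
    with the latest usable dcterms:modified, or None if no copy has one."""
    dated = [(parse_modified(c.get("dcterms:modified")), c) for c in candidates]
    dated = [(ts, c) for ts, c in dated if ts is not None]
    if not dated:
        return None
    return max(dated, key=lambda pair: pair[0])[1]


print(needs_update("2024-03-01", "2023-12-31"))  # update
print(needs_update("2023-01-01", "2023-12-31"))  # trust_or_compare
```

Normalizing naive timestamps to UTC avoids the `TypeError` Python raises when comparing naive and time-zone-aware datetimes, at the cost of guessing a time zone the provider never stated.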

Tasilee commented 10 months ago

I've asked @ArthurChapman to draft definitions of "Supplementary" and "Do not implement" because we need to be clear on the differences. For example, on this test, @chicoreus states that it is 'aspirational' because the benefits of having entries can be appreciated, even though the field will rarely be populated. But @chicoreus says that #233 should be 'do not implement' because the field will be largely unpopulated and, one presumes, is 'not aspirational' or 'complex/impossible to implement'.

Hence, we need a clear statement (Vocabulary at least) on what "Supplementary" and "Do not implement" mean, and I agree with Arthur that reasoning should be added to the Notes to make it clear why we tag the test as such.

tucotuco commented 10 months ago

Quoting @ArthurChapman: "Interesting - are people using it? It should be automatically generated. Is this one, that if CORE, would have a near 100% failure?"

545,407,660 out of 2,232,326,955 Occurrence records (24.4%) from data publishers aggregated in GBIF (2023-08-01) have a populated dcterms:modified field. Ten data publishers account for 14.6% of that 24.4%.
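The coverage figure in this comment is the share of records for which VALIDATION_MODIFIED_NOTEMPTY would return COMPLIANT. A minimal Python sketch of that calculation over a batch of records is below; it assumes dictionary-shaped records and treats `None` or whitespace-only strings as empty. The function name is hypothetical, and this is not a description of how the GBIF figure was actually produced.

```python
from typing import Iterable, Mapping, Optional


def modified_coverage(records: Iterable[Mapping[str, Optional[str]]]) -> float:
    """Fraction of records whose dcterms:modified is non-empty, i.e. the share
    that VALIDATION_MODIFIED_NOTEMPTY would report as COMPLIANT.

    Assumption: None or whitespace-only strings count as empty.
    """
    total = 0
    populated = 0
    for record in records:
        total += 1
        value = record.get("dcterms:modified")
        if value is not None and value.strip():
            populated += 1
    return populated / total if total else 0.0


# Reproducing the quoted ratio from its two counts:
print(f"{545_407_660 / 2_232_326_955:.1%}")  # 24.4%
```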

I don't think it is realistic for most data providers to provide a useful dcterms:modified value, and I think aggregators are fine with this. It is easier to run all the data in a dataset through a pipeline whenever that dataset gets a new version. In fact, it is necessary, since all of the taxonomy rectification that happens is a source of modification for the aggregated record and is uncoupled from the dataset. It pretty much has to be done.

So, I think it is a myth that the aggregators will benefit from this field. Whatever SHOULD be the case, it is not realistic.

ArthurChapman commented 9 months ago

In light of @tucotuco's comment, I suggest that we do not include this test. Either remove the "Supplementary" tag and close the issue, or tag it as "DO NOT IMPLEMENT".

tucotuco commented 9 months ago

I would leave it as Supplementary. There is nothing particularly difficult or controversial about its implementation, which is what I think the DO NOT IMPLEMENT label is meant to signify. It's just that it isn't a particularly useful test on a global scale. This could be different for a specific use case.

chicoreus commented 9 months ago

Changed title/label to be consistent with other tests of dcterms:modified #272, #273, #274