TestField | Value |
---|---|
GUID | 28a42e6e-1a79-4728-ab91-aa7f191818de |
Label | MEASURE_DWC_COMPLETENESS |
Description | How many Darwin Core terms have a value in them. |
TestType | Measure |
Darwin Core Class | All |
Information Elements ActedUpon | AllDarwinCoreTerms |
Information Elements Consulted | |
Expected Response | The number of Darwin Core terms that are bdq:NotEmpty in the record |
Data Quality Dimension | Completeness |
Term-Actions | DWC_COMPLETENESS |
Specification Last Updated | 2024-02-20 |
Examples | [dwc:eventDate="1881-12-15", dwc:scientificNameID="urn:lsid:marinespecies.org:taxname:208134", dwc:decimalLatitude="21.45", dwc:decimalLongitude="": Response.status=RUN_HAS_RESULT, Response.result=3, Response.comment="Three bdq:NotEmpty Darwin Core terms"] |
Source | @Tasilee |
References | |
Example Implementations (Mechanisms) | |
Link to Specification Source Code | |
Notes | The maximum value this test may return can vary with a number of factors, including the structure of the data set: flat Darwin Core, star schema, and RDF representations are likely to contain different numbers of Darwin Core terms. MultiRecord measures of the minimum, mean, mode, and maximum values of this SingleRecord measure across a data set may be informative for some uses. |
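
A minimal sketch of the measure and the MultiRecord summaries mentioned in the Notes, assuming a flat record held as a Python dict keyed by `dwc:` term names. The function names (`is_not_empty`, `measure_dwc_completeness`, `summarise`) are hypothetical, and the reading of bdq:NotEmpty as "present and containing at least one non-whitespace character" is an assumption rather than the normative definition.

```python
from statistics import mean, mode


def is_not_empty(value):
    # Assumed reading of bdq:NotEmpty: present and not whitespace-only.
    return value is not None and str(value).strip() != ""


def measure_dwc_completeness(record):
    # Count the dwc: terms in a flat record (dict) that hold a value.
    count = sum(
        1
        for term, value in record.items()
        if term.startswith("dwc:") and is_not_empty(value)
    )
    return {
        "status": "RUN_HAS_RESULT",
        "result": count,
        "comment": f"{count} bdq:NotEmpty Darwin Core terms",
    }


def summarise(records):
    # MultiRecord summary of the SingleRecord measure across a data set.
    counts = [measure_dwc_completeness(r)["result"] for r in records]
    return {
        "min": min(counts),
        "mean": mean(counts),
        "mode": mode(counts),
        "max": max(counts),
    }


# The record from the Examples row of the table above.
record = {
    "dwc:eventDate": "1881-12-15",
    "dwc:scientificNameID": "urn:lsid:marinespecies.org:taxname:208134",
    "dwc:decimalLatitude": "21.45",
    "dwc:decimalLongitude": "",
}
print(measure_dwc_completeness(record)["result"])  # 3
```

Because the count depends on which terms the data set carries at all, the summary values are only comparable between records that share the same structure.
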
Comment by Lee Belbin (@Tasilee) migrated from spreadsheet: I would prefer the complement (number of fields in the record), as it would usually be smaller than the number absent.
I'm writing a paper on the ALA and this aspect of occurrence records has popped up. I was the original proposer of this MEASURE as I thought it would help contribute to any estimate of the overall 'utility' of the record.
The issue was closed but there was no explanation as to why, so I've re-opened it for comment.
This looks like one we discussed at Gainesville and decided not to make CORE at that stage. I don't have notes on the discussion, so it must have been close to unanimous. I wonder what value the result has to someone running the tests, as it is only a measure. What are they going to do with the result? Early on, we had lots of measures and cut them down. From memory, I think we questioned its worth if the terms checked were not restricted, as in many databases some terms may not be relevant. As I see it, the only value would be for Aggregators, and there is nothing stopping them from running a separate exercise that may be of more value to them.
Thanks @ArthurChapman. I agree that there is a before/after aspect to this MEASURE/S. I just want to make sure we capture some idea of why this would not be a useful measure. It would be useful when comparing with other records. For example, when records are listed, it could be a parameter that may help identify some aspect of 'quality'. The negative is that there is no indication of WHICH Darwin Core terms are filled in. As we all agree, it is the triplet of NAME-SPACE-TIME terms that is fundamental. We have this covered with the three separate tests.
I've sent the worksheet where this test was raised, but there is no indication of why it was dropped. The comments were
There are also a lot of tests in which the field being empty (including absent) matters, either as INTERNAL_PREREQUISITES_NOT_MET or as a requirement before an amendment can be made.
Thanks @ArthurChapman. I agree and have raised this with the ALA.
No other comments?
On the basis of discussions today, this MEASURE is not sufficiently discriminatory. While it could be used as a basis for record comparison or multi-record summaries, it does not discriminate between the Darwin Core terms by their significance. Two records could produce a similar value with very different information quality. This MEASURE does, however, look to a future where a record score could be estimated for particular applications. For example, for an SDM study, records with accepted and correct names to species level, coordinates with a spatial uncertainty of less than 100 m and a date to day level (among other Darwin Core terms) would have high value.
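
To illustrate the point about discrimination, here is a small sketch with two made-up records that return the same count but differ in whether the NAME-SPACE-TIME terms are filled in. The helper names and the choice of `CORE_TERMS` are hypothetical and are not part of any test specification.

```python
def not_empty(value):
    # Assumed reading of "not empty": present and not whitespace-only.
    return value is not None and str(value).strip() != ""


def count_not_empty(record):
    # The raw completeness count: how many terms hold a value.
    return sum(1 for value in record.values() if not_empty(value))


# Hypothetical selection of NAME-SPACE-TIME terms, for illustration only.
CORE_TERMS = (
    "dwc:scientificName",
    "dwc:decimalLatitude",
    "dwc:decimalLongitude",
    "dwc:eventDate",
)


def has_core_terms(record):
    # True only when every selected core term holds a value.
    return all(not_empty(record.get(term)) for term in CORE_TERMS)


# Illustrative values, not taken from any real data set.
record_a = {
    "dwc:scientificName": "Example name",
    "dwc:decimalLatitude": "21.45",
    "dwc:decimalLongitude": "-157.80",
    "dwc:eventDate": "1881-12-15",
}
record_b = {
    "dwc:catalogNumber": "12345",
    "dwc:preparations": "skin",
    "dwc:disposition": "in collection",
    "dwc:recordedBy": "A. Collector",
}

print(count_not_empty(record_a), count_not_empty(record_b))
print(has_core_terms(record_a), has_core_terms(record_b))
```

Both records score the same on the count, yet only the first carries the name, location and date that most uses depend on.
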
Brought markdown table closer to current expectations. Added cautionary note and pointer to potential MultiRecord tests to accompany this one.
Added "Description"
Tweaked ER, removed dependency
Changed Test to TestField