Open iDigBioBot opened 6 years ago
TestField | Value |
---|---|
GUID | 41267642-60ff-4116-90eb-499fee2cd83f |
Label | VALIDATION_EVENTTEMPORAL_NOTEMPTY |
Description | Is there a value in any of the terms dwc:eventDate, dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear, dwc:verbatimEventDate? |
TestType | Validation |
Darwin Core Class | dwc:Event |
Information Elements ActedUpon | dwc:eventDate, dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear, dwc:verbatimEventDate |
Information Elements Consulted | |
Expected Response | COMPLIANT if any of dwc:eventDate, dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear, dwc:verbatimEventDate are bdq:NotEmpty; otherwise NOT_COMPLIANT. |
Data Quality Dimension | Completeness |
Term-Actions | EVENTTEMPORAL_NOTEMPTY |
Parameter(s) | |
Source Authority | |
Specification Last Updated | 2023-09-30 |
Examples | [dwc:day="", dwc:month="", dwc:year="", dwc:eventDate="1962-11-01T10:00-0600", dwc:verbatimEventDate="", dwc:startDayOfYear="", dwc:endDayOfYear="": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:eventDate is bdq:NotEmpty"], [dwc:dateIdentified="", dwc:day="", dwc:month="", dwc:year="", dwc:eventDate="", dwc:verbatimEventDate="", dwc:startDayOfYear="", dwc:endDayOfYear="": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="All input fields bdq:Empty"] |
Source | @Tasilee |
References | |
Example Implementations (Mechanisms) | Kurator:event_date_qc |
Link to Specification Source Code | https://github.com/FilteredPush/event_date_qc/blob/8740a00b52ef41cdda5fc7fa1689e5d95a23a94b/src/main/java/org/filteredpush/qc/date/DwCEventDQ.java#L1207 Unit test at https://github.com/FilteredPush/event_date_qc/blob/8740a00b52ef41cdda5fc7fa1689e5d95a23a94b/src/test/java/org/filteredpush/qc/date/DwcEventDQTest.java#L881 |
Notes | Only fails if all of the relevant fields of the Darwin Core Event class are bdq:Empty or do not exist. Relevant Darwin Core fields include dwc:eventDate, dwc:verbatimEventDate, dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear. The terms dwc:eventID (if populated may or may not point to temporal information accessible to user of the data) and dwc:eventTime (which is rare) are not included. |
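
The Expected Response in the table above can be sketched in Python. This is a minimal illustration, not the event_date_qc implementation linked above: the function name `validation_eventtemporal_notempty`, the dictionary-based record, the shape of the response, and the reading of bdq:Empty as "absent, None, or whitespace-only" are all assumptions made for the example.

```python
# Hypothetical sketch; names, response shape and the definition of
# "empty" are assumptions, not taken from the specification source code.
TEMPORAL_TERMS = (
    "dwc:eventDate", "dwc:year", "dwc:month", "dwc:day",
    "dwc:startDayOfYear", "dwc:endDayOfYear", "dwc:verbatimEventDate",
)


def is_empty(value):
    """Assumed reading of bdq:Empty: absent, None, or whitespace-only."""
    return value is None or str(value).strip() == ""


def validation_eventtemporal_notempty(record):
    """COMPLIANT if any temporal term holds a value, meaningful or not."""
    populated = [t for t in TEMPORAL_TERMS if not is_empty(record.get(t))]
    if populated:
        return {"status": "RUN_HAS_RESULT", "result": "COMPLIANT",
                "comment": populated[0] + " is bdq:NotEmpty"}
    return {"status": "RUN_HAS_RESULT", "result": "NOT_COMPLIANT",
            "comment": "All input fields bdq:Empty"}


# Records modelled on the two Examples rows of the table.
with_date = {"dwc:eventDate": "1962-11-01T10:00-0600", "dwc:year": ""}
all_blank = {"dwc:dateIdentified": "", "dwc:eventDate": "", "dwc:year": ""}
print(validation_eventtemporal_notempty(with_date)["result"])
print(validation_eventtemporal_notempty(all_blank)["result"])
```

Terms outside the list, such as dwc:eventID, dwc:eventTime or dwc:dateIdentified, are never consulted in this sketch, which mirrors the Notes row.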
Comment by John Wieczorek (@tucotuco) migrated from spreadsheet: Would this test return true if the data in the relevant Event-related fields were meaningless (effectively null)?
Comment by Paul Morris (@chicoreus) migrated from spreadsheet: We need a standard definition of isEmpty(). All Darwin Core terms included in the core need a measure to go with them, MEASURE_{term}_HASVALUE, returning the standard measure results, COMPLETE or NOT_COMPLETE; this is one case of this set of measures.
Comment by Paul Morris (@chicoreus) migrated from spreadsheet: Rename MEASURE_DATE_VALUE_SINGLE
I'd argue that the name of this test should not refer to the DwC Event class, as Event contains terms that are not related with the event date data (e.g., sampling terms). While we have used the class for location and taxon (#40, #105), those classes do not include terms that are not related to location and taxon respectively.
Prerequisite: "None. It is not necessary for the record to have any fields in the Event class to run this test." But an Event must be logically present for this to be a meaningful measure of quality. Thus there is an assumption in the prerequisites that this test is run on flat Darwin Core or occurrence data.
Need to add eventTime to the list of terms in the information element.
Disagree @chicoreus - if you only have eventTime and nothing else you hardly have a usable EVENT element that you can do anything with. There are probably one or two uses where something that says 14:26 -0600 with no other Event information is helpful (e.g. it may tell you that a bee is active at that time of day) - but the uses are extremely limited and thus I would suggest not CORE.
@ArthurChapman That's a good rationale for excluding eventTime from a test of emptiness of the Event. Almost none of the use cases we examined included time of day as a valuable information element. As the core tests are about information elements found to be valuable across a large set of use cases, eventTime doesn't fit here.
The distinction between EVENT empty, EVENTDATE empty, and some unnamed concept, let's call it EVENTTEMPORAL empty, is important. The original intent of this test was to assess if an Event instance contained information about the date of the Event. As @pzermoglio notes, there are additional terms in the Event class that are not temporal, thus the intent of this test isn't about EVENT_EMPTY. Likewise, the scope of the test is broader than dwc:eventDate, so EVENTDATE_EMPTY isn't a good fit (though an argument could be made that after amendment, eventDate should contain a value, but this isn't quite true if the only populated term were, say, dwc:day). I would recommend calling this something on the order of VALIDATION_EVENT_TEMPORAL_EMPTY. I think this captures the scope of what I understand the intent of this test to be.
However, there is still an issue (found when implementing and comparing with the existing event_date_qc implementation) - dwc:eventID does not contain temporal data. It might be a pointer to somewhere where temporal information may be found, but it could just be an identifier for an Event that contains no temporal information. The presence of an eventID without accompanying values in the other temporal terms is unlikely to help consumers of Darwin Core data identify the key piece of information that we are doing the quality control for: when in time a particular occurrence was.
I would very strongly advocate that we remove dwc:eventID from the list of information elements for this test.
I concur.
You make a good argument @chicoreus and I cannot fault your reasoning - so, like @tucotuco - I concur.
I also agree @chicoreus. I will edit now and would value a check when done.
@Tasilee Looking good.
I've also updated the Notes from:
Only fails if all of the relevant fields of the Darwin Core Event class are EMPTY or do not exist. Relevant Darwin Core fields include eventID, eventDate, verbatimEventDate, year, month, day, startDayOfYear, endDayOfYear.
to:
Only fails if all of the relevant fields of the Darwin Core Event class are EMPTY or do not exist. Relevant Darwin Core fields include eventDate, verbatimEventDate, year, month, day, startDayOfYear, endDayOfYear. The terms eventID (if populated may or may not point to temporal information accessible to user of the data) and eventTime (uses of eventTime are rare and put it out of scope of the CORE tests) are not included.
@chicoreus - suggestion OK by me
OK by me
Wonderful :)
Looking at the test data - I am not sure that this is the best wording for the test. If there is just a "0" in dwc:eventDate, or "XXXX" in dwc:year, we are saying it is COMPLIANT, but one couldn't determine the event date from these. We say just "dwc:day=15" is NOT_COMPLIANT. Looking at the test data and the way we have interpreted them so far, are we really saying that dwc:year, dwc:eventDate, dwc:verbatimEventDate, dwc:startDayOfYear, dwc:endDayOfYear are all not EMPTY etc., or are we trying to say something else here?
Looking further, I think we can leave dwc:startDayOfYear and dwc:endDayOfYear off that list
@ArthurChapman the tests for Empty are by design just asking if there is any value, not if it has validity. Thus this test is intended to ask if there is any temporal information (valid or not) at all; other validations can test if the values are meaningful. This is a general pattern in the tests: there is a validation asking if there is any information at all in a part of the domain, other tests to see if particular key fields contain information, and yet other tests to see if the data is meaningful.
This allows for easy separation of records that need data augmentation (have no values for temporal terms) from those that contain errors which need examination, and from those that have potentially useful data.
So @chicoreus is "year=XXXX" COMPLIANT or NOT_COMPLIANT in that case?
The string XXXX is NOT EMPTY, by our definitions of EMPTY and NOT EMPTY, so dwc:year="XXXX" and all other event terms being empty would be expected to cause this test to return COMPLIANT.
VALIDATION_YEAR_EMPTY would also be expected to return COMPLIANT but VALIDATION_YEAR_OUTOFRANGE would return INTERNAL_PREREQUISITES_NOT_MET. We didn't include a VALIDATION_YEAR_NOTSTANDARD in core, but if we did, it would return NOT_COMPLIANT. If the XXXX were in dwc:eventDate, we have a VALIDATION_EVENTDATE_NOTSTANDARD that, in combination with other tests in core, lets us clearly separate event data that is entirely empty from problematic data with various sorts of specific problems, and from data that complies with data format and range expectations.
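To make the dwc:year="XXXX" case concrete, here is a rough Python sketch of how an emptiness test and a range test could combine to sort values into the three groups described above. It is only an illustration: the function names, the tuple return shape, the `triage` helper, and the bounds 1600 and 2023 are hypothetical and are not taken from the core test definitions.

```python
# Hypothetical sketch; names, return shapes and bounds are assumptions.
def is_empty(value):
    """Assumed reading of EMPTY: absent, None, or whitespace-only."""
    return value is None or str(value).strip() == ""


def year_notempty(year):
    """Emptiness only: any string at all, even 'XXXX', is COMPLIANT."""
    result = "NOT_COMPLIANT" if is_empty(year) else "COMPLIANT"
    return ("RUN_HAS_RESULT", result)


def year_in_range(year, earliest, latest):
    """Range check; a value that is not an integer cannot be evaluated."""
    try:
        number = int(str(year).strip())
    except ValueError:
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    result = "COMPLIANT" if earliest <= number <= latest else "NOT_COMPLIANT"
    return ("RUN_HAS_RESULT", result)


def triage(year, earliest=1600, latest=2023):
    """Sort a year value into one of three working groups."""
    if year_notempty(year)[1] == "NOT_COMPLIANT":
        return "needs augmentation"
    status, result = year_in_range(year, earliest, latest)
    if status != "RUN_HAS_RESULT" or result == "NOT_COMPLIANT":
        return "needs examination"
    return "potentially useful"


for value in ["", "XXXX", "1962"]:
    print(repr(value), "->", triage(value))
```

Under these assumptions "XXXX" passes the emptiness test but cannot be evaluated by the range test, so it lands in the "needs examination" group rather than being treated as missing.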
I agree with @chicoreus. These EMPTY and NOT_EMPTY tests are doing exactly as they are named. They are not NOT/EMPTY/VALID tests. This test, along with #40 and #105 are detecting an absence of data for a data Dimension. In this test, even a NOT_EMPTY dwc:startDayOfYear is something that could be used in some scenarios (e.g., phenology).
The wording is different to all the other EMPTY tests - why don't we change wording to
COMPLIANT if at least one term needed to determine the event date is not EMPTY; otherwise NOT_COMPLIANT
Thanks @ArthurChapman. That sounds good to me.
Changing
COMPLIANT if at least one term needed to determine the event date exists and is not EMPTY; otherwise NOT_COMPLIANT
to
COMPLIANT if at least one term needed to determine a dwc:eventDate is not EMPTY; otherwise NOT_COMPLIANT
To be explicit about eventDate being a term to examine, we should probably include that in the definition
COMPLIANT if dwc:eventDate or at least one term needed to determine a dwc:eventDate is not EMPTY; otherwise NOT_COMPLIANT
However, by that definition, dwc:day, dwc:month, dwc:startDayOfYear and dwc:endDayOfYear might be considered to not be included, as in isolation they don't specify a dwc:eventDate.
How about:
NOT_COMPLIANT if all of dwc:eventDate, dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear, dwc:verbatimEventDate are EMPTY; otherwise COMPLIANT.
or
COMPLIANT if any of dwc:eventDate, dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear, dwc:verbatimEventDate are NOT EMPTY; otherwise NOT_COMPLIANT.
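The two candidate wordings describe the same outcome for every record, so the choice between them is about readability rather than behaviour. A small Python check, offered as an illustration only (the function names are hypothetical and each term is reduced to a single "is it EMPTY?" flag), enumerates every combination of the seven terms:

```python
from itertools import product

TERM_COUNT = 7  # eventDate, year, month, day, startDayOfYear, endDayOfYear, verbatimEventDate


def first_wording(empties):
    # NOT_COMPLIANT if all of the terms are EMPTY; otherwise COMPLIANT.
    return "NOT_COMPLIANT" if all(empties) else "COMPLIANT"


def second_wording(empties):
    # COMPLIANT if any of the terms is NOT EMPTY; otherwise NOT_COMPLIANT.
    return "COMPLIANT" if any(not e for e in empties) else "NOT_COMPLIANT"


# Each combination is a tuple of booleans: True means that term is EMPTY.
combos = list(product([True, False], repeat=TERM_COUNT))
disagreements = [c for c in combos if first_wording(c) != second_wording(c)]
print(len(combos), len(disagreements))
```

This prints `128 0`: across all 128 combinations the two wordings never disagree.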
I like that @chicoreus - prefer the second.
Yes, being specific seems expedient, and I also opt for the second version.
I concur. Second option.
Changed.
In looking at the test data I wonder if we should have Information Element dwc:day? If we have only a value for dwc:day, is it of any temporal value when we don't have dwc:month or dwc:year?
I don't think so. It is not a test for VALIDATION_EVENT_TEMPORAL_USELESS, it's about empty. I think if we get into use-based subjective test results we are in for a world of trouble, and delay.
Noted.
In Notes, changed "eventDate, verbatimEventDate, year, month, day, startDayOfYear, endDayOfYear" to "dwc:eventDate, dwc:verbatimEventDate, dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear".
Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted".
Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated"
Removing underscore to make TERM_ACTIONS consistent.