tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-VALIDATION_EVENTTEMPORAL_NOTEMPTY #88

Open iDigBioBot opened 6 years ago

iDigBioBot commented 6 years ago
| TestField | Value |
|---|---|
| GUID | 41267642-60ff-4116-90eb-499fee2cd83f |
| Label | VALIDATION_EVENTTEMPORAL_NOTEMPTY |
| Description | Is there a value in any of the terms dwc:eventDate, dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear, dwc:verbatimEventDate? |
| TestType | Validation |
| Darwin Core Class | dwc:Event |
| Information Elements ActedUpon | dwc:eventDate, dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear, dwc:verbatimEventDate |
| Information Elements Consulted | |
| Expected Response | COMPLIANT if any of dwc:eventDate, dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear, dwc:verbatimEventDate are bdq:NotEmpty; otherwise NOT_COMPLIANT. |
| Data Quality Dimension | Completeness |
| Term-Actions | EVENTTEMPORAL_NOTEMPTY |
| Parameter(s) | |
| Source Authority | |
| Specification Last Updated | 2023-09-30 |
| Examples | [dwc:day="", dwc:month="", dwc:year="", dwc:eventDate="1962-11-01T10:00-0600", dwc:verbatimEventDate="", dwc:startDayOfYear="", dwc:endDayOfYear="": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:eventDate is bdq:NotEmpty"], [dwc:dateIdentified="", dwc:day="", dwc:month="", dwc:year="", dwc:eventDate="", dwc:verbatimEventDate="", dwc:startDayOfYear="", dwc:endDayOfYear="": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="All input fields bdq:Empty"] |
| Source | @Tasilee |
| References | |
| Example Implementations (Mechanisms) | Kurator:event_date_qc |
| Link to Specification Source Code | https://github.com/FilteredPush/event_date_qc/blob/8740a00b52ef41cdda5fc7fa1689e5d95a23a94b/src/main/java/org/filteredpush/qc/date/DwCEventDQ.java#L1207 Unit test at https://github.com/FilteredPush/event_date_qc/blob/8740a00b52ef41cdda5fc7fa1689e5d95a23a94b/src/test/java/org/filteredpush/qc/date/DwcEventDQTest.java#L881 |
| Notes | Only fails if all of the relevant fields of the Darwin Core Event class are bdq:Empty or do not exist. Relevant Darwin Core fields include dwc:eventDate, dwc:verbatimEventDate, dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear. The terms dwc:eventID (if populated may or may not point to temporal information accessible to user of the data) and dwc:eventTime (which is rare) are not included. |

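
For readers who want to see the Expected Response as code, here is a minimal Python sketch. It is not the linked event_date_qc implementation (which is Java). The function names, the dictionary-shaped record and response, and the reading of bdq:Empty as "absent, or containing only characters at or below U+0020" are all assumptions made for illustration.

```python
from typing import Mapping, Optional

# Terms listed under "Information Elements ActedUpon" in the specification.
TEMPORAL_TERMS = (
    "dwc:eventDate",
    "dwc:year",
    "dwc:month",
    "dwc:day",
    "dwc:startDayOfYear",
    "dwc:endDayOfYear",
    "dwc:verbatimEventDate",
)


def is_empty(value: Optional[str]) -> bool:
    """Assumed reading of bdq:Empty: absent, or only characters <= U+0020."""
    return value is None or all(ch <= "\u0020" for ch in value)


def validation_eventtemporal_notempty(record: Mapping[str, Optional[str]]) -> dict:
    """Hypothetical helper returning a response shaped like the spec examples."""
    not_empty = [term for term in TEMPORAL_TERMS if not is_empty(record.get(term))]
    if not_empty:
        return {
            "status": "RUN_HAS_RESULT",
            "result": "COMPLIANT",
            "comment": f"{not_empty[0]} is bdq:NotEmpty",
        }
    return {
        "status": "RUN_HAS_RESULT",
        "result": "NOT_COMPLIANT",
        "comment": "All input fields bdq:Empty",
    }


if __name__ == "__main__":
    print(validation_eventtemporal_notempty({"dwc:eventDate": "1962-11-01T10:00-0600"}))
    print(validation_eventtemporal_notempty({"dwc:eventDate": "", "dwc:year": " "}))
```

Terms outside the list, such as dwc:eventID or dwc:eventTime, are ignored by this sketch, in line with the Notes.
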
iDigBioBot commented 6 years ago

Comment by John Wieczorek (@tucotuco) migrated from spreadsheet: Would this test return true if the data in the relevant Event-related fields were meaningless (effectively null)?

iDigBioBot commented 6 years ago

Comment by Paul Morris (@chicoreus) migrated from spreadsheet: We need a standard definition of isEmpty(). Every Darwin Core term included in the core needs a measure to go with it, MEASURE_{term}_HASVALUE, returning the standard measure results COMPLETE or NOT_COMPLETE; this test is one case of that set of measures.

iDigBioBot commented 6 years ago

Comment by Paul Morris (@chicoreus) migrated from spreadsheet: Rename MEASURE_DATE_VALUE_SINGLE

pzermoglio commented 6 years ago

I'd argue that the name of this test should not refer to the DwC Event class, as Event contains terms that are not related to the event date data (e.g., sampling terms). While we have used the class for location and taxon (#40, #105), those classes do not include terms that are unrelated to location and taxon respectively.

chicoreus commented 6 years ago

Prerequisite: "None. It is not necessary for the record to have any fields in the Event class to run this test." But an Event must be logically present for this to be a meaningful measure of quality. Thus there is an assumption in the prerequisites that this test is run on flat Darwin Core or occurrence data.

chicoreus commented 6 years ago

Need to add eventTime to the list of terms in the information element.

ArthurChapman commented 6 years ago

Disagree @chicoreus - if you only have eventTime and nothing else, you hardly have a usable EVENT element that you can do anything with. There are probably one or two uses for a record that says 14:26 -0600 and has no other Event information (e.g. it may tell you that a bee is active at that time of day) - but the uses are extremely limited and thus I would suggest not CORE.

chicoreus commented 6 years ago

@ArthurChapman That's a good rationale for excluding eventTime from a test of emptiness of the Event. Almost none of the use cases we examined included time of day as a valuable information element. As the core tests are about information elements found to be valuable across a large set of use cases, eventTime doesn't fit here.

chicoreus commented 5 years ago

The distinction between EVENT empty, EVENTDATE empty, and some unnamed concept (let's call it EVENTTEMPORAL empty) is important. The original intent of this test was to assess whether an Event instance contained information about the date of the Event. As @pzermoglio notes, there are additional terms in the Event class that are not temporal, thus the intent of this test isn't about EVENT_EMPTY. Likewise, the scope of the test is broader than dwc:eventDate, so EVENTDATE_EMPTY isn't a good fit (though an argument could be made that after amendment, eventDate should contain a value, but this isn't quite true if the only populated term were, say, dwc:day). I would recommend calling this something on the order of VALIDATION_EVENT_TEMPORAL_EMPTY. I think this captures the scope of what I understand the intent of this test to be.

However, there is still an issue (found when implementing and comparing with the existing event_date_qc implementation): dwc:eventID does not contain temporal data. It might be a pointer to somewhere where temporal information may be found, but it could just be an identifier for an Event that contains no temporal information. The presence of an eventID without accompanying values in the other temporal terms is unlikely to help consumers of Darwin Core data identify the key piece of information that we are doing the quality control for: when in time a particular occurrence was.

I would very strongly advocate that we remove dwc:eventID from the list of information elements for this test.

tucotuco commented 5 years ago

I concur.

ArthurChapman commented 5 years ago

You make a good argument @chicoreus and I cannot fault your reasoning - so, like @tucotuco - I concur.

Tasilee commented 5 years ago

I also agree @chicoreus. I will edit now and would value a check when done.

chicoreus commented 5 years ago

@Tasilee Looking good.

I've also updated the Notes from:

Only fails if all of the relevant fields of the Darwin Core Event class are EMPTY or do not exist. Relevant Darwin Core fields include eventID, eventDate, verbatimEventDate, year, month, day, startDayOfYear, endDayOfYear.

to:

Only fails if all of the relevant fields of the Darwin Core Event class are EMPTY or do not exist. Relevant Darwin Core fields include eventDate, verbatimEventDate, year, month, day, startDayOfYear, endDayOfYear. The terms eventID (if populated may or may not point to temporal information accessible to user of the data) and eventTime (uses of eventTime are rare and put it out of scope of the CORE tests) are not included.

ArthurChapman commented 5 years ago

@chicoreus - suggestion OK by me

tucotuco commented 5 years ago

OK by me

Tasilee commented 5 years ago

Wonderful :)

ArthurChapman commented 2 years ago

Looking at the test data - I am not sure that this is the best wording for the test. If there is just a "0" in dwc:eventDate, or "XXXX" in dwc:year, we are saying it is COMPLIANT, but one couldn't determine the event date from these. We say just "dwc:day=15" is NOT_COMPLIANT. Looking at the test data and the way we have interpreted them so far, are we really saying that dwc:year, dwc:eventDate, dwc:verbatimEventDate, dwc:startDayOfYear, dwc:endDayOfYear are all not EMPTY etc., or are we trying to say something else here?

ArthurChapman commented 2 years ago

Looking further, I think we can leave dwc:startDayOfYear and dwc:endDayOfYear off that list

chicoreus commented 2 years ago

@ArthurChapman the tests for Empty are by design just asking if there is any value, not if it has validity. Thus this test is intended to ask if there is any temporal information (valid or not) at all, other validations can test if the values are meaningful. This is a general pattern in the tests, there is a validation asking if there is any information at all in a part of the domain, other tests to see if particular key fields contain information, and yet other tests to see if the data is meaningful.

This allows for easy separation of records that need data augmentation (have no values for temporal terms) from those that contain errors which need examination, and from those that have potentially useful data.
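
A rough Python sketch of that triage, under stated assumptions: `is_empty` uses an assumed reading of Empty (absent, or only characters at or below U+0020), the bucket names are invented here, and `passes_other_validations` is a hypothetical stand-in for whatever combination of the other core validations a user chooses to run.

```python
from typing import Callable, Dict, Iterable, List, Mapping, Optional

Record = Mapping[str, Optional[str]]

TEMPORAL_TERMS = (
    "dwc:eventDate", "dwc:year", "dwc:month", "dwc:day",
    "dwc:startDayOfYear", "dwc:endDayOfYear", "dwc:verbatimEventDate",
)


def is_empty(value: Optional[str]) -> bool:
    """Assumed reading of Empty: absent, or only characters <= U+0020."""
    return value is None or all(ch <= "\u0020" for ch in value)


def triage(
    records: Iterable[Record],
    passes_other_validations: Callable[[Record], bool],
) -> Dict[str, List[Record]]:
    """Sort records into three buckets (bucket names are illustrative)."""
    buckets: Dict[str, List[Record]] = {
        "needs_augmentation": [],   # no temporal values at all
        "needs_examination": [],    # some value present, but it fails other checks
        "potentially_useful": [],   # some value present and other checks pass
    }
    for record in records:
        if all(is_empty(record.get(term)) for term in TEMPORAL_TERMS):
            buckets["needs_augmentation"].append(record)
        elif passes_other_validations(record):
            buckets["potentially_useful"].append(record)
        else:
            buckets["needs_examination"].append(record)
    return buckets


def toy_check(record: Record) -> bool:
    """Toy stand-in validator: dwc:year must be exactly four ASCII digits."""
    year = record.get("dwc:year") or ""
    return len(year) == 4 and year.isascii() and year.isdigit()


if __name__ == "__main__":
    sample = [{"dwc:year": "1962"}, {"dwc:year": "XXXX"}, {"dwc:year": ""}]
    for name, bucket in triage(sample, toy_check).items():
        print(name, bucket)
```
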

ArthurChapman commented 2 years ago

So @chicoreus is "year=XXXX" COMPLIANT or NOT_COMPLIANT in that case?

chicoreus commented 2 years ago

The string XXXX is NOT EMPTY, by our definitions of EMPTY and NOT EMPTY, so dwc:year="XXXX" and all other event terms being empty would be expected to cause this test to return COMPLIANT.

VALIDATION_YEAR_EMPTY would also be expected to return COMPLIANT, but VALIDATION_YEAR_OUTOFRANGE would return INTERNAL_PREREQUISITES_NOT_MET. We didn't include a VALIDATION_YEAR_NOTSTANDARD in core, but if we did, it would return NOT_COMPLIANT. If the XXXX were in dwc:eventDate, we have a VALIDATION_EVENTDATE_NOTSTANDARD that, in combination with other tests in core, lets us clearly separate event data that is entirely empty from problematic data with various sorts of specific problems, and from data that complies with data format and range expectations.
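
To make the contrast concrete, here is a small Python sketch of how dwc:year="XXXX" might flow through an empty-style check and a range-style check. These are illustrative stand-ins, not the specified tests: the function names, the (status, result) tuple, the use of Python's `int()` as the notion of "interpretable as a year", and the range bounds passed in the usage lines are all assumptions.

```python
from typing import Optional, Tuple

Response = Tuple[str, Optional[str]]  # (Response.status, Response.result)


def is_empty(value: Optional[str]) -> bool:
    """Assumed reading of Empty: absent, or only characters <= U+0020."""
    return value is None or all(ch <= "\u0020" for ch in value)


def year_empty_check(year: Optional[str]) -> Response:
    """Stand-in for the year-level empty test: any value at all is COMPLIANT."""
    return ("RUN_HAS_RESULT", "NOT_COMPLIANT" if is_empty(year) else "COMPLIANT")


def year_range_check(year: Optional[str], lower: int, upper: int) -> Response:
    """Stand-in for the range test: needs a value that parses as an integer."""
    if is_empty(year):
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    try:
        number = int(year)
    except ValueError:
        # A value such as "XXXX" cannot be compared against a range at all.
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    in_range = lower <= number <= upper
    return ("RUN_HAS_RESULT", "COMPLIANT" if in_range else "NOT_COMPLIANT")


if __name__ == "__main__":
    # Bounds below are arbitrary illustration values, not specified defaults.
    print(year_empty_check("XXXX"))
    print(year_range_check("XXXX", 1600, 2023))
    print(year_range_check("1962", 1600, 2023))
```
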

Tasilee commented 2 years ago

I agree with @chicoreus. These EMPTY and NOT_EMPTY tests are doing exactly as they are named. They are not NOT/EMPTY/VALID tests. This test, along with #40 and #105 are detecting an absence of data for a data Dimension. In this test, even a NOT_EMPTY dwc:startDayOfYear is something that could be used in some scenarios (e.g., phenology).

ArthurChapman commented 2 years ago

The wording is different to all the other EMPTY tests - why don't we change wording to

COMPLIANT if at least one term needed to determine the event date is not EMPTY; otherwise NOT_COMPLIANT

Tasilee commented 2 years ago

Thanks @ArthurChapman. That sounds good to me.

Changing

COMPLIANT if at least one term needed to determine the event date exists and is not EMPTY; otherwise NOT_COMPLIANT

to

COMPLIANT if at least one term needed to determine a dwc:eventDate is not EMPTY; otherwise NOT_COMPLIANT

chicoreus commented 2 years ago

To be explicit about eventDate being a term to examine, we should probably include that in the definition

COMPLIANT if dwc:eventDate or at least one term needed to determine a dwc:eventDate is not EMPTY; otherwise NOT_COMPLIANT

However, by that definition, dwc:day, dwc:month, dwc:startDayOfYear and dwc:endDayOfYear might be considered to not be included, as in isolation they don't specify a dwc:eventDate.

How about:

NOT_COMPLIANT if all of dwc:eventDate, dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear, dwc:verbatimEventDate are EMPTY; otherwise COMPLIANT.

or

COMPLIANT if any of dwc:eventDate, dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear, dwc:verbatimEventDate are NOT EMPTY; otherwise NOT_COMPLIANT.
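
For what it's worth, the two wordings are logically equivalent (De Morgan's law), so the choice is one of readability only. A brute-force Python check over every empty/not-empty combination of the seven terms illustrates this; the function names are made up for the sketch, and each term is reduced to a boolean "is EMPTY" flag.

```python
from itertools import product
from typing import Sequence

TERMS = (
    "dwc:eventDate", "dwc:year", "dwc:month", "dwc:day",
    "dwc:startDayOfYear", "dwc:endDayOfYear", "dwc:verbatimEventDate",
)


def first_wording(empty_flags: Sequence[bool]) -> str:
    """NOT_COMPLIANT if all terms are EMPTY; otherwise COMPLIANT."""
    return "NOT_COMPLIANT" if all(empty_flags) else "COMPLIANT"


def second_wording(empty_flags: Sequence[bool]) -> str:
    """COMPLIANT if any term is NOT EMPTY; otherwise NOT_COMPLIANT."""
    return "COMPLIANT" if any(not flag for flag in empty_flags) else "NOT_COMPLIANT"


# Every combination of EMPTY (True) / NOT EMPTY (False) across the seven terms.
combinations = list(product([True, False], repeat=len(TERMS)))
agree = all(first_wording(c) == second_wording(c) for c in combinations)

if __name__ == "__main__":
    print(len(combinations), agree)  # prints "128 True"
```
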

ArthurChapman commented 2 years ago

I like that @chicoreus - prefer the second.

Tasilee commented 2 years ago

Yes, being specific seems expedient, and I also opt for the second version.

tucotuco commented 2 years ago

I concur. Second option.

Tasilee commented 2 years ago

Changed.

Tasilee commented 2 years ago

In looking at the test data I wonder if we should have Information Element dwc:day? If we have only a value for dwc:day, is it of any temporal value when we don't have dwc:month or dwc:year?

tucotuco commented 2 years ago

I don't think so. It is not a test for VALIDATION_EVENT_TEMPORAL_USELESS, it's about empty. I think if we get into use-based subjective test results we are in for a world of trouble, and delay.

Tasilee commented 2 years ago

Noted.

ArthurChapman commented 1 year ago

In Note changed "... eventDate, verbatimEventDate, year, month, day, startDayOfYear, endDayOfYear ..." to "... dwc:eventDate, dwc:verbatimEventDate, dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear ...".

Tasilee commented 1 year ago

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted".

Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated"

chicoreus commented 3 months ago

Removing underscore to make TERM_ACTIONS consistent.