tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-VALIDATION_EVENT_CONSISTENT #67

Open iDigBioBot opened 6 years ago

iDigBioBot commented 6 years ago
TestField Value
GUID 5618f083-d55a-4ac2-92b5-b9fb227b832f
Label VALIDATION_EVENT_CONSISTENT
Description Are the values in dwc:eventDate consistent with the values in dwc:year, dwc:month, dwc:day, dwc:startDayOfYear and dwc:endDayOfYear?
TestType Validation
Darwin Core Class dwc:Event
Information Elements ActedUpon dwc:eventDate
dwc:day
dwc:month
dwc:year
dwc:startDayOfYear
dwc:endDayOfYear
Information Elements Consulted
Expected Response INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is bdq:Empty, or all of dwc:year, dwc:month, dwc:day, dwc:startDayOfYear and dwc:endDayOfYear are bdq:Empty; COMPLIANT if all of the following conditions are met (1) dwc:year is bdq:Empty or dwc:eventDate has a precision of one year or finer and is within a single year and the provided value of dwc:year matches the year expressed in dwc:eventDate, and (2) dwc:month is bdq:Empty or dwc:eventDate has a precision of one month or finer and is within a single month and the provided value in dwc:month matches the month represented by dwc:eventDate, and (3) dwc:day is bdq:Empty or dwc:eventDate has a precision of a day or less and is within a single day and the provided value in dwc:day matches the day represented by dwc:eventDate, and (4) dwc:startDayOfYear is empty or dwc:eventDate has a precision of one day or finer and the provided value in dwc:startDayOfYear matches the start day of the year of the range represented by dwc:eventDate, and (5) dwc:endDayOfYear is empty or dwc:eventDate has a precision of one day or finer and the provided value in dwc:endDayOfYear matches the end day of the year of the range represented by dwc:eventDate; otherwise NOT_COMPLIANT.
Data Quality Dimension Consistency
Term-Actions EVENTDATE_CONSISTENT
Parameter(s)
Source Authority
Specification Last Updated 2023-09-18
Examples [dwc:day="15", dwc:month="9", dwc:year="1949", dwc:eventDate="1949-09-15T12:34", dwc:startDayOfYear="", dwc:endDayOfYear="": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:day, dwc:month and dwc:year match dwc:eventDate"]
[dwc:day="15", dwc:month="9", dwc:year="1949", dwc:eventDate="1949-09-16T12:34", dwc:startDayOfYear="", dwc:endDayOfYear="": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:day does not match dwc:eventDate"]
Source GBIF
References
Example Implementations (Mechanisms) Kurator:event_date_qc
Link to Specification Source Code https://github.com/FilteredPush/event_date_qc/blob/029466e0dc5ef649e7768ab19f75c86094023fce/src/main/java/org/filteredpush/qc/date/DwCEventDQ.java#L1179 minimal set of unit tests at https://github.com/FilteredPush/event_date_qc/blob/029466e0dc5ef649e7768ab19f75c86094023fce/src/test/java/org/filteredpush/qc/date/DwcEventDQTest.java#L1149
Notes This test does not take a position on whether the value in dwc:eventDate or the values in the atomic terms are correct; it simply points out the presence of inconsistencies. For this test, dwc:eventTime is explicitly ignored. It may be useful to consider an additional test that does evaluate dwc:eventTime and dwc:eventDate. In that case, but not in this test, if the time is present in both dwc:eventDate and dwc:eventTime, and it is inconsistent, it may indicate an error in the dwc:eventDate, thus making it a problem that someone needs to evaluate. This test will only assert consistency if the data are both internally consistent and compliant with the term definitions. For example, dwc:day, by its definition, can only be the day of a dwc:eventDate that has a precision of a day or better and is not a range that spans more than a single day. A dwc:day that was internally consistent with the first day of the year (that is, 1) of a dwc:eventDate that only had precision to a year would be consistent internally, but not consistent with the Darwin Core term definitions, and would not return COMPLIANT from this test.
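
The Expected Response above can be read as a small decision procedure. The following is a minimal Python sketch of that reading, not the FilteredPush implementation linked above. The names `parse` and `event_consistent` are hypothetical, and the sketch assumes that dwc:eventDate is already a well-formed ISO 8601 date or date range, that any time of day is simply dropped, that the atomic terms are empty or integers, and that "precision" can be taken as the number of date components given (1 = year, 2 = month, 3 = day).

```python
import calendar
from datetime import date

def parse(part, end=False):
    """Return (date, precision) for YYYY[-MM[-DD]]; precision is 1=year, 2=month, 3=day."""
    nums = [int(x) for x in part.split("T")[0].split("-")]  # time of day is ignored
    year = nums[0]
    month = nums[1] if len(nums) > 1 else (12 if end else 1)
    if len(nums) > 2:
        day = nums[2]
    else:
        # Assumption: a reduced-precision value expands to its first or last day.
        day = calendar.monthrange(year, month)[1] if end else 1
    return date(year, month, day), len(nums)

def event_consistent(event_date, year="", month="", day="", start_doy="", end_doy=""):
    atomic = [str(v).strip() for v in (year, month, day, start_doy, end_doy)]
    if not event_date.strip() or not any(atomic):
        return "INTERNAL_PREREQUISITES_NOT_MET"
    first, _, last = event_date.partition("/")
    (start, p1), (end, p2) = parse(first), parse(last or first, end=True)
    precision = min(p1, p2)
    y, m, d, sdoy, edoy = atomic
    checks = [
        # (1) year: empty, or eventDate within a single year and matching
        not y or (start.year == end.year and int(y) == start.year),
        # (2) month: empty, or month precision, single month, and matching
        not m or (precision >= 2
                  and (start.year, start.month) == (end.year, end.month)
                  and int(m) == start.month),
        # (3) day: empty, or day precision, single day, and matching
        not d or (precision >= 3 and start == end and int(d) == start.day),
        # (4) and (5): day of year of the first and last day of the range
        not sdoy or (precision >= 3 and int(sdoy) == start.timetuple().tm_yday),
        not edoy or (precision >= 3 and int(edoy) == end.timetuple().tm_yday),
    ]
    return "COMPLIANT" if all(checks) else "NOT_COMPLIANT"

print(event_consistent("1949-09-15T12:34", year="1949", month="9", day="15"))  # COMPLIANT
print(event_consistent("1949-09-16T12:34", year="1949", month="9", day="15"))  # NOT_COMPLIANT
print(event_consistent("1949", year="1949", day="1"))                          # NOT_COMPLIANT
```

The first two calls mirror the two Examples rows; the third is the case described in the Notes, where dwc:day=1 agrees with the first day of a year-precision dwc:eventDate but is not compliant with the definition of dwc:day.
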
iDigBioBot commented 6 years ago

Comment by Paula Zermoglio (@pzermoglio) migrated from spreadsheet: This is assuming that eventDate was not interpreted from day, month, year to begin with, right? If not, a "passed test" would have no meaning here.

chicoreus commented 6 years ago

Should also include eventTime - time could be represented in both eventDate and eventTime, and thus could be inconsistent.

Could also include verbatimEventDate, a parse of verbatim event date could be inconsistent with the eventDate and the atomic fields. This could also be a separate validation.

Need specific guidance on consistency when there is a date range spanning a year boundary. For example: year, month, and day must represent the first day of a date range present in eventDate, and startDayOfYear must represent the first day of the date range found in eventDate, and endDayOfYear must represent the last day of the date range found in eventDate.

Need specific guidance on consistency when there is a date range in eventDate in the form of a year or year and month (e.g. "1834" or "1886-05"). For example: year, month, and day must represent the first day of the date range present in eventDate, and startDayOfYear must represent the first day of the date range found in eventDate, and endDayOfYear must represent the last day of the date range found in eventDate.

Need specific guidance on how to assess consistency when eventDate is absent, particularly when date ranges are involved, including date ranges spanning a year boundary.

chicoreus commented 6 years ago

This one is complex to implement. As noted above, we should provide additional specific guidance on what constitutes inconsistencies (and which pairs are checked).

I've included a link in the issue to a minimal implementation (which does currently include eventTime (which I think should be there) and verbatimEventDate (which might belong in a separate test, not sure on that one)) along with a minimal set of unit tests. Implementing this highlighted the need for better specification, including questions such as how to compare a year + day with a startDayOfYear when no month is present (does that result in prerequisites not met, or does one check if startDayOfYear is the same as the day for some month in the year, etc.). There are many possible edge cases and many possible combinations of tests, especially when date ranges are involved.

It probably doesn't matter so much what we say as much as it matters that we say something specific, otherwise this is a place where different implementations could easily produce very different results.

ArthurChapman commented 6 years ago

Indeed complex. One of the reasons we left out time (here and in nearly every other test) is that time complicates it, and we largely agreed that time was not CORE for our purposes and that it failed the KISS Principle. Even without time, this is a complex test. As you say, we need to define this one carefully and more precisely.

chicoreus commented 6 years ago

Unlike #88, I think that this one does need to include eventTime. If a time is present in both eventDate and eventTime and is not consistent between the two, then this flags a problem in the record which does need closer examination (and it may arise from other data entry/transcription/interpretation problems with the date). It is likely a rare case, but seeing it suggests a problem may be present in the value of eventDate itself (not just in the time-of-day aspect of the Event, which is not important for core).

Tasilee commented 6 years ago

We potentially have two classes of date to compare with dwc:eventDate:

(1) dwc:year, dwc:month, dwc:day and (2) dwc:startDayOfYear, dwc:endDayOfYear

Neither of these classes has 'time'. So we are comparing down to a temporal resolution of dwc:day with a dwc:eventDate that may or may not have a time element, but as we have stated, temporal resolution to day is more than sufficient. I cannot therefore see why dwc:eventTime is required.

chicoreus commented 6 years ago

We can think of eventDate as canonical, and the rest of the terms as atomic parses of that term. The situation is similar to the case of dwc:scientificName and dwc:scientificNameAuthorship, except that the time part of the date can correctly (because of the invocation of the ISO specification) go in both dwc:eventDate and dwc:eventTime, or be present in just dwc:eventTime. If the time is present in just dwc:eventTime, then we can ignore it as not core. But, if the time is present in both dwc:eventDate and dwc:eventTime, and it is inconsistent, it may indicate an error in the dwc:eventDate, thus making it a problem that someone needs to evaluate.

tucotuco commented 6 years ago

Agreed at TDWG 2018 DQIG meeting that this test should explicitly ignore dwc:eventTime, though there may be room for a test that considers the dwc:eventDate dwc:eventTime consistency.

chicoreus commented 5 years ago

dwc:eventTime is listed as an information element but is not included in the specification.

The discussion and the notes indicate that we should remove dwc:eventTime as an information element, that the specification is correct, and that testing the consistency of time in dwc:eventDate against dwc:eventTime is something that should go into a separate test. Is there agreement on this?

ArthurChapman commented 5 years ago

@chicoreus - I have removed dwc:eventTime from Information Elements

tucotuco commented 5 years ago

Agree.


Tasilee commented 5 years ago

Agree

chicoreus commented 5 years ago

We've sorted out Time, but the current specification is still problematic for implementors.

COMPLIANT if the provided values for dwc:year, dwc:month, dwc:day, dwc:startDayOfYear and dwc:endDayOfYear are within the range of the supplied dwc:eventDate;

This specification asserts that eventDate=2019-05-01/2019-09-01, year=2019,month=5,day=12, startDayOfYear=180 endDayOfYear=181 is compliant, even though the day, startDayOfYear, and endDayOfYear are neither the same as the start/end of the eventDate nor consistent with each other.

The problematic bit is "within the range", as this causes multiple problems, both with ranges that fall within but do not match the start/end and with ranges extending beyond. The "within the range" specification makes year=1780, month="", day="", eventDate=1780-05-16 likely to be interpreted as inconsistent, as 1780 is a range which extends beyond the specific day.

We need a clearer and more specific definition.

I will suggest that the essence of the test is:

NOT_COMPLIANT if the non-empty value in year does not match the start year of the range represented by eventDate, or the non-empty value in month does not match the start month of the range represented by eventDate, or the non-empty value in day does not match the start day of the range represented by eventDate, or the non-empty value in startDayOfYear does not match the start day of the year of the range represented by eventDate or the non-empty value in endDayOfYear does not match the end day of the year the range represented by eventDate.

Since we require eventDate to have a value to run a test, this is a test of the consistency of the other fields with eventDate, and a comparison between year/month/day and startDayOfYear would not be included in this test.

Thus I propose the following specification:

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY, or all of dwc:year, dwc:month, dwc:day, dwc:startDayOfYear and dwc:endDayOfYear are EMPTY; COMPLIANT if all of the following conditions are met (the provided value of year matches the start year of the range represented by eventDate or year is empty) and (the provided value in month matches the start month of the range represented by eventDate or month is empty), and (the provided value in day matches the start day of the range represented by eventDate or day is empty) and (the provided value in startDayOfYear matches the start day of the year of the range represented by eventDate or startDayOfYear is empty) and (the provided value in endDayOfYear matches the end day of the year the range represented by eventDate or endDayOfYear is empty); otherwise NOT_COMPLIANT.

This should cover all of the concerns in my Jan 31 comments. A date range spanning a year boundary is covered by specifying that the start of the range of the eventDate matches year/month/day and startDayOfYear and the end of the range of the eventDate matches the endDayOfYear. Likewise, a date range in eventDate in the form of a year or year and month is covered explicitly by the references to the start/end of the range represented by eventDate. Calling eventDate a prerequisite for this test and excluding verbatimEventDate makes it much simpler, but, if eventDate is filled in from one of the three sources, and as is the case with most of the data we are seeing in the wild, startDayOfYear/endDayOfYear are not populated, an inconsistency between year/month/day and verbatimEventDate will not be caught.

ArthurChapman commented 5 years ago

That looks good @chicoreus - I like the solution

Tasilee commented 5 years ago

Wow. A mind-bender. The brackets seem to be used as delimiters rather than clarification, but we have used numbers somewhere else, as in "COMPLIANT if 1) xxx, and 2) yyy...".

chicoreus commented 5 years ago

@tasilee Good point, let me rephrase it in that form:

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY, or all of dwc:year, dwc:month, dwc:day, dwc:startDayOfYear and dwc:endDayOfYear are EMPTY; COMPLIANT if all of the following conditions are met 1) the provided value of year matches the start year of the range represented by eventDate or year is empty, and 2) the provided value in month matches the start month of the range represented by eventDate or month is empty, and 3) the provided value in day matches the start day of the range represented by eventDate or day is empty, and 4) the provided value in startDayOfYear matches the start day of the year of the range represented by eventDate or startDayOfYear is empty, and 5) the provided value in endDayOfYear matches the end day of the year the range represented by eventDate or endDayOfYear is empty; otherwise NOT_COMPLIANT.
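
As a rough illustration of this start-of-range wording, here is a minimal Python sketch. It is not the linked Java implementation: the names `bounds` and `start_of_range_consistent` are hypothetical, and it assumes a well-formed ISO 8601 eventDate, integer or empty atomic values, and that a reduced-precision date expands to its first and last day.

```python
import calendar
from datetime import date

def bounds(event_date):
    """First and last day covered by an eventDate (time of day is dropped)."""
    def expand(part, end):
        nums = [int(x) for x in part.split("T")[0].split("-")]
        year = nums[0]
        month = nums[1] if len(nums) > 1 else (12 if end else 1)
        if len(nums) > 2:
            day = nums[2]
        else:
            day = calendar.monthrange(year, month)[1] if end else 1
        return date(year, month, day)
    first, _, last = event_date.partition("/")
    return expand(first, False), expand(last or first, True)

def start_of_range_consistent(event_date, year="", month="", day="",
                              start_doy="", end_doy=""):
    provided = [str(v).strip() for v in (year, month, day, start_doy, end_doy)]
    if not event_date.strip() or not any(provided):
        return "INTERNAL_PREREQUISITES_NOT_MET"
    start, end = bounds(event_date)
    # Conditions 1) to 4) compare against the start of the range, 5) against the end.
    expected = [start.year, start.month, start.day,
                start.timetuple().tm_yday, end.timetuple().tm_yday]
    ok = all(p == "" or int(p) == e for p, e in zip(provided, expected))
    return "COMPLIANT" if ok else "NOT_COMPLIANT"

# The two cases raised against the "within the range" wording:
print(start_of_range_consistent("2019-05-01/2019-09-01", 2019, 5, 12, 180, 181))  # NOT_COMPLIANT
print(start_of_range_consistent("1780-05-16", year=1780))                         # COMPLIANT
```
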

Tasilee commented 5 years ago

Thanks @chicoreus: I think that makes it easier to parse. What do others think? I will edit accordingly as I am getting worried about missing changes that need to be applied. We will have to systematically review at least the Expected Responses and Parameter(s) of each test, and check the latest comments as we go.

tucotuco commented 5 years ago

Looks good.

chicoreus commented 2 years ago

GUID 5618f083-d55a-4ac2-92b5-b9fb227b832f duplicates that of #125 VALIDATION_DAY_OUTOFRANGE

chicoreus commented 2 years ago

Deduplicated by assigning a new GUID to #125

ArthurChapman commented 2 years ago

Edited Expected Response to add "dwc:" to all the Darwin Core terms

From INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY, or all of dwc:year, dwc:month, dwc:day, dwc:startDayOfYear and dwc:endDayOfYear are EMPTY; COMPLIANT if all of the following conditions are met 1) the provided value of year matches the start year of the range represented by eventDate or year is empty, and 2) the provided value in month matches the start month of the range represented by eventDate or month is empty, and 3) the provided value in day matches the start day of the range represented by eventDate or day is empty, and 4) the provided value in startDayOfYear matches the start day of the year of the range represented by eventDate or startDayOfYear is empty, and 5) the provided value in endDayOfYear matches the end day of the year the range represented by eventDate or endDayOfYear is empty; otherwise NOT_COMPLIANT.

To

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY, or all of dwc:year, dwc:month, dwc:day, dwc:startDayOfYear and dwc:endDayOfYear are EMPTY; COMPLIANT if all of the following conditions are met 1) the provided value of year matches the start year of the range represented by dwc:eventDate or dwc:year is empty, and 2) the provided value in dwc:month matches the start month of the range represented by dwc:eventDate or dwc:month is empty, and 3) the provided value in dwc:day matches the start day of the range represented by dwc:eventDate or day is empty, and 4) the provided value in dwc:startDayOfYear matches the start day of the year of the range represented by dwc:eventDate or dwc:startDayOfYear is empty, and 5) the provided value in dwc:endDayOfYear matches the end day of the year the range represented by dwc:eventDate or dwc:endDayOfYear is empty; otherwise NOT_COMPLIANT.

Tasilee commented 1 year ago

Corrected typo in Expected Response "day" to "dwc:day"

chicoreus commented 1 year ago

Minor clarification to specification: "1) the provided value of year " changed to "1) the provided value of dwc:year ".

chicoreus commented 1 year ago

Missing word "of" in specification: changed "5) the provided value in dwc:endDayOfYear matches the end day of the year the range represented by dwc:eventDate" to "5) the provided value in dwc:endDayOfYear matches the end day of the year of the range represented by dwc:eventDate".

Probably needs clarification in the notes that "end day of the year"/"start day of the year" in 4 and 5 refer to the dayOfYear value for the last/first day in the range, not the last/first day of the year, or this may need the text to be corrected:

From:

"4) the provided value in dwc:startDayOfYear matches the start day of the year of the range represented by dwc:eventDate or dwc:startDayOfYear is empty, and 5) the provided value in dwc:endDayOfYear matches the end day of the year of the range represented by dwc:eventDate or dwc:endDayOfYear is empty;"

To:

"4) the provided value in dwc:startDayOfYear matches the dayOfYear of the start of the range represented by dwc:eventDate or dwc:startDayOfYear is empty, and 5) the provided value in dwc:endDayOfYear matches the dayOfYear of the end of the range represented by dwc:eventDate or dwc:endDayOfYear is empty;"

chicoreus commented 1 year ago

Another alternate phrasing would be to use ordinal day number:

"4) the provided value in dwc:startDayOfYear matches the ordinal day number of the start of the range represented by dwc:eventDate or dwc:startDayOfYear is empty, and 5) the provided value in dwc:endDayOfYear matches the ordinal day number of the end of the range represented by dwc:eventDate or dwc:endDayOfYear is empty;"

tucotuco commented 1 year ago

I would avoid "ordinal" as they are very different concepts in linguistics vs set theory and I expect most users to be more familiar with the linguistics usage than the set theory one. In fact, in the long and sordid history of this term, "ordinal" was specifically removed.

chicoreus commented 1 year ago

@tucotuco concur, that raises confusion amongst implementors, who could have differing backgrounds, with the ISO usage of ordinal day risking further confusion. What we do need to be explicit about is that dwc:eventDate=1982-01-10/1983-01-15 is consistent with dwc:startDayOfYear=10, dwc:endDayOfYear=15. "end day of the year of the range" sounds like "the end day of the year of the range", dwc:endDayOfYear=365, rather than "the day of the year of the last day of the end part of the range", 15. Ordinal day avoids the confusion between day of the year and end, but as you note adds other (including off by one) confusion.

tucotuco commented 1 year ago

Looking at the definitions in Darwin Core doesn't help much, and there are no examples that cross a year boundary. We should propose changes there. Something like this?

startDayOfYear: "The integer day of the year for the start date of the Event." Comments: The startDayOfYear should be left empty if the beginning of the eventDate interval does not have at least day-level precision. Examples: 1 (January 1), 366 (31 December in a leap year), 365 (31 December in a non-leap year), 283 (for eventDate 1982-10-10/1983-01-15)

endDayOfYear: "The integer day of the year for the end date of the Event." Comments: For an end date that falls in a year different from the start date, the endDayOfYear refers to the integer day of the year on which the end date occurred, not the maximum integer day of the year in the date interval. The endDayOfYear should be left empty if the end of the eventDate interval does not have at least day-level precision. Examples: 1 (January 1), 366 (31 December in a leap year), 365 (31 December in a non-leap year), 15 (for eventDate 1982-10-10/1983-01-15)
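
The year-boundary examples in these proposed definitions can be checked with a few lines of Python. This is only a sketch for full-precision dates; the function names are hypothetical, and it relies on the standard library's `date.fromisoformat` and `timetuple().tm_yday`.

```python
from datetime import date

def day_of_year(iso_day):
    """Integer day of the year (1 to 366) for a YYYY-MM-DD date."""
    return date.fromisoformat(iso_day).timetuple().tm_yday

def start_end_day_of_year(event_date):
    """(startDayOfYear, endDayOfYear) for a day-precision eventDate or range."""
    start, _, end = event_date.partition("/")
    return day_of_year(start), day_of_year(end or start)

print(start_end_day_of_year("1982-10-10/1983-01-15"))  # (283, 15)
print(day_of_year("1983-12-31"))                        # 365
print(day_of_year("1984-12-31"))                        # 366 (leap year)
```

Note that endDayOfYear (15) is lower than startDayOfYear (283) here, which is expected when the interval crosses a year boundary.
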

chicoreus commented 1 year ago

@tucotuco The beginning of the eventDate interval and the end of the eventDate interval are both instants.

eventDate=1981, eventDate=1981-01/1981-12, and eventDate=1981-01-01/1981-12-31 are all asserting the same interval with the same start and the same end.

chicoreus commented 1 year ago

Let's decouple two things here: (1) The ambiguity of "the start day of the year of the range", where we mean the day of the year, from 1 to 366, of the day that is the start of the range, not January 1 of the year, and likewise for the end of the range. And (2) what the expectations are when a dwc:eventDate contains a range that spans a year boundary.

(1) is simply a phrasing problem for this test to make sure that implementors understand the intent. (2), as you point out, also has implications for guidance in Darwin Core.

chicoreus commented 1 year ago

@tucotuco here is one interpretation of expectations, which may not be yours:

| eventDate | startDayOfYear | endDayOfYear |
| --- | --- | --- |
| 1981-01-03 | 3 | 3 |
| 1981-01 | 1 | 31 |
| 1981 | 1 | 365 |
| 1981-01/1981-12 | 1 | 365 |
| 1981-01-01/1981-12-31 | 1 | 365 |
| 1980/1981 | 1 | 365 |
| 1980-01-10/1981-01-15 | 10 | 15 |
| 1981-12-30/1982-01-03 | 364 | 3 |
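
The values in this table follow from one particular reading, in which a reduced-precision eventDate is expanded to the first and last day it covers. A minimal Python sketch of that reading (the helper names are hypothetical, and well-formed ISO 8601 input is assumed):

```python
import calendar
from datetime import date

def expand(part, end=False):
    """Expand YYYY, YYYY-MM or YYYY-MM-DD to its first (or last) day."""
    nums = [int(x) for x in part.split("T")[0].split("-")]
    year = nums[0]
    month = nums[1] if len(nums) > 1 else (12 if end else 1)
    if len(nums) > 2:
        day = nums[2]
    else:
        day = calendar.monthrange(year, month)[1] if end else 1
    return date(year, month, day)

def day_of_year_bounds(event_date):
    """(startDayOfYear, endDayOfYear) under the expand-to-interval reading."""
    first, _, last = event_date.partition("/")
    start, stop = expand(first), expand(last or first, end=True)
    return start.timetuple().tm_yday, stop.timetuple().tm_yday

for value in ["1981-01-03", "1981-01", "1981", "1981-01/1981-12",
              "1981-01-01/1981-12-31", "1980/1981",
              "1980-01-10/1981-01-15", "1981-12-30/1982-01-03"]:
    print(value, *day_of_year_bounds(value))
```

Under this reading, 1981, 1981-01/1981-12 and 1981-01-01/1981-12-31 all expand to the same pair of days and therefore yield the same (1, 365).
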
chicoreus commented 1 year ago

Adding in dwc:year, dwc:month, dwc:day, all of the following would be RUN_HAS_RESULT, COMPLIANT for the current specification of this test.

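| eventDate | startDayOfYear | endDayOfYear | year | month | day |
| --- | --- | --- | --- | --- | --- |
| 1981-01-03 | 3 | 3 | 1981 | 1 | 3 |
| 1981-01 | 1 | 31 | 1981 | 1 | 1 |
| 1981-01 | 1 | 31 | 1981 | 1 | |
| 1981 | 1 | 365 | 1981 | 1 | 1 |
| 1981 | 1 | 365 | 1981 | | |
| 1981-01/1981-12 | 1 | 365 | 1981 | 1 | 1 |
| 1981-01/1981-12 | 1 | 365 | 1981 | 1 | |
| 1981-01-01/1981-12-31 | 1 | 365 | 1981 | 1 | 1 |
| 1980/1981 | 1 | 365 | 1980 | 1 | 1 |
| 1980/1981 | 1 | 365 | 1980 | | |
| 1980-01-10/1981-01-15 | 10 | 15 | 1980 | 1 | 10 |
| 1981-12-30/1982-01-03 | 364 | 3 | 1981 | 12 | 30 |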
tucotuco commented 1 year ago

I don't agree with this interpretation of the date. To me, 1981 means some time in the year 1981. The start and end days are unknown and should be empty.

chicoreus commented 1 year ago


As far as the Darwin Core definitions for eventDate go, 1981, 1981-01/1981-12, 1981-01-01/1981-12-31 are indistinguishable serializations of the same date/time interval, representing a period of time from the first instant of 1981 to the last instant of 1981. This is particularly true given the diverse data sources that may provide data serialized as Darwin Core and the range of libraries for working with dates that may be used to examine these values.

All date values in dwc:eventDate, unless a time range is specified, represent an interval of time where the event occurred at some unspecified portion of that interval. In practice, at least with regards to data vouchered by specimens, where the event date represents a date collected, the date is never known other than to some time within the specified interval.

The definition and examples in dwc:eventDate are clear:

"The date-time or interval during which an Event occurred." During which, not over which, or on which.

Example: 1809-02-12 (some time during 12 February 1809).

Example: 2007-03-01T13:00:00Z/2008-05-11T15:30:00Z (some time during the interval between 1 March 2007 1pm UTC and 11 May 2008 3:30pm UTC)

Counter examples:

2009-02-20T08:40Z (20 February 2009 8:40am UTC)

2018-08-29T15:19 (3:19pm local time on 29 August 2018)

These examples, where the eventDate is specified to one minute, are the only forms where "some time during the interval" is not included in the example.

ISO 8601-1, however, distinguishes between 1981 as a reduced precision date, and 1981-01-01/1981-12-31 as a time interval, and ISO 8601-2 adds what we really need here with EDTF, where 1981-??-?? is explicitly some unknown time interval within 1981 where day and month are unknown. The expectation under ISO 8601-1 is that 2007-03-01T13:00:00Z/2008-05-11T15:30:00Z would represent the entire interval, not some time during the interval.

The phrasing in Darwin Core of "The date-time or interval during which an Event occurred." and the context of natural science collections data, makes me say that with dwc:eventDate, for any form of a range of dates or reduced date precision, we are dealing with time intervals where the event occurred at some unspecified time range within that interval. Collections data are frequently originally in the form "yyyy", but stored in one or more fields in a database of type date, where the database does not have an indeterminate date data type to distinguish between 1981, 1981-01-01/1981-12-31, and 1981-??-?? (which MUSE often did, representing dates as three fields, one for year, one for month, one for day, each able to take ** to represent, as EDTF in 8601-2 does, unknown values). How a database maps and serializes its internal dates into text strings for Darwin Core event terms is also highly likely to be variably interpreted (seeing the two of us, who have been working with these definitions and data for decades disagreeing...).

However, the definition of dwc:day is explicit: "The integer day of the month on which the Event occurred." This can't be the first day of a range, either a determinate interval over more than one day, or an indeterminate date range, either through reduced precision 8601-1 or explicit uncertainty of day in 8601-2.

So I would conclude that for #67, from the definition of dwc:day, we should not be filling in dwc:day if the eventDate represents other than an interval of a single day or within a single day.

For any eventDate representing other than an event within a single day, we should be leaving dwc:day empty.

dwc:startDayOfYear/endDayOfYear are less clear. They use the same "on which" phrasing as day, e.g. "The earliest integer day of the year on which the Event occurred", but the pair of them together imply a potential for a range, and from the language, a determinate range: the event had to occur on both the start date and the end date. So I would conclude that for startDayOfYear/endDayOfYear, we should only be populating these if dwc:eventDate contains an explicit range of precision down to at least day, but with the understanding that original data known to some time within a year may well be presented in a Darwin Core eventDate as either yyyy or yyyy-01-01/yyyy-12-31, and that an inference we make about reduced precision or an explicit range is made on very shaky ground.

The definition of dwc:day at least should force us to alter the specification for #67 and #52.

The definition of dwc:eventDate, and the absence, from the Darwin Core definitions, of the explicit serialization of uncertainty provided for by ISO 8601-2 EDTF, means that this change is likely isolated and doesn't mean we need to reexamine all of the TIME tests.

We probably do want to examine the specification and validation values for #86, as given ISO 8601-1, 1981 and 1981-01/1981-12 and 1981-01-01/1981-12-31 have different meanings, where 1981 is a reduced precision representation, and 1981-01-01/1981-12-31 represents an explicit time interval, and we don't want to be transforming verbatim data from one to the other.

A broader solution for this is to incorporate EDTF from ISO 8601-2 into the Darwin Core solutions, as we are very very frequently dealing with uncertain dates from records very similar to those that drove the library community to propose EDTF.

ymgan commented 1 year ago

Hmmm ... complicated ... Linking GBIF interpretation over if that helps: https://github.com/gbif/gbif-api/issues/4#issuecomment-1385497157 Discourse post here: https://discourse.gbif.org/t/gbif-api-supporting-ranges-in-occurrence-eventdate/3804

Thank you for your hard work \o/

chicoreus commented 1 year ago

Discussion in TG2 call, might need to either parameterize for looser/tighter specification, or split into one core and one non-core test.

Form 1: Data are not contradictory. eventDate:1894 year:1894 day:1 are not contradictory, even if not properly represented per Darwin Core.

Form 2: Data are not contradictory and are correctly represented. eventDate:1894 year:1894 day:1 are not contradictory, but day:1 should be empty.

Form 1 is important for research users of biodiversity data. Form 2 is important for users preparing data for aggregation.

ArthurChapman commented 1 year ago

Summary of part of the discussion on ZOOM

If precision is:

1) > 1 day: day, startDayOfYear and endDayOfYear are not filled in;
2) > 1 month: month and finer are not filled in, etc.;
3) if there is a range (where precision is a day), then startDayOfYear (sdoy) and endDayOfYear (edoy) are filled in (even if the range extends over more than a year, meaning edoy can be lower than sdoy).
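
One way to picture these rules is as a function that derives the atomic terms from dwc:eventDate. The Python sketch below is a hypothetical illustration only: `atomic_terms` and `parse` are made-up names, precision is taken as the number of date components given, and the treatment of year and month for ranges (left empty unless the range stays within a single year or month) and of single-day dates (day of year filled in) are assumptions that go beyond the summary above.

```python
import calendar
from datetime import date

def parse(part, end=False):
    """Return (date, precision) for YYYY[-MM[-DD]]; precision is 1=year, 2=month, 3=day."""
    nums = [int(x) for x in part.split("T")[0].split("-")]
    year = nums[0]
    month = nums[1] if len(nums) > 1 else (12 if end else 1)
    if len(nums) > 2:
        day = nums[2]
    else:
        day = calendar.monthrange(year, month)[1] if end else 1
    return date(year, month, day), len(nums)

def atomic_terms(event_date):
    first, _, last = event_date.partition("/")
    (start, p1), (end, p2) = parse(first), parse(last or first, end=True)
    precision = min(p1, p2)
    terms = {"year": "", "month": "", "day": "",
             "startDayOfYear": "", "endDayOfYear": ""}
    if start.year == end.year:  # assumption: empty for multi-year ranges
        terms["year"] = start.year
    # Rule 2: month and finer stay empty when precision is coarser than a month.
    if precision >= 2 and (start.year, start.month) == (end.year, end.month):
        terms["month"] = start.month
    # Rules 1 and 3: day-level terms only when precision is a day.
    if precision >= 3:
        if start == end:
            terms["day"] = start.day
        terms["startDayOfYear"] = start.timetuple().tm_yday
        terms["endDayOfYear"] = end.timetuple().tm_yday
    return terms

print(atomic_terms("1981"))
print(atomic_terms("1981-12-30/1982-01-03"))
```

For 1981 only year is filled in; for 1981-12-30/1982-01-03 only startDayOfYear (364) and endDayOfYear (3) are, with the end value lower than the start value.
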

chicoreus commented 1 year ago

Starting from the current specification:

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY, or all of dwc:year, dwc:month, dwc:day, dwc:startDayOfYear and dwc:endDayOfYear are EMPTY; COMPLIANT if all of the following conditions are met 1) the provided value of dwc:year matches the start year of the range represented by dwc:eventDate or dwc:year is empty, and 2) the provided value in dwc:month matches the start month of the range represented by dwc:eventDate or dwc:month is empty, and 3) the provided value in dwc:day matches the start day of the range represented by dwc:eventDate or dwc:day is empty, and 4) the provided value in dwc:startDayOfYear matches the start day of the year of the range represented by dwc:eventDate or dwc:startDayOfYear is empty, and 5) the provided value in dwc:endDayOfYear matches the end day of the year of the range represented by dwc:eventDate or dwc:endDayOfYear is empty; otherwise NOT_COMPLIANT.

Form 1 (consistent values) might look almost identical (changing case of some empty to EMPTY):

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY, or all of dwc:year, dwc:month, dwc:day, dwc:startDayOfYear and dwc:endDayOfYear are EMPTY; COMPLIANT if all of the following conditions are met 1) the provided value of dwc:year matches the start year of the range represented by dwc:eventDate or dwc:year is EMPTY, and 2) the provided value in dwc:month matches the start month of the range represented by dwc:eventDate or dwc:month is EMPTY, and 3) the provided value in dwc:day matches the start day of the range represented by dwc:eventDate or dwc:day is EMPTY, and 4) the provided value in dwc:startDayOfYear matches the start day of the year of the range represented by dwc:eventDate or dwc:startDayOfYear is EMPTY, and 5) the provided value in dwc:endDayOfYear matches the end day of the year of the range represented by dwc:eventDate or dwc:endDayOfYear is EMPTY; otherwise NOT_COMPLIANT.

Form 2 (consistent and compliant) might look like:

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY, or all of dwc:year, dwc:month, dwc:day, dwc:startDayOfYear and dwc:endDayOfYear are EMPTY; COMPLIANT if all of the following conditions are met 1) dwc:year is EMPTY or dwc:eventDate has a precision of one year or finer and the provided value of dwc:year matches the year expressed in dwc:eventDate, and 2) dwc:month is EMPTY or dwc:eventDate has a precision of one month or finer and the provided value in dwc:month matches the month represented by dwc:eventDate, and 3) dwc:day is EMPTY or dwc:eventDate has a precision of a day or less and the provided value in dwc:day matches the day represented by dwc:eventDate or dwc:day is empty, and 4) dwc:startDayOfYear is empty or: dwc:eventDate has a precision of one day or finer and the provided value in dwc:startDayOfYear matches the start day of the year of the range represented by dwc:eventDate, and 5) dwc:endDayOfYear is empty: or dwc:eventDate has a precision of one day or finer and the provided value in dwc:endDayOfYear matches the end day of the year of the range represented by dwc:eventDate; otherwise NOT_COMPLIANT.

Given the complexity, I suggest we split form 2 off as a separate (non-core) test rather than trying to parameterize.

I think form 2 above captures all the points in the discussion.

In #52, when we back-populate year, month, day, startDayOfYear, and endDayOfYear from eventDate, we must follow the rules in Form 2 (also what @ArthurChapman is capturing in https://github.com/tdwg/bdq/issues/67#issuecomment-1588185053).

Tasilee commented 1 year ago

Amended Expected response "1)" etc to "(1)" etc and updated Specification Last Updated.

chicoreus commented 1 year ago

Per discussion in the TG2 call on 2023-06-19, we will go with just Form 2 (consistent and compliant), and add a note about compliance.

Tasilee commented 1 year ago

I have changed the Expected response to fix what I perceive as a redundancy under (3) and a typo, from:

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY, or all of dwc:year, dwc:month, dwc:day, dwc:startDayOfYear and dwc:endDayOfYear are EMPTY; COMPLIANT if all of the following conditions are met (1) dwc:year is EMPTY or dwc:eventDate has a precision of one year or finer and the provided value of dwc:year matches the year expressed in dwc:eventDate, and (2) dwc:month is EMPTY or dwc:eventDate has a precision of one month or finer and the provided value in dwc:month matches the month represented by dwc:eventDate, and (3) dwc:day is EMPTY or dwc:eventDate has a precision of a day or less and the provided value in dwc:day matches the day represented by dwc:eventDate or dwc:day is empty, and (4) dwc:startDayOfYear is empty or: dwc:eventDate has a precision of one day or finer and the provided value in dwc:startDayOfYear matches the start day of the year of the range represented by dwc:eventDate, and (5) dwc:endDayOfYear is empty: or dwc:eventDate has a precision of one day or finer and the provided value in dwc:endDayOfYear matches the end day of the year of the range represented by dwc:eventDate; otherwise NOT_COMPLIANT.

to

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY, or all of dwc:year, dwc:month, dwc:day, dwc:startDayOfYear and dwc:endDayOfYear are EMPTY; COMPLIANT if all of the following conditions are met (1) dwc:year is EMPTY or dwc:eventDate has a precision of one year or finer and the provided value of dwc:year matches the year expressed in dwc:eventDate, and (2) dwc:month is EMPTY or dwc:eventDate has a precision of one month or finer and the provided value in dwc:month matches the month represented by dwc:eventDate, and (3) dwc:day is EMPTY or dwc:eventDate has a precision of a day or less and the provided value in dwc:day matches the day represented by dwc:eventDate, and (4) dwc:startDayOfYear is empty or: dwc:eventDate has a precision of one day or finer and the provided value in dwc:startDayOfYear matches the start day of the year of the range represented by dwc:eventDate, and (5) dwc:endDayOfYear is empty or dwc:eventDate has a precision of one day or finer and the provided value in dwc:endDayOfYear matches the end day of the year of the range represented by dwc:eventDate; otherwise NOT_COMPLIANT.

but I'd appreciate someone adding the consistency & compliance issue to the Notes.

ArthurChapman commented 1 year ago

Made minor correction in wording to remove the ":" after "or" in (4)

I think @chicoreus was going to word the Note.

chicoreus commented 1 year ago

Note updated. Added:

This test will only assert consistency if the data are both internally consistent and compliant with the term definitions. For example, dwc:day, by its definition, can only be the day of a dwc:eventDate that has a precision of a day or better and is not a range that spans more than a single day. A dwc:day that was internally consistent with the first day of the year (that is, 1) of a dwc:eventDate that only had precision to a year would be consistent internally, but not consistent with the Darwin Core term definitions, and would not return COMPLIANT from this test.

A rationale (that we implicitly touched on in the most recent TG2 call) for the choice of strict consistency and compliance is to ensure that downstream consumers of data and data quality reports are not presented with Event data that is subject to misinterpretation. This does entangle their needs in Form 2, the strict consistent and compliant form.
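The distinction drawn in the Note can be sketched in Python for dwc:day alone. This is only an illustration under assumptions: the function names are hypothetical, dwc:eventDate is taken to be a single ISO 8601 date (YYYY, YYYY-MM, or YYYY-MM-DD) with no range or time part, and dwc:day has already been parsed to an integer.

```python
import re


def precision(event_date: str) -> str:
    """Classify a single (non-range) date as 'year', 'month' or 'day' precision."""
    if re.fullmatch(r"\d{4}", event_date):
        return "year"
    if re.fullmatch(r"\d{4}-\d{2}", event_date):
        return "month"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", event_date):
        return "day"
    raise ValueError(f"unsupported eventDate: {event_date!r}")


def day_internally_consistent(event_date: str, day: int) -> bool:
    """Form 1 style: compare day with the first day of the implied interval."""
    start_day = int(event_date[8:10]) if precision(event_date) == "day" else 1
    return day == start_day


def day_consistent_and_compliant(event_date: str, day: int) -> bool:
    """Form 2 style: day must also be permitted by the precision of eventDate."""
    return precision(event_date) == "day" and day == int(event_date[8:10])


# eventDate="1981", day=1: matches the start of the interval, but the term
# definition does not allow a day for a year-precision eventDate.
print(day_internally_consistent("1981", 1))           # True
print(day_consistent_and_compliant("1981", 1))        # False
print(day_consistent_and_compliant("1981-01-03", 3))  # True
```

Only the second function reflects the strict behaviour described in the Note; the first is shown purely for contrast.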

chicoreus commented 1 year ago

All good to remove the NEEDS WORK tag?

chicoreus commented 1 year ago

Here are some data that should be Compliant:

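| eventDate | startDayOfYear | endDayOfYear | year | month | day |
|---|---|---|---|---|---|
| 1981-01-03 | 3 | 3 | 1981 | 1 | 3 |
| 1981-01 | | | 1981 | 1 | |
| 1981 | | | 1981 | | |
| 1981-01/1981-12 | | | 1981 | | |
| 1981-01-01/1981-12-31 | 1 | 365 | 1981 | | |
| 1980/1981 | | | | | |
| 1980-01-10/1980-01-15 | 10 | 15 | 1980 | 1 | |
| 1981-12-30/1982-01-03 | 364 | 3 | | | |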
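
The startDayOfYear and endDayOfYear values in these rows can be checked with a few lines of Python. This is a rough sketch, not project code: the helper names are hypothetical, and it assumes dwc:eventDate has day precision and is either YYYY-MM-DD or YYYY-MM-DD/YYYY-MM-DD with no time part.

```python
from datetime import date
from typing import Tuple


def day_of_year(iso_day: str) -> int:
    """Return the ordinal day of the year (1-366) for a YYYY-MM-DD string."""
    return date.fromisoformat(iso_day).timetuple().tm_yday


def start_end_day_of_year(event_date: str) -> Tuple[int, int]:
    """Return (startDayOfYear, endDayOfYear) for a day-precision eventDate."""
    start, _, end = event_date.partition("/")
    # A single date is treated as a range that starts and ends on the same day.
    return day_of_year(start), day_of_year(end or start)


print(start_end_day_of_year("1981-01-03"))             # (3, 3)
print(start_end_day_of_year("1981-01-01/1981-12-31"))  # (1, 365)
print(start_end_day_of_year("1981-12-30/1982-01-03"))  # (364, 3)
```

Note that for a range crossing a year boundary the end value is smaller than the start value, as in the last row.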

And some that should not (should return NOT_COMPLIANT), with non-compliant values in bold, and inconsistent values in italics.

eventDate startDayOfYear endDayOfYear year month day
1981-01 1 31 1981 1
1981-01 1981 2
1981 1 365 1981 1 1
1981-01/1981-12 1981
1981-01-01/1981-12-31 1 365 1981 1 1
1981-01-01/1981-12-31 1 366 1981
1980-01-10/1980-01-15 10 15 1980 1 10
1981-12-30/1982-01-03 364 3 1981
1981-12-30/1982-01-03 3 364

I think these are all as would be expected. @tucotuco can you check?

chicoreus commented 1 year ago

There is a clause in #52 that needs to be carried into this test (or removed from there, but I don't think that's our intent):

(3) dwc:year from dwc:eventDate if dwc:year is EMPTY and dwc:eventDate has a precision of a single year or finer and is within a single year

Contrast with the matching clause here:

(1) dwc:year is EMPTY or dwc:eventDate has a precision of one year or finer and the provided value of dwc:year matches the year expressed in dwc:eventDate, and

We aren't specifying here that the dwc:year should only be populated if dwc:eventDate is within a single year, only that it has a precision of a year or finer. Thus, given dwc:eventDate="1981/1982" or dwc:eventDate="1981-12-30/1982-01-03", #52 won't fill in dwc:year, but #67, as currently framed, will treat dwc:year=1982 as consistent.
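
As a rough illustration of the #52 year clause quoted above, the "within a single year" condition can be sketched as follows. The function name is hypothetical, and the sketch assumes four-digit years and an ISO 8601 dwc:eventDate that is either a single date or a start/end range.

```python
from typing import Optional


def year_to_backfill(event_date: str) -> Optional[int]:
    """Return the year to back-fill, or None if eventDate spans several years."""
    first, _, last = event_date.partition("/")
    start_year = int(first[:4])
    end_year = int((last or first)[:4])
    # Only a range that starts and ends in the same year yields a dwc:year.
    return start_year if start_year == end_year else None


print(year_to_backfill("1981/1982"))              # None
print(year_to_backfill("1981-12-30/1982-01-03"))  # None
print(year_to_backfill("1981-01/1981-12"))        # 1981
```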

I think, though I could be wrong, that #52 expresses our desire, as in the table above.

We also probably need to be similarly explicit about month and day. Start/End day of year are different, as we expect them to be populated if eventDate is a range, even if it spans more than one year, so long as there is a precision of one day or finer.

Perhaps change, adding "and is within a single {duration}" to the year, month, and day clauses:

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY, or all of dwc:year, dwc:month, dwc:day, dwc:startDayOfYear and dwc:endDayOfYear are EMPTY; COMPLIANT if all of the following conditions are met (1) dwc:year is EMPTY or dwc:eventDate has a precision of one year or finer and is within a single year and the provided value of dwc:year matches the year expressed in dwc:eventDate, and (2) dwc:month is EMPTY or dwc:eventDate has a precision of one month or finer and is within a single month and the provided value in dwc:month matches the month represented by dwc:eventDate, and (3) dwc:day is EMPTY or dwc:eventDate has a precision of a day or less and is within a single day and the provided value in dwc:day matches the day represented by dwc:eventDate, and (4) dwc:startDayOfYear is empty or dwc:eventDate has a precision of one day or finer and the provided value in dwc:startDayOfYear matches the start day of the year of the range represented by dwc:eventDate, and (5) dwc:endDayOfYear is empty or dwc:eventDate has a precision of one day or finer and the provided value in dwc:endDayOfYear matches the end day of the year of the range represented by dwc:eventDate; otherwise NOT_COMPLIANT.
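
To make the five clauses easier to check against the example rows, here is a minimal Python sketch of the response logic as worded above. It is not a reference implementation. The names `event_consistent` and `_bounds` are hypothetical. It assumes dwc:eventDate is YYYY, YYYY-MM, YYYY-MM-DD, or a start/end pair of these with no time part, that the other terms are already parsed to integers with `None` standing for EMPTY, and that a mixed-precision range takes the coarser precision.

```python
import calendar
from datetime import date
from typing import Optional, Tuple

YEAR, MONTH, DAY = 0, 1, 2  # precision ranks, coarse to fine


def _bounds(part: str) -> Tuple[date, date, int]:
    """First day, last day and precision of YYYY, YYYY-MM or YYYY-MM-DD."""
    fields = [int(f) for f in part.split("-")]
    if len(fields) == 1:
        (y,) = fields
        return date(y, 1, 1), date(y, 12, 31), YEAR
    if len(fields) == 2:
        y, m = fields
        return date(y, m, 1), date(y, m, calendar.monthrange(y, m)[1]), MONTH
    y, m, d = fields
    return date(y, m, d), date(y, m, d), DAY


def event_consistent(event_date: Optional[str] = None,
                     year: Optional[int] = None,
                     month: Optional[int] = None,
                     day: Optional[int] = None,
                     start_doy: Optional[int] = None,
                     end_doy: Optional[int] = None) -> str:
    others = (year, month, day, start_doy, end_doy)
    if not event_date or all(v is None for v in others):
        return "INTERNAL_PREREQUISITES_NOT_MET"
    first, _, last = event_date.partition("/")
    start, _, p_start = _bounds(first)
    _, end, p_end = _bounds(last or first)
    precision = min(p_start, p_end)  # assumption: coarser side of a range wins
    same_year = start.year == end.year
    same_month = same_year and start.month == end.month
    checks = [
        # (1) every supported eventDate has at least year precision
        year is None or (same_year and year == start.year),
        # (2)
        month is None or (precision >= MONTH and same_month
                          and month == start.month),
        # (3)
        day is None or (precision == DAY and start == end
                        and day == start.day),
        # (4) and (5) do not require a single year, only day precision
        start_doy is None or (precision == DAY
                              and start_doy == start.timetuple().tm_yday),
        end_doy is None or (precision == DAY
                            and end_doy == end.timetuple().tm_yday),
    ]
    return "COMPLIANT" if all(checks) else "NOT_COMPLIANT"


print(event_consistent("1981-12-30/1982-01-03", start_doy=364, end_doy=3))
# COMPLIANT
print(event_consistent("1981-12-30/1982-01-03", year=1981,
                       start_doy=364, end_doy=3))
# NOT_COMPLIANT
```

Under these assumptions, a row whose other five terms are all EMPTY returns INTERNAL_PREREQUISITES_NOT_MET, since the first clause of the Expected Response applies before any of the numbered conditions.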

We'll need to use similar language for month and day in #52 and will need to check #86, #93, and #132 for consistency; these five tests form a set that needs to be consistent.

Amendments that fill eventDate from other terms:

86 AMENDMENT_EVENTDATE_FROM_VERBATIM

93 AMENDMENT_EVENTDATE_FROM_YEARMONTHDAY

132 AMENDMENT_EVENTDATE_FROM_YEARSTARTDAYOFYEARENDDAYOFYEAR

Amendment that backfills the other terms from eventDate:

52 AMENDMENT_EVENT_FROM_EVENTDATE

Test that the event terms are consistent:

67 VALIDATION_EVENT_CONSISTENT

chicoreus commented 1 year ago

Added #204 for us to easily see all of the specifications in one place.

tucotuco commented 1 year ago

I checked both the tables in https://github.com/tdwg/bdq/issues/67#issuecomment-1605232576 and am 100% in agreement with the assessment.

Tasilee commented 1 year ago

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted": I am unsure about all Information Elements being ActedUpon. All are focused on in the Expected response.

Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated"