tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-VALIDATION_EVENTDATE_INRANGE #36

Open iDigBioBot opened 6 years ago

iDigBioBot commented 6 years ago
| TestField | Value |
| --- | --- |
| GUID | 3cff4dc4-72e9-4abe-9bf3-8a30f1618432 |
| Label | VALIDATION_EVENTDATE_INRANGE |
| Description | Is the value of dwc:eventDate entirely within the Parameter Range? |
| TestType | Validation |
| Darwin Core Class | dwc:Event |
| Information Elements ActedUpon | dwc:eventDate |
| Information Elements Consulted | |
| Expected Response | INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY or if the value of dwc:eventDate is not a valid ISO 8601-1 date; COMPLIANT if the range of dwc:eventDate is entirely within the range bdq:earliestValidDate to bdq:latestValidDate, inclusive, otherwise NOT_COMPLIANT |
| Data Quality Dimension | Conformance |
| Term-Actions | EVENTDATE_INRANGE |
| Parameter(s) | bdq:earliestValidDate; bdq:latestValidDate |
| Source Authority | bdq:earliestValidDate default = "1582-11-15"; bdq:latestValidDate default = current year |
| Specification Last Updated | 2023-09-17 |
| Examples | [dwc:eventDate="1962-11-01T10:00-0600": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:eventDate is IN_RANGE"]; [dwc:eventDate="2300-11-01T10:00": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:eventDate is NOT_IN_RANGE"] |
| Source | VertNet |
| References | |
| Example Implementations (Mechanisms) | Kurator:event_date_qc |
| Link to Specification Source Code | FilteredPush event_date_qc DwCEventDQ.validationEventdateInrange() |
| Notes | This test provides for a default earliest date, which is 1582-11-15 by convention. That date was chosen because ISO 8601-1 asserts that "the use of proleptic Gregorian calendar dates prior are not allowed in ISO 8601-1 without prior agreement of the parties exchanging data", and Darwin Core does not comment on this. Different calendars have been used at different times in different places, and the transcription of an original date in one calendar into dwc:eventDate, where a Gregorian Calendar is assumed, may or may not have been done with the correct translation of the date, and metadata may or may not be present to even identify such records. Given the complexity, and ongoing nature of transitions between calendars, we do not advocate using this test for quality assurance by selecting a transition date and using it as a threshold. |

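
The Expected Response above can be read as a small decision procedure. The following is a minimal Python sketch of that reading, not the FilteredPush implementation. The function and helper names are hypothetical, and it makes several assumptions that the specification does not state: only `YYYY`, `YYYY-MM`, `YYYY-MM-DD` (optionally followed by a time) and `start/end` intervals are recognized, the time of day and zone offset are ignored, "current year" is taken to mean through December 31 of the current year, and an interval whose end precedes its start is treated as invalid.

```python
import calendar
import re
from datetime import date

# Assumption: a simplified subset of ISO 8601-1 date forms; a time part is
# accepted only after a full date and is otherwise ignored.
_DATE_RE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2})(?:T\S+)?)?)?$")


def _expand(part):
    """Return the (first_day, last_day) covered by one date string, or None if invalid."""
    m = _DATE_RE.match(part)
    if not m:
        return None
    year, month, day = (int(g) if g else None for g in m.groups())
    try:
        if month is None:
            return date(year, 1, 1), date(year, 12, 31)
        if day is None:
            last = calendar.monthrange(year, month)[1]
            return date(year, month, 1), date(year, month, last)
        d = date(year, month, day)
        return d, d
    except ValueError:
        return None


def validation_eventdate_inrange(event_date, earliest=date(1582, 11, 15), latest=None):
    """Hypothetical sketch: return (Response.status, Response.result)."""
    if latest is None:
        # Assumption: the "current year" default means through December 31.
        latest = date(date.today().year, 12, 31)
    if event_date is None or not event_date.strip():
        return "INTERNAL_PREREQUISITES_NOT_MET", None
    parts = event_date.strip().split("/")
    spans = [_expand(p) for p in parts]
    if len(parts) > 2 or any(s is None for s in spans):
        return "INTERNAL_PREREQUISITES_NOT_MET", None
    start, end = spans[0][0], spans[-1][1]
    if start > end:
        # Assumption: a backwards interval is not a valid date.
        return "INTERNAL_PREREQUISITES_NOT_MET", None
    in_range = earliest <= start and end <= latest
    return "RUN_HAS_RESULT", "COMPLIANT" if in_range else "NOT_COMPLIANT"
```

Under these assumptions, the two Examples in the table give COMPLIANT for "1962-11-01T10:00-0600" and NOT_COMPLIANT for "2300-11-01T10:00".
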
ArthurChapman commented 1 year ago

I have updated the default earliest date to 1582-11-15 and added to the Notes: "That date is supportable on the basis of ISO 8601-1 asserting that 'the use of proleptic Gregorian calendar dates prior are not allowed in ISO 8601-1 without prior agreement of the parties exchanging data'."

ArthurChapman commented 1 year ago

Suggest we add to the notes (also in #84, #76):

If setting a Parameter for this test, be aware that prior to 1918 there may be issues associated with the use of the Julian calendar versus the Gregorian calendar in some countries. The difference between the Gregorian and Julian calendars has typically been around 10 days, but see the comparison at https://en.wikipedia.org/wiki/Proleptic_Gregorian_calendar where "there is no difference in most of years 100 to 200... Also year 0 may or may not exist...". See also the explanation at https://www.cree.name/genuki/dates.htm

Tasilee commented 1 year ago

Restructured Parameter(s) and Source authority.

ArthurChapman commented 1 year ago

Thumbs up if you agree to this change

Change Notes to

The results of this test are time-dependent: An invalid date for tomorrow will be valid tomorrow. This test provides for a default earliest date, which is 1582-11-15 by convention. That date is supportable on the basis of ISO 8601-1 asserting that "the use of proleptic Gregorian calendar dates prior are not allowed in ISO 8601-1 without prior agreement of the parties exchanging data".

If setting a Parameter for this test, be aware that prior to 1918 there may be issues associated with the use of the Gregorian calendar versus the Julian calendar in some countries. The difference between the Gregorian and Julian calendars has typically been around 10 days (but can be as great as 1 year and 10 days); see the comparison at https://en.wikipedia.org/wiki/Proleptic_Gregorian_calendar where "there is no difference in most of years 100 to 200... Also year 0 may or may not exist...". If your use requires knowledge of date to a precision of finer than one year and ten days, and you are not certain of the use of the Gregorian calendar, use 1919-01-01 as the earliestValidDate.

chicoreus commented 1 year ago

@ArthurChapman change: "That date is supportable on the basis of ISO 8601-1 asserting" to "That date was chosen because ISO 8601-1 asserts", and then add, ", and Darwin Core does not specify such." to the end of the sentence. The second paragraph needs some work too.

Suggest changing notes to:

The results of this test are time-dependent: An invalid date for tomorrow will be valid tomorrow. This test provides for a default earliest date, which is 1582-11-15 by convention. That date was chosen because ISO 8601-1 asserts that "the use of proleptic Gregorian calendar dates prior are not allowed in ISO 8601-1 without prior agreement of the parties exchanging data", and Darwin Core does not specify such.

If setting a Parameter for this test, be aware that prior to about 1918 different countries (and researchers from those countries) switched from the Julian calendar to the Gregorian calendar at different times. For example, Russia adopted the Gregorian Calendar on 1918-02-14, the British Empire on 1752-09-14, different regions in France between 1582 and 1760, with France also adopting the French Republican Calendar 1793-1805. The difference between the Gregorian and Julian calendars has typically been around 10 days. But the day that is considered the first day of the year has also changed at different times in different countries, meaning that the difference can be as great as 1 year and 10 days. If your use requires knowledge of date to a precision of finer than one year and ten days, and you are not certain of the use of the Gregorian calendar, use 1923-03-01 (when Greece adopted the Gregorian Calendar) as the earliestValidDate.
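
For reference, the "around 10 days" figure is the offset at the 1582 introduction of the Gregorian calendar, and the offset grows by one day in each century year that is not divisible by 400. A minimal Python sketch of that arithmetic follows. The function name is hypothetical, and the result applies from March of the given year onward; January and February of 1700, 1800 and 1900 still carry the previous century's offset. It says nothing about changes in the day treated as the start of the year.

```python
def julian_to_gregorian_offset_days(year):
    """Days to add to a Julian calendar date to get the Gregorian date.

    Hypothetical helper for illustration. Valid from March of `year` onward;
    January and February of a century year not divisible by 400 use the
    offset of the preceding year.
    """
    return year // 100 - year // 400 - 2


# Offsets at the adoption dates mentioned above.
for label, year in [("1582 introduction", 1582), ("British Empire", 1752),
                    ("Russia", 1918), ("Greece", 1923)]:
    print(label, julian_to_gregorian_offset_days(year))
```

This gives 10 days for 1582, 11 for 1752, and 13 for both 1918 and 1923, which matches the number of days skipped at each of those adoptions.
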

chicoreus commented 1 year ago

We probably also need to add text to the notes on the order of "If temporal resolution of one year or better is important, note that different calendars have been used at different times in different places, and the transcription of an original date in one calendar into dwc:eventDate, where a Gregorian Calendar is assumed, may or may not have been done with the correct translation of the date, and metadata may or may not be present to identify such records."

tucotuco commented 1 year ago

I reiterate that I would not enumerate some transition dates while leaving out others. I would definitely not portray 1923 as if it was the latest transition. Transitions are still ongoing, and some may never happen. It would be discriminatory if any transition comes after any we chose and we can't have that. Better to cite Wikipedia and not give a cut-off date.

chicoreus commented 1 year ago

@tucotuco I agree, different uses are likely to have different needs. I would advocate listing a few dates, as examples, to remind people that this may be an important concern for dates present in historical biodiversity collections data, and that the absence of clear metadata about interpretations of those dates may make any quality assurance approach using this test as a threshold impractical.

chicoreus commented 1 year ago

How about:

The results of this test are time-dependent: An invalid date for tomorrow will be valid tomorrow. This test provides for a default earliest date, which is 1582-11-15 by convention. That date was chosen because ISO 8601-1 asserts that "the use of proleptic Gregorian calendar dates prior are not allowed in ISO 8601-1 without prior agreement of the parties exchanging data", and Darwin Core does not specify such.

If temporal resolution of one year or better is important, note that different calendars have been used at different times in different places, and the transcription of an original date in one calendar into dwc:eventDate, where a Gregorian Calendar is assumed, may or may not have been done with the correct translation of the date, and metadata may or may not be present to identify such records. Different countries (and researchers from those countries) have changed from the Julian calendar to the Gregorian calendar at different times. For example, Russia adopted the Gregorian Calendar on 1918-02-14, the British Empire on 1752-09-14, different regions in France between 1582 and 1760, with France also adopting the French Republican Calendar 1793-1805. The difference between the Gregorian and Julian calendars has typically been around 10 days. But the day that is considered the first day of the year has also changed at different times in different countries, meaning that the difference can be as great as 1 year and 10 days. Given the complexity, and ongoing nature of transitions between calendars, we do not advocate using this test for quality assurance by simply selecting a transition date and using it as a threshold.

Tasilee commented 1 year ago

That looks useful, with a few minor edits and one query-

This test provides for a default earliest date, which is 1582-11-15 by convention. That date was chosen because ISO 8601-1 asserts that "the use of proleptic Gregorian calendar dates prior are not allowed in ISO 8601-1 without prior agreement of the parties exchanging data", and Darwin Core does not comment on this.

Different calendars have been used at different times in different places, and the transcription of an original date in one calendar into dwc:eventDate, where a Gregorian Calendar is assumed, may or may not have been done with the correct translation of the date, and metadata may or may not be present to even identify such records. Countries and researchers have changed from the Julian calendar to the Gregorian calendar at different times. For example, Russia adopted the Gregorian Calendar on 1918-02-14, the British Empire on 1752-09-14, different regions in France between 1582 and 1760, with France also adopting the French Republican Calendar 1793-1805. The difference between the Gregorian and Julian calendars has typically been around 10 days. But the day that is considered the first day of the year has also changed at different times in different countries, meaning that the difference can be as great as 1 year and 10 days. Given the complexity, and ongoing nature of transitions between calendars, we do not advocate using this test for quality assurance by selecting a transition date and using it as a threshold.

But that is what we are currently doing, aren't we?

tucotuco commented 1 year ago

We aren't doing anything except providing the test. We aren't using the test for quality assurance. The user has to decide for what purpose it is appropriate to use the test. The bolded text is just guidance about that.

ArthurChapman commented 1 year ago

I am happy with the last version. After all, we are just checking if a date is in a range, and the Calendar dates are only an issue if one is setting a different date to the defaults. The majority of the tests will just test for the default, but if someone had a different start date (e.g. 1900) then they just need to be aware of the issues, and that is now covered in the notes. They could probably get around any problems in their parameter by setting the date a year earlier (for bdq:earliestValidDate) or a year later (for bdq:latestValidDate).

chicoreus commented 1 year ago

Following up on https://github.com/tdwg/bdq/issues/36#issuecomment-1593877758 by @tucotuco: inherent in the framework is that any test may be used either for quality control (finding data, or process improvements, that could be changed to improve the quality of data for some use, in our case CORE) or for quality assurance (filtering data in a MultiRecord to just those data that conform with the needs of some use, again in our case CORE). By using the framework, the tests are by design agnostic to their use.

It does still make some sense to provide some non-normative (notes) guidance for research users who might want to parameterize this test for quality assurance: namely, that they will quickly get into the morass that we have in this discussion, and that we advise looking at other approaches to meet their needs.

Tasilee commented 1 year ago

OK, how about this for Notes:

This test provides for a default earliest date, which is 1582-11-15 by convention. That date was chosen because ISO 8601-1 asserts that "the use of proleptic Gregorian calendar dates prior are not allowed in ISO 8601-1 without prior agreement of the parties exchanging data", and Darwin Core does not comment on this.

Different calendars have been used at different times in different places, and the transcription of an original date in one calendar into dwc:eventDate, where a Gregorian Calendar is assumed, may or may not have been done with the correct translation of the date, and metadata may or may not be present to even identify such records. Given the complexity, and ongoing nature of transitions between calendars, we do not advocate using this test for quality assurance by selecting a transition date and using it as a threshold.

We place this text into the Standard document:

Different calendars have been used at different times in different places, and the transcription of an original date in one calendar into dwc:eventDate, where a Gregorian Calendar is assumed, may or may not have been done with the correct translation of the date, and metadata may or may not be present to even identify such records.

Countries and researchers have changed from the Julian calendar to the Gregorian calendar at different times. For example, Russia adopted the Gregorian Calendar on 1918-02-14, the British Empire on 1752-09-14, different regions in France between 1582 and 1760, with France also adopting the French Republican Calendar 1793-1805. The difference between the Gregorian and Julian calendars has typically been around 10 days. But the day that is considered the first day of the year has also changed at different times in different countries, meaning that the difference can be as great as 1 year and 10 days. Given the complexity, and ongoing nature of transitions between calendars, we do not advocate using this test for quality assurance by selecting a transition date and using it as a threshold.

chicoreus commented 1 year ago

We should note, and specify in the validation data, whether imprecise event dates that span the boundary should be considered compliant, that is, are eventDate = "1582" or eventDate = "1582-11" compliant or not (I suspect they are).

chicoreus commented 1 year ago

Missing a word: (I suspect they are not). They are reduced precision dates, so they aren't explicit about range, but they don't sound like they match the clause: "if the range of dwc:eventDate is entirely within the range bdq:earliestValidDate to bdq:latestValidDate, inclusive".

tucotuco commented 1 year ago

Agreed. Not.
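
The boundary case agreed on here can be illustrated with a short Python sketch. It assumes a reduced-precision date stands for the whole year or month it names, and uses hypothetical helper names; it is not taken from the validation data or from any existing implementation.

```python
import calendar
from datetime import date

EARLIEST = date(1582, 11, 15)  # default bdq:earliestValidDate


def implied_range(value):
    """Expand 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' to the (first, last) days it covers.

    Hypothetical helper; assumes the value is already a well-formed date.
    """
    fields = [int(f) for f in value.split("-")]
    year = fields[0]
    if len(fields) == 1:
        return date(year, 1, 1), date(year, 12, 31)
    month = fields[1]
    if len(fields) == 2:
        last = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last)
    day = date(year, month, fields[2])
    return day, day


def starts_on_or_after_earliest(value, earliest=EARLIEST):
    """True only if the whole implied range begins on or after the earliest valid date."""
    first, _last = implied_range(value)
    return first >= earliest
```

With the default earliest date, "1582" expands to 1582-01-01 through 1582-12-31 and "1582-11" to 1582-11-01 through 1582-11-30, so both begin before 1582-11-15 and are not entirely within the range, whereas "1582-11-15", "1582-12" and "1583" are.
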


Tasilee commented 11 months ago

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted"