Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-VALIDATION_YEAR_INRANGE #84

Open iDigBioBot opened 6 years ago

iDigBioBot commented 6 years ago
| TestField | Value |
| --- | --- |
| GUID | ad0c8855-de69-4843-a80c-a5387d20fbc8 |
| Label | VALIDATION_YEAR_INRANGE |
| Description | Is the value of dwc:year within the Parameter range? |
| TestType | Validation |
| Darwin Core Class | dwc:Event |
| Information Elements ActedUpon | dwc:year |
| Information Elements Consulted | |
| Expected Response | INTERNAL_PREREQUISITES_NOT_MET if dwc:year is bdq:Empty or cannot be interpreted as an integer; COMPLIANT if the value of dwc:year is within the range bdq:earliestValidDate to bdq:latestValidDate inclusive; otherwise NOT_COMPLIANT |
| Data Quality Dimension | Conformance |
| Term-Actions | YEAR_INRANGE |
| Parameter(s) | bdq:earliestValidDate; bdq:latestValidDate |
| Source Authority | bdq:earliestValidDate="1582"; bdq:latestValidDate=current year |
| Specification Last Updated | 2024-08-23 |
| Examples | [dwc:year="1952": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:year is in RANGE"]; [dwc:year="9999": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:year is not in RANGE. The value in year has not yet come to pass."] |
| Source | VertNet |
| References | |
| Example Implementations (Mechanisms) | FilteredPush:event_date_qc |
| Link to Specification Source Code | event_date_qc DwCEventDQ.validationYearInrange() unit test |
| Notes | The results of this test are time-dependent. Next year is not valid now. Next year it will be. This test provides the option to designate lower and upper limits to the year. The upper limit, if not provided, should default to the year when the test is run. This test provides for a default earliest date (year), of 1582 by convention. That value was chosen because ISO 8601-1 asserts that "the use of proleptic Gregorian calendar dates prior are not allowed in ISO 8601-1 without prior agreement of the parties exchanging data", and Darwin Core provides no such prior agreement. |

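
For readers who want to see the Expected Response as logic, here is a minimal Python sketch. It is not the FilteredPush event_date_qc implementation: the function name, the constant, and the single string return value are assumptions made for illustration (a real Response carries status, result and comment separately), and "cannot be interpreted as an integer" is read here as "is not an optionally signed run of ASCII digits".

```python
import re
from datetime import date
from typing import Optional

# Default for bdq:earliestValidDate, taken from the Source Authority row.
DEFAULT_EARLIEST_YEAR = 1582


def validation_year_inrange(
    year: Optional[str],
    earliest: int = DEFAULT_EARLIEST_YEAR,
    latest: Optional[int] = None,
) -> str:
    """Hypothetical helper returning a single label for dwc:year."""
    # The upper limit defaults to the year in which the test is run.
    if latest is None:
        latest = date.today().year

    # Treat a missing or blank value as bdq:Empty (assumption).
    if year is None or year.strip() == "":
        return "INTERNAL_PREREQUISITES_NOT_MET"

    # Accept only an optionally signed run of ASCII digits (assumption).
    text = year.strip()
    if re.fullmatch(r"[+-]?[0-9]+", text) is None:
        return "INTERNAL_PREREQUISITES_NOT_MET"

    # Both bounds are inclusive, as the Expected Response states.
    value = int(text)
    return "COMPLIANT" if earliest <= value <= latest else "NOT_COMPLIANT"
```

With the defaults, `validation_year_inrange("1952")` gives COMPLIANT and `validation_year_inrange("9999")` gives NOT_COMPLIANT, matching the two Examples in the specification.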
iDigBioBot commented 6 years ago

Comment by Paula Zermoglio (@pzermoglio) migrated from spreadsheet: If interpretation of eventDate is done before running this and previous test (eventDate in the future), then the previous one would suffice.

tucotuco commented 5 years ago

I propose to modify the expected output description. Instead of

"INTERNAL_PREREQUISITES_NOT_MET if the field dwc:year is not present, is EMPTY or is not a number; COMPLIANT if the value of the field dwc:year is between a designated minimum value and the current year inclusive; otherwise NOT_COMPLIANT"

I propose

"INTERNAL_PREREQUISITES_NOT_MET if dwc:year is not present, or is EMPTY or is not an integer; COMPLIANT if the value of dwc:year extends outside optionally-provided begin and end years; otherwise NOT_COMPLIANT"

I would also change the notes from

"The results of this test are time-dependent. A invalid date for next year will be valid next year. This test provides the option to designate a lower limit to the year, which for specimen records should be 1700 by convention."

to

"The results of this test are time-dependent. Next year is not valid now. Next year it will be. This test provides the option to designate lower and upper limits to the year. The upper limit, if not provided, should default to the year when the test is run. There should be no default lower limit. NB By convention, use 1700 as a lower limit for collecting dates of biological specimens."

tucotuco commented 5 years ago

Oops, "COMPLIANT if the value of dwc:year lies between optionally-provided begin and end years;"

tucotuco commented 5 years ago

Or better still, "COMPLIANT if the value of dwc:year does not extend outside optionally-provided begin and end years;"

tucotuco commented 5 years ago

I have taken the liberty to edit the Parameter(s) to be explicit about what the parameters are and what their default values are. Was, "Default values = 1600 and current year". Changed to, "Default values: earliest year = 1600, latest year = current year".

tucotuco commented 5 years ago

Following the discussions arising from the event date case study for the BISS paper, I believe that this test should be deprecated in favor of an updated TG2-VALIDATION_YEAR_NOTSTANDARD (https://github.com/tdwg/bdq/issues/141).

ArthurChapman commented 5 years ago

Agreed - fix wording in #141 and deprecate this one

Tasilee commented 5 years ago

I also agree to deprecate #84

tucotuco commented 5 years ago

Deprecated.

chicoreus commented 5 years ago

See comments in #141: this test is for a different concept than #141 and should not have been merged. Go back to the diagram of date tests we drew on the board in Gainesville. #141 parallels an amendment to convert unambiguously interpretable text into years and should not be parameterized. #84 simply tests validly formatted integer years against a range and should be parameterized.

chicoreus commented 5 years ago

The dimension for this test is not Conformance, but Likeliness - it is possible, but unlikely, that the date specified is outside the range specified. This is not a test of conformance to a standard (an integer), but a test of whether the event occurs within a likely range of years.

I suspect that the wrong test got closed, and we should use #84 instead of #141 (instead of merging the specification of #84 into #141 but retaining #141's now misleading name and dimension).

Tasilee commented 5 years ago

We have been inconsistent in our usage of NOTSTANDARD and OUTOFRANGE. We have used the former where we are matching a vocab of sorts. We have used the latter on numeric values. We have used both types for dwc:eventDate and dwc:dateIdentified - which is certainly fair. Considering what has been written against dwc:year (#141), dwc:month (#126) and dwc:day (#147), we have tried to cover both syntax and range issues under NOTSTANDARD.

Tasilee commented 5 years ago

@ArthurChapman and I have edited the Expected Response to simplify it and link to the Parameters. This is the plan for the remaining Tests.

tucotuco commented 2 years ago

I suggest a change in bdq:earliestDate="1600" to "1500" based on https://github.com/gbif/pipelines/issues/735#issuecomment-1157438243.

Tasilee commented 2 years ago

I agree. Done.

MattBlissett commented 2 years ago

Please note this issue: https://github.com/gbif/pipelines/issues/785

We may have records from before 1BC/BCE, and OBIS already have some records from before 1500: https://obis.org/occurrence/d03fdb62-4d02-4c98-81e9-6f77aaabd834

ArthurChapman commented 2 years ago

Thanks @MattBlissett. In this case, these are rare and we would hope to flag them, as we believe that there are probably more errors in these ranges than good values. As in most of these tests, we are not stating that the value is wrong, but that it is something that needs checking. Also, in this case the actual dates are Parameterized, so someone running the tests can set a Parameter range. What we have here is a default Parameter value, used if those running the test don't set a Parameter. I think that 1500 makes a good lower value for the default.
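
To illustrate the point about Parameters and defaults, here is a small Python sketch. The function names are hypothetical, and 1500 is used as the default lower bound only because it is the value under discussion in this comment. A bound supplied by the person running the test replaces the default, so a pre-1500 year is flagged by default but passes once a wider range is set.

```python
from datetime import date
from typing import Optional, Tuple


def resolve_year_range(
    earliest: Optional[int] = None,
    latest: Optional[int] = None,
    default_earliest: int = 1500,  # value discussed here, not a fixed rule
    run_year: Optional[int] = None,
) -> Tuple[int, int]:
    """Fill in a default for any bound the caller did not supply."""
    if run_year is None:
        run_year = date.today().year
    lower = default_earliest if earliest is None else earliest
    upper = run_year if latest is None else latest
    if lower > upper:
        raise ValueError("earliest year must not be later than latest year")
    return lower, upper


def flag_year(year: int, bounds: Tuple[int, int]) -> str:
    """Inclusive range check; NOT_COMPLIANT means 'needs checking'."""
    lower, upper = bounds
    return "COMPLIANT" if lower <= year <= upper else "NOT_COMPLIANT"


# A record dated 1450 is flagged under the default range...
default_result = flag_year(1450, resolve_year_range(run_year=2022))
# ...but not when the person running the test widens the range.
custom_result = flag_year(1450, resolve_year_range(earliest=1400, run_year=2022))
```

Passing `run_year` explicitly keeps the example repeatable; leaving it out falls back to the year in which the code is run.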

tucotuco commented 2 years ago

Our test shows a default value of 1600. The 1500 is an example of a value that would be flagged.

As we move forward, Events will start to take on a much broader range of meaning than just Occurrences based on specimens and observations. The zooarchaeology community already struggled with eventDate as to whether it should contain the date the material was collected, or the date it was estimated to have been deposited. Both of these dates are going to become quite viable in the emerging data model, where the parametrization of this test will have greater importance and might end up being dependent on eventType (a term that doesn't exist yet in Darwin Core, but that has been proposed).

chicoreus commented 1 year ago

Pushing the default earliest date prior to 1582 (default bdq:earliestValidDate="1500") raises a problem: without prior agreement, under ISO 8601-1, dates prior to the start of the Gregorian Calendar on 1582-11-15 are not valid. Thus dates in the range 1500-01-01/1582-11-14 could reasonably be expected by implementors to result in INTERNAL_PREREQUISITES_NOT_MET, as code evaluating them against ISO 8601-1 can plausibly assert that they are not validly formed ISO 8601-1 dates. The same concern applies to #36.

ArthurChapman commented 1 year ago

I have updated the default earliest date to 1582-11-15 and added to the Notes: "This test provides for a default earliest date, which is 1582-11-15 by convention. That date is supportable on the basis of ISO 8601-1 asserting that "the use of proleptic Gregorian calendar dates prior are not allowed in ISO 8601-1 without prior agreement of the parties exchanging data"."

Tasilee commented 1 year ago

Restructured Parameter(s) and Source authority

ArthurChapman commented 1 year ago

Thumbs up if you agree to this change (NB - this brings the wording in line with #36).

Change Notes to

The results of this test are time-dependent: Next year is not valid now. Next year it will be. This test provides for a default earliest date, which is 1582-11-15 by convention. That date is supportable on the basis of ISO 8601-1 asserting that "the use of proleptic Gregorian calendar dates prior are not allowed in ISO 8601-1 without prior agreement of the parties exchanging data".

If setting a Parameter for this test, be aware that prior to 1918 there may be issues associated with the use of the Gregorian calendar versus the Julian calendar in some countries. The difference between the Gregorian and Julian calendars has typically been around 10 days (but can be as great as 1 year and 10 days); see the comparison at https://en.wikipedia.org/wiki/Proleptic_Gregorian_calendar where "there is no difference in most of years 100 to 200... Also year 0 may or may not exist...". If your use requires knowledge of date to a precision finer than one year and ten days, and you are not certain of the use of the Gregorian calendar, use 1919-01-01 as the earliestValidDate.

tucotuco commented 1 year ago

There seem to me to be several problems with this one. This is a test for year in range. Why are we providing a date for a parameter to test against rather than a year? Does the Julian/Gregorian question even need to enter into this test? It seems like it isn't relevant for this one.

chicoreus commented 1 year ago

I concur. Range boundaries here should be year only, not dates.

Also, dwc:year doesn't invoke ISO 8601-1 in its definition, so ISO 8601-1 doesn't constrain what year we could use as the default parameter value. We could use 1582 for consistency with eventDate, and state why in the notes, or we could use an arbitrarily earlier year.

tucotuco commented 1 year ago

I vote for 1582.

Tasilee commented 1 year ago

I agree. bdq:earliestValidDate="1582" seems ok, or are we talking about using bdq:earliestValidYear and bdq:latestValidYear?

ArthurChapman commented 1 year ago

Changed Source Authority from "1582-11-15" to "1582"

ArthurChapman commented 1 year ago

Following discussion above - I suggest altering the Note from

The results of this test are time-dependent. Next year is not valid now. Next year it will be. This test provides the option to designate lower and upper limits to the year. The upper limit, if not provided, should default to the year when the test is run. This test provides for a default earliest date, which is 1582-11-15 by convention. That date is supportable on the basis of ISO 8601-1 asserting that "the use of proleptic Gregorian calendar dates prior are not allowed in ISO 8601-1 without prior agreement of the parties exchanging data".

To

The results of this test are time-dependent. Next year is not valid now. Next year it will be. This test provides the option to designate lower and upper limits to the year. The upper limit, if not provided, should default to the year when the test is run. This test provides for a default earliest date (year), of 1582 by convention. That date is supportable on the basis of ISO 8601-1 asserting that "the use of proleptic Gregorian calendar dates prior are not allowed in ISO 8601-1 without prior agreement of the parties exchanging data".

chicoreus commented 1 year ago

@ArthurChapman I suggest we change "supportable on", language that we are using in the discussion, to a conclusion "chosen because", and perhaps adding "and Darwin Core provides no such prior agreement." Thus suggest rephrasing to:

The results of this test are time-dependent. Next year is not valid now. Next year it will be. This test provides the option to designate lower and upper limits to the year. The upper limit, if not provided, should default to the year when the test is run. This test provides for a default earliest date (year), of 1582 by convention. That value was chosen because ISO 8601-1 asserts that "the use of proleptic Gregorian calendar dates prior are not allowed in ISO 8601-1 without prior agreement of the parties exchanging data", and Darwin Core provides no such prior agreement.

ArthurChapman commented 1 year ago

Agree @chicoreus and Note changed

ArthurChapman commented 1 year ago

I think all is now OK with this test so have removed the NEEDS WORK

ArthurChapman commented 1 year ago

@tasilee, @chicoreus - For consistency with other tests, should Expected Response be changed from

INTERNAL_PREREQUISITES_NOT_MET if dwc:year is not present, or is EMPTY or cannot be interpreted as an integer; COMPLIANT if the value of dwc:year is within the Parameter range; otherwise NOT_COMPLIANT

to

INTERNAL_PREREQUISITES_NOT_MET if dwc:year is not present, or is EMPTY or cannot be interpreted as an integer; COMPLIANT if the value of dwc:year is within the range bdq:earliestValidDate to bdq:latestValidDate inclusive; otherwise NOT_COMPLIANT

chicoreus commented 1 year ago

Yes. We should be consistent.

ArthurChapman commented 1 year ago

OK - updated Expected Response and Specification Last Updated

Tasilee commented 1 year ago

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted".

Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated"

ArthurChapman commented 1 month ago

Updated Expected Response to delete "not present" (as redundant = EMPTY). Updated Specification Last Updated
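
As a sketch of why "not present" adds nothing once EMPTY is tested, here is a hypothetical Python helper. It assumes, as this comment implies, that bdq:Empty covers both a term that is absent from the record and a term whose value is null or only whitespace; the record-as-dictionary representation is also an assumption.

```python
from typing import Mapping, Optional


def is_empty(record: Mapping[str, Optional[str]], term: str) -> bool:
    """Hypothetical bdq:Empty check: absent, null, or whitespace-only."""
    # .get() returns None for a missing key, so "not present" and
    # "present but null" take the same branch.
    value = record.get(term)
    return value is None or value.strip() == ""
```

Under that reading, `is_empty({}, "dwc:year")` and `is_empty({"dwc:year": " "}, "dwc:year")` are both true, so a separate "not present" clause would never change the outcome.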