tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-AMENDMENT_OCCURRENCESTATUS_ASSUMEDDEFAULT #75

Open iDigBioBot opened 6 years ago

iDigBioBot commented 6 years ago
TestField Value
GUID 96667a0a-ae59-446a-bbb0-b7f2b0ca6cf5
Label AMENDMENT_OCCURRENCESTATUS_ASSUMEDDEFAULT
Description Proposes an amendment of the value of dwc:occurrenceStatus to the default parameter value if dwc:occurrenceStatus, dwc:individualCount and dwc:organismQuantity are empty.
TestType Amendment
Darwin Core Class dwc:Occurrence
Information Elements ActedUpon dwc:occurrenceStatus
Information Elements Consulted dwc:individualCount, dwc:organismQuantity
Expected Response INTERNAL_PREREQUISITES_NOT_MET if dwc:occurrenceStatus is bdq:NotEmpty; FILLED_IN the value of dwc:occurrenceStatus using the bdq:defaultOccurrenceStatus Parameter value if dwc:occurrenceStatus, dwc:individualCount and dwc:organismQuantity are bdq:Empty; otherwise NOT_AMENDED
Data Quality Dimension Completeness
Term-Actions OCCURRENCESTATUS_ASSUMEDDEFAULT
Parameter(s) bdq:defaultOccurrenceStatus
Source Authority bdq:defaultOccurrenceStatus default = "present"
Specification Last Updated 2024-11-13
Examples [dwc:occurrenceStatus="", dwc:individualCount="", dwc:organismQuantity="": Response.status=FILLED_IN, Response.result=dwc:occurrenceStatus="present", Response.comment="dwc:occurrenceStatus is bdq:Empty; assumed "Present""]
[dwc:occurrenceStatus="X", dwc:individualCount="10", dwc:organismQuantity="": Response.status=NOT_AMENDED, Response.result="", Response.comment="dwc:occurrenceStatus is bdq:NotEmpty"]
Source ALA
References
Example Implementations (Mechanisms)
Link to Specification Source Code
Notes There is currently a mismatch between https://dwc.tdwg.org/terms/#dwc:occurrenceStatus recommended values and the vocabulary at bdq:sourceAuthority that we are using (https://api.gbif.org/v1/vocabularies/OccurrenceStatus/concepts)
iDigBioBot commented 6 years ago

Comment by Paul Morris (@chicoreus) migrated from spreadsheet: Widespread assumption in vouchered occurrence data. Likely to be important when aggregating with any data with 'absent' values. However, this is an amendment: a value of "present" is being provided for dwc:occurrenceStatus when dwc:occurrenceStatus is either empty or not present.

ArthurChapman commented 6 years ago

Why could this not be incorporated under https://github.com/tdwg/bdq/issues/115 AMENDMENT_OCCURRENCESTATUS_STANDARDIZED?

Tasilee commented 6 years ago

That I would tend to agree with as we could easily add this special case. Comments from others?

Tasilee commented 6 years ago

In retrospect, if we are going to effectively treat EMPTY or an uninterpretable value as "present" then it is indeed an amendment. Sigh.

tucotuco commented 6 years ago

Agreed at TDWG 2018 DQIG meeting that this amendment can only be applied if the value of dwc:occurrenceStatus is empty.

ArthurChapman commented 6 years ago

I wonder if we should change the name of this test to AMENDMENT_OCCURRENCESTATUS_ASSUMEDDEFAULT to parallel #102. Any comments? I realise the default has only the one possible value - i.e. "Present" but I am attempting to reduce the things we have to define.

Tasilee commented 6 years ago

After reviewing all, I'd agree. Changed accordingly

chicoreus commented 5 years ago

An inconsistency snuck in somewhere in the editing history: this is now clearly labeled as an amendment, but retains an output type of Notification. Changing this to Amendment for consistency.

Tasilee commented 2 years ago

Changed "AMENDED" to "FILLED_IN" in accordance with discussions April 16. I also moved the INTERNAL_PREREQUISITES_NOT_MET test into the FILLED_IN part as this aligns with similar amendments.

Tasilee commented 1 year ago

Edited Example 2 as there is no "INTERNAL_PREREQUISITES_NOT_MET". There was an error in the test data, now fixed.

[dwc:occurrenceStatus="X": Response.status=NOT_AMENDED, Response.result="", Response.comment="dwc:occurrenceStatus is not EMPTY"]

ymgan commented 1 year ago

Hey, out of curiosity, may I know why the amendment of occurrenceStatus (to default = "present") does not consider the value of individualCount or organismQuantity and organismQuantityType?

I am thinking that it is possible to have a situation where occurrenceStatus is empty but individualCount = 0. Please see:

Thank you!!

Edit: sorry, I noticed that I made a mistake in this comment

tucotuco commented 1 year ago

@ymgan I think you are absolutely right. The two terms individualCount and organismQuantity should be taken into account.

Tasilee commented 1 year ago

Surely it doesn't matter if dwc:occurrenceStatus is EMPTY and dwc:individualCount or dwc:organismQuantity or dwc:organismQuantityType have values? dwc:occurrenceStatus will still be set to "present", which is correct.

ArthurChapman commented 1 year ago

It does, because if there is already something in the field then you'd not do anything (interestingly if the dwc:occurrenceStatus says "absent" and you have something in the other fields, then there is a problem!)

This is possibly a test that needs revisiting and expanding - because if there is other stuff in that field then it probably needs to be AMENDED - e.g. if it has a count (5) then it probably should be changed to "present" etc. or do we have another test for that? - if we don't, perhaps we should - or modify this one.

chicoreus commented 1 year ago

On Sun, 26 Feb 2023 16:21:20 -0800 Arthur Chapman @.***> wrote:

This is possibly a test that needs revisiting and expanding - because if there is other stuff in that field then it probably needs to be AMENDED - e.g. if it has a count (5) then it probably should be changed to "present" etc. or do we have another test for that? - if we don't, perhaps we should - or modify this one.

Also need a validation to compare occurrenceStatus with dwc:individualCount and dwc:organismQuantity, and a separate amendment to amend occurrenceStatus from dwc:individualCount and dwc:organismQuantity.

Not sure to what extent AMENDMENT_OCCURRENCESTATUS_ASSUMEDDEFAULT should examine other fields, our usual pattern for assumeddefault does not entail other fields.

ArthurChapman commented 1 year ago

@chicoreus - the only reason for the other fields here is that if there is nothing in those fields you cannot default to "present", because in that case it could be "absent".

ymgan commented 1 year ago

Thank you everyone! Apology as I realized I made a mistake in the comment which is now corrected. What I was referring to was scenario like this:

individualCount | occurrenceStatus | inferred occurrenceStatus | flag
0 | NULL | ABSENT | OCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT

It is from the comment in https://github.com/gbif/pipelines/issues/268#issuecomment-624755278. Under such a condition, this test in its current state will amend occurrenceStatus to present, which is perhaps undesirable.
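
To make the scenario concrete, here is a minimal Python sketch of an assumed-default amendment that consults only dwc:occurrenceStatus, which is how the test read at this point in the thread. The function names, the dict-based record, and the reading of "empty" as None or whitespace-only are assumptions made for illustration; this is not the group's implementation.

```python
def is_empty(value):
    """Assumed reading of 'empty': None or a whitespace-only string."""
    return value is None or str(value).strip() == ""


def naive_assumed_default(record, default="present"):
    """Hypothetical version of the test that looks at dwc:occurrenceStatus only."""
    if is_empty(record.get("dwc:occurrenceStatus")):
        return {"status": "FILLED_IN", "result": {"dwc:occurrenceStatus": default}}
    return {"status": "NOT_AMENDED", "result": {}}


# The record from the table above: a count of zero and no occurrenceStatus.
record = {"dwc:individualCount": "0", "dwc:occurrenceStatus": None}
response = naive_assumed_default(record)
print(response["status"], response["result"])
```

Because the count is never consulted, the sketch proposes "present" for a record whose individualCount of 0 points towards absence.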

ArthurChapman commented 1 year ago

@Tasilee This looks like an issue that we need to put on agenda. As in @chicoreus comment - do we need new tests and from @ymgan comment - this one may not work as is.

tucotuco commented 1 year ago

Also need a validation to compare occurrenceStatus with dwc:individualCount and dwc:organismQuantity, and a separate amendment to amend occurrenceStatus from dwc:individualCount and dwc:organismQuantity. Not sure to what extent AMENDMENT_OCCURRENCESTATUS_ASSUMEDDEFAULT should examine other fields, our usual pattern for assumeddefault does not entail other fields.

I don't think an amendment can be made without considering the consistency of the rest of the terms that affect the assertion of absence of detection. The test as it stands is basically for the one case where all of those other fields are EMPTY.

ArthurChapman commented 1 year ago

Following up on @tucotuco comment - we have to reword the ER and add more Information Elements to either

  1. say "both dwc:individualCount and dwc:organismQuantity are EMPTY" or
  2. more complicated to take into account one or other of dwc:individualCount or dwc:organismQuantity having values - if both are EMPTY, or either has a value >0 then dwc:occurrenceStatus is amended to "present", but if either dwc:individualCount or dwc:organismQuantity has a value of "0" then dwc:occurrenceStatus is amended to "absent" BUT what if one of those = 0 and one is +ve ? would that make an INTERNAL_PREREQUISITES_NOT_MET?
chicoreus commented 1 year ago

On Sat, 11 Mar 2023 15:32:08 -0800 Arthur Chapman @.***> wrote:

  1. more complicated to take into account one or other of dwc:individualCount or dwc:organismQuantity having values - if both are EMPTY, or either has a value >0 then dwc:occurrenceStatus is amended to "present", but if either dwc:individualCount or dwc:organismQuantity has a value of "0" then dwc:occurrenceStatus is amended to "absent" BUT what if one of those = 0 and one is +ve ? would that make an INTERNAL_PREREQUISITES_NOT_MET?

This is phrasing a new and different amendment. Assumed default takes an empty value and stamps a default value into it. The most complex this can be and retain its intent is to check if occurrence status, individual count and organism quantity are all empty, and if so propose an amendment to occurrence status of present.

Any more complex logic, and we are talking about a test in the form amendment_occurrencestatus_fromquantity or something like that.
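
For comparison, the more complex logic quoted above could be sketched as a separate, hypothetical from-quantity amendment. No such test is specified in this thread, so everything here is an assumption: the function names, the dict-based record, treating non-numeric quantities as unknown, and in particular returning INTERNAL_PREREQUISITES_NOT_MET for the zero-versus-positive conflict, which the thread raises as a question and leaves unanswered.

```python
def is_empty(value):
    """Assumed reading of 'empty': None or a whitespace-only string."""
    return value is None or str(value).strip() == ""


def parse_quantity(value):
    """Return the value as a float, or None if it is empty or not numeric."""
    if is_empty(value):
        return None
    try:
        return float(str(value).strip())
    except ValueError:
        return None


def amend_from_quantity(record):
    """Hypothetical amendment inferring dwc:occurrenceStatus from quantities."""
    if not is_empty(record.get("dwc:occurrenceStatus")):
        return {"status": "NOT_AMENDED", "result": {}}
    terms = ("dwc:individualCount", "dwc:organismQuantity")
    known = [q for q in (parse_quantity(record.get(t)) for t in terms) if q is not None]
    has_zero = any(q == 0 for q in known)
    has_positive = any(q > 0 for q in known)
    if has_zero and has_positive:
        # Open question in the thread; this status is a placeholder choice.
        return {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": {}}
    if has_zero:
        return {"status": "FILLED_IN", "result": {"dwc:occurrenceStatus": "absent"}}
    if has_positive:
        return {"status": "FILLED_IN", "result": {"dwc:occurrenceStatus": "present"}}
    # Nothing usable to infer from; the all-empty case is left to the assumed-default test.
    return {"status": "NOT_AMENDED", "result": {}}


print(amend_from_quantity({"dwc:individualCount": "0"}))
print(amend_from_quantity({"dwc:individualCount": "0", "dwc:organismQuantity": "3"}))
```

Keeping this logic out of the assumed-default test leaves that test as a plain "stamp a default into an empty field" operation, which is the separation argued for above.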

Tasilee commented 1 year ago

I agree with @chicoreus about qualifying the current test to check if both dwc:individualCount and dwc:organismQuantity are EMPTY. From @tucotuco's comment https://github.com/tdwg/bdq/issues/75#issuecomment-1464802733, I assume we have agreement?

ArthurChapman commented 1 year ago

Agreed - this simplifies it. @ymgan does this satisfy your issues?

Tasilee commented 1 year ago

Amended the ER to

FILLED_IN the value of dwc:occurrenceStatus using the Parameter value if dwc:occurrence.Status, dwc:individualCount and dwc:organismQuantity are EMPTY; otherwise NOT_AMENDED

ymgan commented 1 year ago

Agreed - this simplifies it. @ymgan does this satisfy your issues?

Yes, thank you very much for your hard work here! I really appreciate it!

Tasilee commented 1 year ago

I have updated the Description and the Examples accordingly and will amend the test data.

Tasilee commented 1 year ago

I have added dwc:individualCount and dwc:organismQuantity to the Information Elements.

Tasilee commented 1 year ago

Restructured Parameter(s) and Source authority

ArthurChapman commented 1 year ago

Change sourceAuthority from "dwc:occurrenceStatus = "present"" to "dwc:occurrenceStatus default = "present""

Tasilee commented 1 year ago

Changed all Information Elements to "ActedUpon" as per Paul's Java Code.

@chicoreus: You will need to amend your code to include dwc:individualCount and dwc:organismQuantity ?

chicoreus commented 4 months ago

The parameter can't be the same as an information element.

Propose changing the parameter from dwc:occurrenceStatus to bdq:defaultOccurrenceStatus

Tasilee commented 4 months ago

Thanks @chicoreus - that seems a reasonable solution to me. Amending.

Tasilee commented 3 months ago

Changed Expected Response from

FILLED_IN the value of dwc:occurrenceStatus using the Parameter value if dwc:occurrence.Status, dwc:individualCount and dwc:organismQuantity are EMPTY; otherwise NOT_AMENDED

to

FILLED_IN the value of dwc:occurrenceStatus using the Parameter value if dwc:occurrenceStatus, dwc:individualCount and dwc:organismQuantity are EMPTY; otherwise NOT_AMENDED

ymgan commented 3 months ago

May I know if we need a VALIDATION_ORGANISMQUANTITY_NOTEMPTY please? We already have

ArthurChapman commented 3 months ago

Thanks @ymgan - #232 is Supplementary at this stage and another test for VALIDATION_ORGANISMQUANTITY_NOTEMPTY could be valuable for some, but at this stage we don't think it is widely applicable. But it is certainly one worth considering in the future if required. There are quite a few tests in a similar position that we don't believe are CORE.

ymgan commented 3 months ago

Thanks @ArthurChapman !! Good morning :D To make sure that I understand, even if this test is core and its prerequisites include individualCount and organismQuantity being empty, it does not mean we need the notempty tests for individualCount and organismQuantity. Am I correct?

chicoreus commented 3 months ago

On Mon, 19 Aug 2024 07:10:09 -0700 Yi-Ming Gan @.***> wrote:

Thanks @ArthurChapman !! Good morning :D To make sure that I understand, even if this test is core and its prerequisites include individualCount and organismQuantity being empty, it does not mean we need the notempty tests for individualCount and organismQuantity. Am I correct?

Non-empty tests for individualCount and organismQuantity would be aspirational at this point.

If we adopted them we would be asserting that these terms would be important enough for everyone to try to put in the effort to populate them. For natural science collections data at least, this would be non-trivial, collections may know how many parts they have for some specimen, but not be readily able to work out how many individuals those represent.

So, yes, these do make natural related tests, but not really within the scope of what we want to accomplish. Others, for whom quality in this portion of the data matters, can easily propose a use case and suite of tests.

ymgan commented 3 months ago

got it, thanks @chicoreus !

chicoreus commented 2 weeks ago

This and #102 are "AssumedDefault" tests for which non-empty values aren't preventing execution of the test; both should probably have the internal prerequisites clause removed and be able to reach the NOT_AMENDED clause:

FILLED_IN the value of dwc:occurrenceStatus using the Parameter value if dwc:occurrenceStatus, dwc:individualCount and dwc:organismQuantity are bdq:Empty; otherwise NOT_AMENDED
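
The Expected Response as worded here could be sketched in Python as follows. This is a sketch under stated assumptions rather than the reference implementation: the function and argument names, the dict-based record and response, the comment strings, and the reading of bdq:Empty as None or whitespace-only are all illustrative choices. The default value "present" follows the Source Authority row of the specification.

```python
def is_empty(value):
    """Assumed reading of bdq:Empty: None or a whitespace-only string."""
    return value is None or str(value).strip() == ""


def amendment_occurrencestatus_assumeddefault(record, default_occurrence_status="present"):
    """Propose the default dwc:occurrenceStatus only when all three terms are empty."""
    terms = ("dwc:occurrenceStatus", "dwc:individualCount", "dwc:organismQuantity")
    if all(is_empty(record.get(term)) for term in terms):
        return {
            "status": "FILLED_IN",
            "result": {"dwc:occurrenceStatus": default_occurrence_status},
            "comment": "all three terms are empty; default value proposed",
        }
    # No prerequisites clause: any non-empty term falls through to NOT_AMENDED.
    return {
        "status": "NOT_AMENDED",
        "result": {},
        "comment": "at least one of the three terms is not empty",
    }


# The two examples given in the specification.
example_1 = {"dwc:occurrenceStatus": "", "dwc:individualCount": "", "dwc:organismQuantity": ""}
example_2 = {"dwc:occurrenceStatus": "X", "dwc:individualCount": "10", "dwc:organismQuantity": ""}
for example in (example_1, example_2):
    print(amendment_occurrencestatus_assumeddefault(example)["status"])
```

A record with an empty dwc:occurrenceStatus but a dwc:individualCount of "0" also falls through to NOT_AMENDED, so the default is never stamped over a possible absence.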

Tasilee commented 2 weeks ago

I agree with removing the INTERNAL_PREREQUISITES_NOT_MET phrase on this Test.

Tasilee commented 2 weeks ago

Do we change default to "Present" for now as "present" won't currently validate against the GBIF vocabulary?

chicoreus commented 2 weeks ago

On Mon, 11 Nov 2024 18:48:40 -0800 Lee Belbin @.***> wrote:

Do we change default to "Present" for now as "present" won't currently validate against the GBIF vocabulary?

Yes. Good catch. Probably worth adding a note as well.

Tasilee commented 2 weeks ago

Added to Notes " There is currently a mismatch between https://dwc.tdwg.org/terms/#dwc:occurrenceStatus recommended values and the vocabulary at bdq:sourceAuthority that we are using (https://api.gbif.org/v1/vocabularies/OccurrenceStatus/concepts)"

chicoreus commented 2 weeks ago

Corrected the parameter namespace for bdq:defaultOccurrenceStatus from dwc to bdq.