tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-VALIDATION_GENUS_NOTEMPTY #214

Closed Tasilee closed 1 month ago

Tasilee commented 8 months ago
| TestField | Value |
| --- | --- |
| GUID | d02c1ffd-af28-49bd-9c9c-e8e23a8b7258 |
| Label | VALIDATION_GENUS_NOTEMPTY |
| Description | Is there a value in dwc:genus? |
| TestType | Validation |
| Darwin Core Class | dwc:Taxon |
| Information Elements ActedUpon | dwc:genus |
| Information Elements Consulted | dwc:taxonRank |
| Expected Response | INTERNAL_PREREQUISITES_NOT_MET if dwc:genus is bdq:Empty and dwc:taxonRank contains a value that is not interpretable as a taxon rank; COMPLIANT if dwc:genus is bdq:NotEmpty, or dwc:genus is bdq:Empty and the value in dwc:taxonRank is higher than genus; otherwise NOT_COMPLIANT. |
| Data Quality Dimension | Completeness |
| Term-Actions | GENUS_NOTEMPTY |
| Parameter(s) | |
| Source Authority | |
| Specification Last Updated | 2024-06-05 |
| Examples | [dwc:genus="genus": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:genus is bdq:NotEmpty"] [dwc:genus="": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:genus is bdq:Empty"] |
| Source | TG2 |
| References | |
| Example Implementations (Mechanisms) | |
| Link to Specification Source Code | |
| Notes | Genus is expected to be bdq:Empty when an identification is only to the level of a taxon higher than Genus. This test is not regarded as CORE (cf. bdq:CORE). |

chicoreus commented 8 months ago

All of #206, #207, #208, #213, #214, #215, #217, #218, #219, and #220 need to consider whether an identification is at a rank above that of the term under test; if it is, the term is correctly empty.

Suggest for all of these:

(1) add dwc:taxonRank as an information element consulted.

(2) rewrite the test specifications in the form:

COMPLIANT if the value in dwc:taxonRank is of a rank higher than genus or if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT

Without such a change, these tests have limited power to identify data that have quality, and the problem gets worse the lower the rank of the term under test.

chicoreus commented 8 months ago

Removed #216 from the list, dwc:kingdom doesn't need examination of another term.

chicoreus commented 8 months ago

Noting that #265 is a similar, but more complex problem.

Tasilee commented 8 months ago

Changing the ER to

COMPLIANT if the value in dwc:taxonRank is of a rank higher than genus or if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT

is similar to #256 in that IF dwc:taxonRank is 'higher than genus', then dwc:genus is effectively ignored as the Information Element Acted Upon, and you would be implying dwc:genus is not EMPTY, regardless.

If this test is considered useful in some context (use case), then I would suggest maybe (not sure this is right taxonomically)

POTENTIAL_ISSUE if dwc:genus is EMPTY and dwc:taxonRank is lower than "family"; otherwise NOT_ISSUE

?

ArthurChapman commented 8 months ago

@Tasilee - I would keep it Supplementary with your top wording. An ISSUE test would be a separate test and I don't think worth considering at this time.

ArthurChapman commented 6 months ago

@chicoreus is the wording given by @Tasilee for an ISSUE worth making a test for this?

chicoreus commented 6 months ago

@ArthurChapman given that there are Darwin Core terms for ranks lower than family and higher than genus, and taxonomic ranks that fall between the two, I think that @Tasilee's issue, as phrased, would be difficult to phrase and implement.

Current phrasing looks good.

ArthurChapman commented 6 months ago

Made DO NOT IMPLEMENT, as this test doesn't address an aspect of Data Quality: it is redundant when compared with dwc:scientificName. It would probably be better to test dwc:genericName rather than dwc:genus, i.e. VALIDATION_GENERICNAME_NOTEMPTY.

chicoreus commented 6 months ago

DO NOT IMPLEMENT, as this test is an artifact of our thinking prior to the formulation of dwc:genericName. Its development came from dwc:genus being widely (and incorrectly) used as a parse of the generic portion of the scientific name, so it was considered part of the suite of tests related to scientific name parsing. With the clearer separation between dwc:genus as part of the classification and dwc:genericName as a parse of the scientific name, this thinking is no longer relevant. The current classification of an occurrence at the generic level therefore has very little utility for assessing the quality of the occurrence data. It may have some value in very narrow cases for evaluating taxonomic data sets, but even then entries may be of higher taxa and be expected to have no classification at the level of genus. Without clear explication of a use case and the potential pitfalls of implementation, we are recommending this as DO NOT IMPLEMENT.

chicoreus commented 6 months ago

Our treating this as DO NOT IMPLEMENT was based on the incorrect belief that it stood in isolation. It is one of a family of supplemental tests that examine emptiness of higher classification terms. With the current inclusion of dwc:taxonRank as an information element, it is actually a good representative for that entire family. It should probably be considered supplemental, and the set of supplemental tests listed above brought into conformance with it.

Tasilee commented 6 months ago

OK, but is dwc:genus a special case (as @ArthurChapman suggested in the Zoom 16th April 2024)?

chicoreus commented 6 months ago

On Mon, 15 Apr 2024 17:34:39 -0700 Lee Belbin wrote:

OK, but is dwc:genus a special case (as @ArthurChapman suggested in the Zoom 16th April 2024)?

It was, but it isn't anymore with the expression of dwc:genericName. This test is parallel to the other higher classification not empty supplementary tests. They can all (except the test for Kingdom) simply be evaluated by considering the rank and the presence or absence of a value. If dwc:taxonRank expresses a rank higher than the term under test, then the term is expected to be empty. This is how we had reformulated this test, but instead of extending the reformulation to the parallel set of tests, we thought it was a test in isolation and thus coming from its special case history.

We would be better off making this supplementary and reformulating all the other parallel higher classification term not empty tests to parallel the phrasing of the expected response in this one.
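
As a rough illustration of the rule described above (a term is expected to be empty when dwc:taxonRank is higher than the term under test), the sketch below parameterises the check by the rank of the term. The rank list, the function names and the treatment of whitespace-only strings as empty are assumptions made for illustration; none of them come from the BDQ specification.

```python
from typing import Optional

# Assumed ordering of major ranks, highest first (illustrative, not a BDQ vocabulary).
MAJOR_RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def is_empty(value: Optional[str]) -> bool:
    # Assumption: None and whitespace-only strings are treated as EMPTY.
    return value is None or value.strip() == ""


def rank_is_higher(taxon_rank: Optional[str], term_rank: str) -> bool:
    """True if taxon_rank is a known rank strictly above term_rank."""
    if is_empty(taxon_rank):
        return False
    key = taxon_rank.strip().lower()
    if key not in MAJOR_RANKS:
        return False
    return MAJOR_RANKS.index(key) < MAJOR_RANKS.index(term_rank)


def higher_classification_notempty(term_rank: str, value: Optional[str],
                                   taxon_rank: Optional[str]) -> str:
    """COMPLIANT if the term has a value or is correctly empty; otherwise NOT_COMPLIANT."""
    if not is_empty(value) or rank_is_higher(taxon_rank, term_rank):
        return "COMPLIANT"
    return "NOT_COMPLIANT"
```

With `term_rank="kingdom"` no rank in the list is higher, so the check reduces to a plain not-empty test, which matches the earlier remark that dwc:kingdom does not need another term examined.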

ArthurChapman commented 5 months ago

@Tasilee and I have been looking at this test and the associated tests #206, #207, #208, which are all "FOUND" tests, and #213, #215, #216, #217, #218, #219 and #220, which are all "NOT EMPTY" tests. We think that this test should be kept simple, like #213 etc., as a plain NOTEMPTY/EMPTY test. We shouldn't make these tests more complicated by adding dwc:taxonRank as a consulted Element, which would mean altering the Expected Response to include EXTERNAL_PREREQUISITES_NOT_MET and INTERNAL_PREREQUISITES_NOT_MET, adding Source Authorities, and so on. It would also mean having to alter all of the #213-#220 tests. They should remain SUPPLEMENTARY.

When thinking about how people may use these types of tests if they wish to implement them, most will just want to know if the field is EMPTY or not. If we make the tests much more complicated, then people probably won't implement them. If we wanted to go more complicated, I think we would need to keep the "NOTEMPTY" tests but add a new set of "FOUND" tests, something I think would be an unnecessary use of our time at this stage.

With respect to this test - I think it should be altered to the same as #213, etc. and keep it simple and SUPPLEMENTARY

chicoreus commented 5 months ago

On Sun, 12 May 2024 16:18:35 -0700 Arthur Chapman wrote:

@Tasilee and I have been looking at this test and associated tests #206, #207, #208 which are all "FOUND" tests and #213, #215, #216, #217, #218, #219 and #220 which are all "NOT EMPTY" tests. We think that this test should be kept simple like #213, etc. and be simple NOTEMPTY/EMPTY tests and that we shouldn't make them more complicated by adding in dwc:taxonRank as a consulted Element

I disagree. If treated as simple Empty/Not Empty tests, these tests will assert that any data where the identification is above the rank being examined will lack quality.

These tests need to intelligently check whether a value should be present before asserting that an empty state lacks quality. This means examining taxonRank.
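
To make the objection concrete, here is a minimal sketch comparing a plain Empty/Not Empty check with one that examines dwc:taxonRank, applied to a record identified only to family. The record, the set of ranks above genus and both function names are hypothetical and only serve the illustration.

```python
# Assumed set of ranks above genus (illustrative only, not a BDQ vocabulary).
HIGHER_THAN_GENUS = {"kingdom", "phylum", "class", "order", "family"}


def is_empty(value):
    # Assumption: None and whitespace-only strings are treated as EMPTY.
    return value is None or value.strip() == ""


def simple_notempty(record):
    """Plain check: any empty dwc:genus is flagged."""
    return "NOT_COMPLIANT" if is_empty(record.get("dwc:genus")) else "COMPLIANT"


def rank_aware_notempty(record):
    """Check that first asks whether dwc:genus should have a value at all."""
    if not is_empty(record.get("dwc:genus")):
        return "COMPLIANT"
    rank = (record.get("dwc:taxonRank") or "").strip().lower()
    return "COMPLIANT" if rank in HIGHER_THAN_GENUS else "NOT_COMPLIANT"


# Hypothetical record identified only to family, so dwc:genus is correctly empty.
family_level = {"dwc:family": "Fagaceae", "dwc:genus": "", "dwc:taxonRank": "family"}

print(simple_notempty(family_level))      # NOT_COMPLIANT
print(rank_aware_notempty(family_level))  # COMPLIANT
```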

ArthurChapman commented 5 months ago

Okay @chicoreus following your logic, does the following satisfy your requirements (of course we would need to add something in sourceAuthority).

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonRank is EMPTY or is at a higher rank than Genus; COMPLIANT if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT

The current wording, "COMPLIANT if the value in dwc:taxonRank is higher than genus or if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT", is not logical: it says that if taxonRank is Family and Genus is EMPTY, the test is COMPLIANT although Genus is EMPTY, which makes no logical sense for a NOTEMPTY test. The way I have suggested above makes logical sense.
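
For comparison, the prerequisite-based wording proposed in this comment could be sketched roughly as follows. The source authority is modelled as a plain ordered list of rank names (highest first) that is assumed to include "genus", with `None` standing for "not available"; that representation and all names in the code are assumptions for illustration only.

```python
from typing import Optional, Sequence

# Hypothetical stand-in for a bdq:sourceAuthority: ranks ordered highest first.
EXAMPLE_AUTHORITY = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def is_empty(value: Optional[str]) -> bool:
    # Assumption: None and whitespace-only strings are treated as EMPTY.
    return value is None or value.strip() == ""


def genus_notempty_with_prerequisites(genus: Optional[str],
                                      taxon_rank: Optional[str],
                                      rank_authority: Optional[Sequence[str]]) -> str:
    if rank_authority is None:
        return "EXTERNAL_PREREQUISITES_NOT_MET"
    if is_empty(taxon_rank):
        return "INTERNAL_PREREQUISITES_NOT_MET"
    ranks = [r.lower() for r in rank_authority]  # assumed to contain "genus"
    key = taxon_rank.strip().lower()
    if key in ranks and ranks.index(key) < ranks.index("genus"):
        # Rank is higher than genus: the test is treated as not applicable.
        return "INTERNAL_PREREQUISITES_NOT_MET"
    return "COMPLIANT" if not is_empty(genus) else "NOT_COMPLIANT"


print(genus_notempty_with_prerequisites("", "family", EXAMPLE_AUTHORITY))
# INTERNAL_PREREQUISITES_NOT_MET
print(genus_notempty_with_prerequisites("Quercus", "", EXAMPLE_AUTHORITY))
# INTERNAL_PREREQUISITES_NOT_MET
```

Under this reading, a family-level record with an empty dwc:genus is reported as not testable rather than COMPLIANT, and a record with dwc:genus filled in but dwc:taxonRank empty is also not testable.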

Tasilee commented 5 months ago

@chicoreus ??

chicoreus commented 4 months ago

@ArthurChapman I think the family of NOT_EMPTY tests for higher taxon rank terms (https://github.com/tdwg/bdq/issues/213, https://github.com/tdwg/bdq/issues/215, https://github.com/tdwg/bdq/issues/216, https://github.com/tdwg/bdq/issues/217, https://github.com/tdwg/bdq/issues/218, https://github.com/tdwg/bdq/issues/219, https://github.com/tdwg/bdq/issues/220 and this one) should follow the same pattern:

COMPLIANT if the value in dwc:taxonRank is higher than genus or if dwc:genus is not EMPTY; INTERNAL_PREREQUISITES_NOT_MET if dwc:genus is EMPTY, dwc:taxonRank is NOT_EMPTY, and dwc:taxonRank contains a value that is not interpretable as a taxon rank; otherwise NOT_COMPLIANT.

This asserts that the data have quality if dwc:genus contains a value, or if dwc:genus correctly does not contain a value. It handles the case where dwc:genus does not contain a value and it isn't possible to tell whether it should, and it marks data where dwc:genus incorrectly lacks a value as not having quality.

I don't think a reference to a source authority is needed, as taxonRank can be assessed without reference to a source authority for the purposes of this test. If this isn't the case, then:

COMPLIANT if the value in dwc:taxonRank is higher than genus or if dwc:genus is not EMPTY; INTERNAL_PREREQUISITES_NOT_MET if dwc:genus is EMPTY, dwc:taxonRank is NOT_EMPTY, and dwc:taxonRank contains a value that is not interpretable as a taxon rank; EXTERNAL_PREREQUISITES_NOT_MET if dwc:genus does not contain a value, dwc:taxonRank contains a value and the sourceAuthority is needed and not available to interpret whether dwc:taxonRank has a rank higher than genus; otherwise NOT_COMPLIANT.

Key point is that data can have quality, and be COMPLIANT, even if dwc:genus does not contain a value in those cases when dwc:genus should not contain a value. This isn't a simple family of tests for emptiness.

Tasilee commented 4 months ago

Thanks @chicoreus. I agree about Source Authority but I'm inclined to align with the structure we have been using and simplifying it -

INTERNAL_PREREQUISITES_NOT_MET if dwc:genus is EMPTY or dwc:taxonRank contains a value that is not interpretable as a taxon rank; COMPLIANT if the value in dwc:taxonRank is higher than genus or if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT.

Is that ok?

Tasilee commented 4 months ago

Changed Expected Response from

COMPLIANT if the value in dwc:taxonRank is higher than genus or if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT

to

INTERNAL_PREREQUISITES_NOT_MET if dwc:genus is EMPTY or dwc:taxonRank contains a value that is not interpretable as a taxon rank; COMPLIANT if the value in dwc:taxonRank is higher than genus or if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT.

chicoreus commented 4 months ago

On Thu, 30 May 2024 15:25:49 -0700 Lee Belbin wrote:

Thanks @chicoreus. I agree about Source Authority but I'm inclined to align with the structure we have been using and simplifying it -

INTERNAL_PREREQUISITES_NOT_MET if dwc:genus is EMPTY or dwc:taxonRank contains a value that is not interpretable as a taxon rank; COMPLIANT if the value in dwc:taxonRank is higher than genus or if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT.

Is that ok?

That doesn't quite work. The first clause will incorrectly return INTERNAL_PREREQUISITES_NOT_MET when the taxonRank contains an uninterpretable value, even if the genus contains a value.

In that order of operations, it needs an "and" in the first clause, not an "or".

INTERNAL_PREREQUISITES_NOT_MET if dwc:genus is EMPTY and dwc:taxonRank contains a value that is not interpretable as a taxon rank; COMPLIANT if the value in dwc:taxonRank is higher than genus or if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT.

The second clause will return COMPLIANT if dwc:taxonRank contains a value higher than genus, regardless of the state of the genus. This is probably formally correct, but likely confusing.

Clearer statement is probably:

INTERNAL_PREREQUISITES_NOT_MET if dwc:genus is EMPTY and dwc:taxonRank contains a value that is not interpretable as a taxon rank; COMPLIANT if dwc:genus is not EMPTY, or dwc:genus is EMPTY and the value in dwc:taxonRank is higher than genus; otherwise NOT_COMPLIANT.

This phrasing more clearly expresses the intent.
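
The difference between "or" and "and" in the first clause can be seen in a small sketch. The set of known ranks and the helper names are assumptions for illustration; only the two clause wordings come from the discussion above.

```python
# Assumed set of interpretable ranks (illustrative only, not a BDQ vocabulary).
KNOWN_RANKS = {"kingdom", "phylum", "class", "order", "family", "genus", "species"}


def is_empty(value):
    # Assumption: None and whitespace-only strings are treated as EMPTY.
    return value is None or value.strip() == ""


def uninterpretable(taxon_rank):
    """True if taxon_rank contains a value that is not a known rank."""
    return not is_empty(taxon_rank) and taxon_rank.strip().lower() not in KNOWN_RANKS


def prerequisites_not_met_or(genus, taxon_rank):
    # First clause as written with "or".
    return is_empty(genus) or uninterpretable(taxon_rank)


def prerequisites_not_met_and(genus, taxon_rank):
    # First clause as corrected with "and".
    return is_empty(genus) and uninterpretable(taxon_rank)


# Genus present, taxonRank uninterpretable: "or" blocks a COMPLIANT result.
print(prerequisites_not_met_or("Quercus", "xyz"))   # True
print(prerequisites_not_met_and("Quercus", "xyz"))  # False

# Genus empty, taxonRank empty: "or" also blocks the NOT_COMPLIANT result.
print(prerequisites_not_met_or("", ""))   # True
print(prerequisites_not_met_and("", ""))  # False
```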

ArthurChapman commented 4 months ago

That seems to work @chicoreus

Tasilee commented 4 months ago

OK, thanks @chicoreus. Changed Expected Response from

INTERNAL_PREREQUISITES_NOT_MET if dwc:genus is EMPTY or dwc:taxonRank contains a value that is not interpretable as a taxon rank; COMPLIANT if the value in dwc:taxonRank is higher than genus or if dwc:genus is not EMPTY; otherwise NOT_COMPLIANT.

to

INTERNAL_PREREQUISITES_NOT_MET if dwc:genus is EMPTY and dwc:taxonRank contains a value that is not interpretable as a taxon rank; COMPLIANT if dwc:genus is not EMPTY, or dwc:genus is EMPTY and the value in dwc:taxonRank is higher than genus; otherwise NOT_COMPLIANT.
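
A minimal sketch of how this final Expected Response might be evaluated is given below. The rank ordering, the function names, the (status, result) return shape and the treatment of whitespace-only strings as EMPTY are all assumptions made for illustration; this is not the reference implementation of the test.

```python
from typing import Optional, Tuple

# Assumed rank ordering, highest first (illustrative only, not a BDQ vocabulary).
RANKS_HIGH_TO_LOW = [
    "kingdom", "phylum", "class", "order", "family",
    "subfamily", "tribe", "genus", "subgenus", "species",
    "subspecies", "variety", "form",
]
GENUS_INDEX = RANKS_HIGH_TO_LOW.index("genus")


def is_empty(value: Optional[str]) -> bool:
    # Assumption: None and whitespace-only strings are treated as EMPTY.
    return value is None or value.strip() == ""


def rank_index(taxon_rank: Optional[str]) -> Optional[int]:
    """Position of the rank in the assumed ordering, or None if empty or unknown."""
    if is_empty(taxon_rank):
        return None
    key = taxon_rank.strip().lower()
    return RANKS_HIGH_TO_LOW.index(key) if key in RANKS_HIGH_TO_LOW else None


def validation_genus_notempty(genus: Optional[str],
                              taxon_rank: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return a hypothetical (Response.status, Response.result) pair."""
    idx = rank_index(taxon_rank)
    # Clause 1: genus empty and taxonRank holds an uninterpretable value.
    if is_empty(genus) and not is_empty(taxon_rank) and idx is None:
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    # Clause 2: genus has a value, or is empty with a rank higher than genus.
    if not is_empty(genus) or (idx is not None and idx < GENUS_INDEX):
        return ("RUN_HAS_RESULT", "COMPLIANT")
    # Otherwise: genus is empty where a value would be expected.
    return ("RUN_HAS_RESULT", "NOT_COMPLIANT")


print(validation_genus_notempty("genus", None))  # ('RUN_HAS_RESULT', 'COMPLIANT')
print(validation_genus_notempty("", None))       # ('RUN_HAS_RESULT', 'NOT_COMPLIANT')
```

The two printed cases correspond to the two examples listed in the test specification at the top of the issue.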