tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-AMENDMENT_IDENTIFICATIONQUALIFIER_FROM_TAXON #106

iDigBioBot closed this issue 4 years ago

iDigBioBot commented 6 years ago
| TestField | Value |
|---|---|
| GUID | 65c5595b-6229-4f89-98e9-7a62dbda492d |
| Label | AMENDMENT_IDENTIFICATIONQUALIFIER_FROM_TAXON |
| Description | Can an identification qualifier be extracted from related taxon terms? |
| TestType | Amendment |
| Darwin Core Class | Taxon, Identification |
| Information Elements ActedUpon | dwc:identificationQualifier |
| Information Elements Consulted | dwc:scientificName, dwc:specificEpithet, dwc:infraspecificEpithet |
| Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if all of the taxon name fields are bdq:Empty or the field dwc:identificationQualifier is bdq:NotEmpty; AMENDED if the field dwc:identificationQualifier as in the bdq:sourceAuthority is FILLED_IN from any of the fields dwc:scientificName, dwc:specificEpithet or dwc:infraspecificEpithet; otherwise NOT_AMENDED |
| Data Quality Dimension | Completeness |
| Term-Actions | IDENTIFICATIONQUALIFIER_FROM_TAXON |
| Parameter(s) | bdq:sourceAuthority |
| Source Authority | bdq:sourceAuthority default = "Darwin Core Identification Qualifier" {[https://dwc.tdwg.org/list/#identificationQualifier]} {dwc:identificationQualifier vocabulary API [NO CURRENT API EXISTS]} |
| Specification Last Updated | 2024-09-18 |
| Examples | [dwc:scientificName="Quercus aff. agrifolia var. oxyadenia", dwc:identificationQualifier="": Response.status=AMENDED, Response.result=dwc:identificationQualifier="aff. agrifolia var. oxyadenia", Response.comment="dwc:scientificName contains an interpretable dwc:identificationQualifier"] [dwc:scientificName="Quercus", dwc:identificationQualifier="": Response.status=NOT_AMENDED, Response.result="", Response.comment="dwc:scientificName does not contain an interpretable dwc:identificationQualifier"] |
| Source | VertNet |
| References | |
| Example Implementations (Mechanisms) | |
| Link to Specification Source Code | |
| Notes | dwc:genus is not included as an Information Element because if a "?" is present only in dwc:genus but not in dwc:scientificName, then by the Darwin Core definition of genus, this implies an uncertainty about placement in the classification rather than uncertainty about the identification (determination). We use a vocabulary to detect an identificationQualifier as a token, but the resulting dwc:identificationQualifier itself need not necessarily follow a controlled vocabulary. |

iDigBioBot commented 6 years ago

Comment by John Wieczorek (@tucotuco) migrated from spreadsheet: Name fields would be replaced with amended names and identification qualifier(s) put in identificationQualifier.

iDigBioBot commented 6 years ago

Comment by Arthur Chapman (@ArthurChapman) migrated from spreadsheet: Should follow IDENTIFIER_QUALIFIER_DETECTED

iDigBioBot commented 6 years ago

Comment by John Wieczorek (@tucotuco) migrated from spreadsheet: Can use a vocabulary to detect identificationQualifier as a token, but the resulting identificationQualifier need not necessarily follow a controlled vocabulary. For examples, see the description for identificationQualifier, where the names are included as well.

tucotuco commented 5 years ago

The notes could use some cleanup here (full sentences) for clarity.

Tasilee commented 5 years ago

Is it better now?

tucotuco commented 5 years ago

Happy face applied.

ianengelbrecht commented 5 years ago

Same note here about question marks in genus as for VALIDATION_IDENTIFICATIONQUALIFIER_DETECTED. More importantly though, the internal prerequisite is that dwc:identificationQualifier is empty. What happens if it's not, and we find a qualifier in one of the taxon name fields that does not match? An example might be dwc:scientificName="Quercus aff. agrifolia var. oxyadenia" and dwc:identificationQualifier="cf.". Not sure if this would ever happen in real examples though.

ianengelbrecht commented 5 years ago

Oh, and does the amendment require that the qualifier be removed from the taxon name field it was found in too?

ArthurChapman commented 5 years ago

@ianengelbrecht Good questions. @chicoreus do we need to discuss this one further in the light of these questions?

chicoreus commented 5 years ago

The intent of the test for identificationQualifier not being empty is to prevent this test from suggesting a change to an existing value. I don't think INTERNAL_PREREQUISITES_NOT_MET is the right response for that case.

Instead of:

EXTERNAL_PREREQUISITES_NOT_MET if the specified source authority service was not available; INTERNAL_PREREQUISITES_NOT_MET if all of the taxon name fields were EMPTY or the field dwc:identificationQualifier was not EMPTY; AMENDED if the field dwc:identificationQualifier was FILLED_IN from any of the fields dwc:scientificName, dwc:specificEpithet or dwc:infraspecificEpithet; otherwise NOT_CHANGED

We should have:

EXTERNAL_PREREQUISITES_NOT_MET if the specified source authority service was not available; INTERNAL_PREREQUISITES_NOT_MET if all of the taxon name fields were EMPTY; AMENDED if the field dwc:identificationQualifier was FILLED_IN from any of the fields dwc:scientificName, dwc:specificEpithet or dwc:infraspecificEpithet; NOT_CHANGED if the field dwc:identificationQualifier was not EMPTY; otherwise NOT_CHANGED
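
For illustration only, the revised wording could be read as the following decision order. This is a minimal sketch, not the group's implementation. The function name `response_status`, the helper `is_empty`, and the `extracted_qualifier` argument (the output of some separate detection step) are all hypothetical. The sketch also assumes that the "not EMPTY" check runs before the AMENDED branch, since the stated intent is to avoid suggesting a change to an existing value.

```python
from typing import Optional


def is_empty(value: Optional[str]) -> bool:
    """Treat None and whitespace-only strings as empty (an assumption)."""
    return value is None or value.strip() == ""


def response_status(
    authority_available: bool,
    scientific_name: Optional[str],
    specific_epithet: Optional[str],
    infraspecific_epithet: Optional[str],
    identification_qualifier: Optional[str],
    extracted_qualifier: Optional[str],
) -> str:
    """Return a status following one reading of the revised wording.

    `extracted_qualifier` is a hypothetical input: whatever a separate
    detection step found in the taxon name fields, or None.
    """
    if not authority_available:
        return "EXTERNAL_PREREQUISITES_NOT_MET"
    name_fields = (scientific_name, specific_epithet, infraspecific_epithet)
    if all(is_empty(value) for value in name_fields):
        return "INTERNAL_PREREQUISITES_NOT_MET"
    # Assumed ordering: an existing value is never overwritten, so this
    # check comes before the AMENDED branch.
    if not is_empty(identification_qualifier):
        return "NOT_CHANGED"
    if not is_empty(extracted_qualifier):
        return "AMENDED"
    return "NOT_CHANGED"
```

Under this reading, a record with dwc:scientificName="Quercus aff. agrifolia var. oxyadenia" and dwc:identificationQualifier="cf." yields NOT_CHANGED rather than INTERNAL_PREREQUISITES_NOT_MET.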

Tasilee commented 5 years ago

@chicoreus: Hmm, ok. I can live with that. Other comments before I race off to edit?

ianengelbrecht commented 5 years ago

Just the question about what to do with the field that the qualifier comes from - must it be removed there as well?

Also, in implementing this in code I realised there could be edge cases which are difficult to deal with. E.g. if scientificName is Quercus cf/nr alba, which qualifier do we pick? There may also be the case where scientificName is Quercus cf alba and specificEpithet is nr alba. I'm not sure if this could ever plausibly arise in practice though; it really is an extreme edge case.
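
To make these edge cases concrete, here is a minimal sketch that only reports which qualifiers occur across the name fields, leaving the choice between them open. The vocabulary `QUALIFIER_TOKENS`, the function `qualifiers_found`, and the decision to split on "/" as well as whitespace are assumptions made for illustration; none of them come from the specification.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately tiny vocabulary (trailing dots are stripped
# before lookup); this is not the bdq:sourceAuthority.
QUALIFIER_TOKENS = {"?", "cf", "aff", "nr"}


def qualifiers_found(fields: Dict[str, str]) -> List[str]:
    """Collect the distinct qualifier tokens present in any field value."""
    found = set()
    for value in fields.values():
        # Assumption: "/" separates alternatives, as in "cf/nr".
        for token in re.split(r"[\s/]+", value):
            normalized = token.lower().rstrip(".")
            if normalized in QUALIFIER_TOKENS:
                found.add(normalized)
    return sorted(found)


single_field = qualifiers_found({"dwc:scientificName": "Quercus cf/nr alba"})
two_fields = qualifiers_found(
    {"dwc:scientificName": "Quercus cf alba", "dwc:specificEpithet": "nr alba"}
)
# More than one distinct qualifier means the amendment is ambiguous.
ambiguous = len(single_field) > 1 and len(two_fields) > 1
```

Both cases return more than one qualifier, so an implementation would need an explicit rule (or a refusal to amend) for such records.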


ArthurChapman commented 4 years ago

I can't see any easy way of writing this into the Expected Response. We could if needed but we can do it by just changing the Notes:

The AMENDMENT is made by finding the qualifier as a token within dwc:scientificName. If the first encountered match is inside the string, then place the text from the qualifier to the end of the string in dwc:identificationQualifier; if the qualifier is first encountered at the end of the string, place the entire string in dwc:identificationQualifier.

Note that dwc:genus is not included as an Information Element because if a "?" is present only in dwc:genus but not in dwc:scientificName, then by the Darwin Core definition of genus, this implies an uncertainty about placement in the classification rather than uncertainty about the identification (determination). We use a vocabulary to detect an identificationQualifier as a token, but the resulting dwc:identificationQualifier itself need not necessarily follow a controlled vocabulary.
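
The rule in the proposed Notes can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the vocabulary `QUALIFIER_TOKENS` and the function `extract_qualifier` are hypothetical, and tokens are taken to be whitespace-separated.

```python
from typing import Optional

# Hypothetical, deliberately tiny vocabulary; not the bdq:sourceAuthority.
QUALIFIER_TOKENS = {"?", "cf", "cf.", "aff", "aff.", "nr", "nr."}


def extract_qualifier(scientific_name: str) -> Optional[str]:
    """Return the proposed dwc:identificationQualifier value, or None."""
    tokens = scientific_name.split()
    for index, token in enumerate(tokens):
        if token.lower() in QUALIFIER_TOKENS:
            if index == len(tokens) - 1:
                # Qualifier first encountered at the end: keep the entire string.
                return " ".join(tokens)
            # Qualifier inside the string: keep it and everything after it.
            return " ".join(tokens[index:])
    return None
```

With this sketch, "Quercus aff. agrifolia var. oxyadenia" gives "aff. agrifolia var. oxyadenia", matching the example in the test description, while "Quercus" gives no qualifier. A name such as "Quercus alba?" (no space before the "?") is not detected, because the qualifier is not a separate token.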

chicoreus commented 4 years ago

For a small vocabulary (?, cf., nr.), this is probably tractable, but in the general case, we probably can't distinguish a qualifier from all other possible text.

ArthurChapman commented 4 years ago

Due to the complications in implementing this test and #97, I vote that we move them to Supplemental. I still believe that they are valuable tests, but then there are a lot of tests within the Supplemental tests that I would hope would be implemented at a later date. But for now, I think the difficulties in implementing these two tests make them impractical at this time. The only alternative I see would be a modification of #97 that just flags any record that has a qualifier in any of the taxonomic fields plus dwc:identificationQualifier.

Tasilee commented 4 years ago

On the basis of @pzermoglio's research, which indicates more than a thousand identification qualifier variants, AMENDMENTs based on their detection are fraught with issues. I'd suggest we set this to NOT CORE and hope that Paula's work will elevate the issues and result in a solution (but I am not holding my breath).

Tasilee commented 4 years ago

I vote to not include this AMENDMENT as CORE

tucotuco commented 4 years ago

I concur.


chicoreus commented 4 years ago

@Tasilee I concur. A basic implementation using a small vocabulary would not gain much and would leave many false negatives. An effective implementation would either be, or need to use, a very high quality name parser, and would still (given the list of values in the wild) be problematic in interpretation.

Tasilee commented 4 years ago

Thanks @chicoreus - concisely put

chicoreus commented 8 months ago

Updated format of markdown table to match current usage.

Tasilee commented 8 months ago

Updated examples to align with current template.

Tasilee commented 8 months ago

Added Description to align with current template

Tasilee commented 8 months ago

Changed Field to TestField

Tasilee commented 6 months ago

Standardized reference to "EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available" in Expected Response.