Closed iDigBioBot closed 4 years ago
TestField | Value |
---|---|
GUID | 65c5595b-6229-4f89-98e9-7a62dbda492d |
Label | AMENDMENT_IDENTIFICATIONQUALIFIER_FROM_TAXON |
Description | Can an identification qualifier be extracted from related taxon terms? |
TestType | Amendment |
Darwin Core Class | Taxon, Identification |
Information Elements ActedUpon | dwc:identificationQualifier |
Information Elements Consulted | dwc:scientificName, dwc:specificEpithet, dwc:infraspecificEpithet |
Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if all of the taxon name fields are bdq:Empty or the field dwc:identificationQualifier is bdq:NotEmpty; AMENDED if the field dwc:identificationQualifier as in the bdq:sourceAuthority is FILLED_IN from any of the fields dwc:scientificName, dwc:specificEpithet or dwc:infraspecificEpithet; otherwise NOT_AMENDED |
Data Quality Dimension | Completeness |
Term-Actions | IDENTIFICATIONQUALIFIER_FROM_TAXON |
Parameter(s) | bdq:sourceAuthority |
Source Authority | bdq:sourceAuthority default = "Darwin Core Identification Qualifier" {[https://dwc.tdwg.org/list/#identificationQualifier]} {dwc:identificationQualifier vocabulary API [NO CURRENT API EXISTS]} |
Specification Last Updated | 2024-09-18 |
Examples | [dwc:scientificName="Quercus aff. agrifolia var. oxyadenia", dwc:identificationQualifier="": Response.status=AMENDED, Response.result=dwc:identificationQualifier="aff. agrifolia var. oxyadenia", Response.comment="dwc:scientificName contains an interpretable dwc:identificationQualifier"]; [dwc:scientificName="Quercus", dwc:identificationQualifier="": Response.status=NOT_AMENDED, Response.result="", Response.comment="dwc:scientificName does not contain an interpretable dwc:identificationQualifier"] |
Source | VertNet |
References | |
Example Implementations (Mechanisms) | |
Link to Specification Source Code | |
Notes | dwc:genus is not included as an Information Element because if a "?" is present only in dwc:genus but not in dwc:scientificName, then by the Darwin Core definition of genus, this implies an uncertainty about placement in the classification rather than uncertainty about the identification (determination). We use a vocabulary to detect an identificationQualifier as a token, but the resulting dwc:identificationQualifier itself need not necessarily follow a controlled vocabulary. |
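The Expected Response above can be read as an ordered series of checks. The following is a minimal sketch of that ordering only, not a reference implementation. The `VOCAB` set, the `detect` helper, and the `authority_available` flag are all hypothetical stand-ins, since no API currently exists for the source authority.

```python
from typing import Dict, Optional

# Assumption: a tiny illustrative vocabulary standing in for bdq:sourceAuthority.
VOCAB = {"aff.", "cf.", "nr.", "?"}
TAXON_FIELDS = (
    "dwc:scientificName",
    "dwc:specificEpithet",
    "dwc:infraspecificEpithet",
)


def detect(text: str) -> Optional[str]:
    """Hypothetical detector: text from the first qualifier token to the end."""
    tokens = text.split()
    for i, token in enumerate(tokens):
        if token in VOCAB:
            return " ".join(tokens[i:])
    return None


def run_test(record: Dict[str, str], authority_available: bool = True) -> Dict[str, object]:
    """Apply the checks in the order given in the Expected Response."""
    if not authority_available:
        return {"status": "EXTERNAL_PREREQUISITES_NOT_MET", "result": None}

    names = [(record.get(field) or "").strip() for field in TAXON_FIELDS]
    existing = (record.get("dwc:identificationQualifier") or "").strip()
    if not any(names) or existing:
        return {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": None}

    # The first taxon field that yields a qualifier wins.
    for name in names:
        found = detect(name) if name else None
        if found:
            return {
                "status": "AMENDED",
                "result": {"dwc:identificationQualifier": found},
            }
    return {"status": "NOT_AMENDED", "result": None}
```

With the two records from the Examples row, this sketch returns AMENDED with `aff. agrifolia var. oxyadenia` for the first and NOT_AMENDED for the second.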
Comment by John Wieczorek (@tucotuco) migrated from spreadsheet: Name fields would be replaced with amended names and identification qualifier(s) put in identificationQualifier.
Comment by Arthur Chapman (@ArthurChapman) migrated from spreadsheet: Should follow IDENTIFIER_QUALIFIER_DETECTED
Comment by John Wieczorek (@tucotuco) migrated from spreadsheet: Can use a vocabulary to detect identificationQualifier as a token, but the resulting identificationQualifier need not necessarily follow a controlled vocabulary. For examples, see the description for identificationQualifier, where the names are included as well.
The notes could use some cleanup here (full sentences) for clarity.
Is better now?
Happy face applied.
Same note here about question marks in genus as for VALIDATION_IDENTIFICATIONQUALIFIER_DETECTED. More importantly though, the internal prerequisite is that dwc:identificationQualifier is empty. What happens if it's not, and we find a qualifier in one of the taxon name fields that does not match? An example might be dwc:scientificName="Quercus aff. agrifolia var. oxyadenia" and dwc:identificationQualifier="cf.". Not sure if this would ever happen in real examples though.
Oh, and does the amendment require that the qualifier also be removed from the taxon name field it was found in?
@ianengelbrecht Good questions. @chicoreus do we need to discuss this one further in the light of these questions?
The intent of the check for a non-empty identificationQualifier is to prevent this test from suggesting a change to an existing value. I don't think INTERNAL_PREREQUISITES_NOT_MET is the right response for that case.
Instead of:
EXTERNAL_PREREQUISITES_NOT_MET if the specified source authority service was not available; INTERNAL_PREREQUISITES_NOT_MET if all of the taxon name fields were EMPTY or the field dwc:identificationQualifier was not EMPTY; AMENDED if the field dwc:identificationQualifier was FILLED_IN from any of the fields dwc:scientificName, dwc:specificEpithet or dwc:infraspecificEpithet; otherwise NOT_CHANGED
We should have:
EXTERNAL_PREREQUISITES_NOT_MET if the specified source authority service was not available; INTERNAL_PREREQUISITES_NOT_MET if all of the taxon name fields were EMPTY; AMENDED if the field dwc:identificationQualifier was FILLED_IN from any of the fields dwc:scientificName, dwc:specificEpithet or dwc:infraspecificEpithet; NOT_CHANGED if the field dwc:identificationQualifier was not EMPTY; otherwise NOT_CHANGED
@chicoreus: Hmm, ok. I can live with that. Other comments before I race off to edit?
Just the question about what to do with the field that the qualifier comes from - must it be removed there as well?
Also, in implementing this in code I realised there could be edge cases which are difficult to deal with. E.g., if scientificName is Quercus cf/nr alba, which qualifier do we pick? There may also be the case where scientificName is Quercus cf alba and specificEpithet is nr alba. I'm not sure if this could ever plausibly arise in practice though; it really is an extreme edge case.
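One way to make these edge cases visible is to collect every distinct qualifier across the consulted fields and treat more than one as ambiguous. This is a rough sketch under stated assumptions, not something the thread settled on. The `VOCAB` set and the helper names are hypothetical, and splitting tokens on "/" as well as whitespace is a guess prompted by the "cf/nr" example.

```python
import re
from typing import Dict, Set

# Assumption: illustrative vocabulary, stored without trailing periods.
VOCAB = {"aff", "cf", "nr", "?"}
TAXON_FIELDS = (
    "dwc:scientificName",
    "dwc:specificEpithet",
    "dwc:infraspecificEpithet",
)


def qualifiers_in(text: str) -> Set[str]:
    """Return the vocabulary terms found as tokens in one field value."""
    tokens = re.split(r"[\s/]+", text.strip())
    normalised = {token.rstrip(".").lower() for token in tokens}
    return normalised & VOCAB


def distinct_qualifiers(record: Dict[str, str]) -> Set[str]:
    """Union of the qualifiers found in all consulted taxon fields."""
    found: Set[str] = set()
    for field in TAXON_FIELDS:
        found |= qualifiers_in(record.get(field) or "")
    return found


def is_ambiguous(record: Dict[str, str]) -> bool:
    """True when the fields disagree or one field holds several qualifiers."""
    return len(distinct_qualifiers(record)) > 1
```

Both cases above come out as ambiguous under this sketch: "Quercus cf/nr alba" yields two qualifiers in one field, and "Quercus cf alba" with specificEpithet "nr alba" yields two across fields. What response an ambiguous record should receive is left open here.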
I can't see any easy way of writing this into the Expected Response. We could if needed but we can do it by just changing the Notes:
The AMENDMENT is made by finding the qualifier as a token within dwc:scientificName. If the first encountered match is inside the string, place the text from the qualifier to the end of the string in dwc:identificationQualifier; if the qualifier is first encountered at the end of the string, place the entire string in dwc:identificationQualifier.
Note that dwc:genus is not included as an Information Element because if a "?" is present only in dwc:genus but not in dwc:scientificName, then by the Darwin Core definition of genus, this implies an uncertainty about placement in the classification rather than uncertainty about the identification (determination). We use a vocabulary to detect an identificationQualifier as a token, but the resulting dwc:identificationQualifier itself need not necessarily follow a controlled vocabulary.
For a small vocabulary (?, cf., nr.), this is probably tractable, but in the general case we probably can't distinguish a qualifier from all other possible text.
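The token rule proposed for the Notes can be sketched as follows for the small vocabulary. This is an illustration under stated assumptions rather than an agreed implementation. The function name is hypothetical, tokens are split on whitespace only, and a qualifier attached directly to a name (for example "alba?") is not detected.

```python
from typing import Optional, Sequence

# The small vocabulary mentioned above; anything outside it is not detected.
SMALL_VOCAB = ("?", "cf.", "nr.")


def qualifier_from_scientific_name(
    name: str, vocab: Sequence[str] = SMALL_VOCAB
) -> Optional[str]:
    """Return the proposed dwc:identificationQualifier value, or None."""
    tokens = name.split()
    for i, token in enumerate(tokens):
        if token in vocab:
            if i == len(tokens) - 1:
                # Qualifier is the last token: use the entire string.
                return " ".join(tokens)
            # Qualifier is earlier in the string: use it and all that follows.
            return " ".join(tokens[i:])
    return None
```

For "Quercus cf. alba" this gives "cf. alba", and for "Quercus alba ?" it gives the whole string. With the default vocabulary, "Quercus aff. agrifolia" gives no result, which illustrates how a small vocabulary leaves qualifiers undetected.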
Due to the complications in implementing this test and #97, I vote that we move them to Supplemental. I still believe that they are valuable tests, but there are a lot of tests within the Supplemental tests that I would hope would be implemented at a later date. For now, I think the difficulties in implementing these two tests make them impractical. The only alternative I see would be a modification of #97 that just flags any record that has a qualifier in any of the taxonomic fields or in dwc:identificationQualifier.
On the basis of @pzermoglio's research, which indicates more than a thousand identification qualifier variants, AMENDMENTs based on their detection are fraught with issues. I'd suggest we set this to NOT CORE and hope that Paula's work will elevate the issues and result in a solution (but I am not holding my breath).
I vote to not include this AMENDMENT as CORE
I concur.
@Tasilee I concur. A basic implementation using a small vocabulary would not gain much and would leave many false negatives. An effective implementation would be, or would need to use, a very high quality name parser, and would still (given the list of values in the wild) be problematic in interpretation.
Thanks @chicoreus - concisely put
Updated format of markdown table to match current usage.
Updated examples to align with current template.
Added Description to align with current template.
Changed Field to TestField
Standardized reference to "EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available" in Expected Response.