TestField | Value |
---|---|
GUID | 8ab38bee-323c-4926-a7e9-c0417cd3b14d |
Label | AMENDMENT_POLYNOMIAL_STANDARDIZED |
Description | Amend the scientific name to correct typographical errors and misspellings according to a specified source authority. |
TestType | Amendment |
Darwin Core Class | Taxon |
Information Elements ActedUpon | dwc:scientificName, dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:scientificNameAuthorship, dwc:yearOfPublication |
Information Elements Consulted | |
Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificName is bdq:Empty; AMENDED (dwc:scientificName, genus, specificEpithet, infraspecificEpithet, scientificNameAuthorship, yearOfPublication) if typographical errors and misspellings represented in dwc:scientificName have been unambiguously interpreted in the bdq:sourceAuthority; otherwise NOT_CHANGED |
Data Quality Dimension | Conformance |
Term-Actions | POLYNOMIAL_STANDARDIZED |
Parameter(s) | bdq:sourceAuthority |
Source Authority | bdq:sourceAuthority default = "GBIF Backbone Taxonomy" {[https://doi.org/10.15468/39omei]} {API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]} |
Specification Last Updated | 2024-04-16 |
Examples | [dwc:scientificName="Acacia longifloia": Response.status=AMENDED, Response.result=dwc:scientificName="Acacia longifolia", Response.comment="dwc:scientificName contains an interpretable value in the bdq:sourceAuthority"] [dwc:scientificName="Acacia camptophylla": Response.status=NOT_AMENDED, Response.result="", Response.comment="dwc:scientificName does not contain an interpretable value as there are a number of options in the bdq:sourceAuthority"] |
Source | Tania Laity |
References | |
Example Implementations (Mechanisms) | |
Link to Specification Source Code | |
Notes | [bdq:sourceAuthority default = GBIF Backbone Taxonomy]. (Currently found at: https://www.gbif.org/en/developer/species). The purpose of this Amendment is to correct errors in spelling and typography only. It is not intended to make changes of a taxonomic nature or to deal with errors or inconsistencies in the format of the Authorship. |
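The Expected Response row above can be read as a small decision procedure. The following is a minimal sketch of that reading, not a reference implementation: the function name `amend_scientific_name`, the injected `find_candidates` callable, and the use of `ConnectionError` to signal an unavailable authority are all assumptions made for illustration. How the bdq:sourceAuthority decides which names count as corrections is deliberately left to the caller.

```python
from typing import Callable, Optional, Sequence


def amend_scientific_name(
    scientific_name: Optional[str],
    find_candidates: Callable[[str], Sequence[str]],
) -> dict:
    """Sketch of the Expected Response logic for dwc:scientificName only.

    find_candidates is a hypothetical hook that asks the bdq:sourceAuthority
    for corrected name strings and raises ConnectionError when unavailable.
    """
    # The specification lists the external check first; an empty name is
    # checked first here only because no lookup is possible without a value.
    if scientific_name is None or not scientific_name.strip():
        return {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": {}}

    name = scientific_name.strip()
    try:
        candidates = set(find_candidates(name))
    except ConnectionError:
        return {"status": "EXTERNAL_PREREQUISITES_NOT_MET", "result": {}}

    # Amend only when the authority offers exactly one interpretation
    # and that interpretation differs from the supplied value.
    if len(candidates) == 1:
        (corrected,) = candidates
        if corrected != name:
            return {"status": "AMENDED", "result": {"dwc:scientificName": corrected}}

    return {"status": "NOT_CHANGED", "result": {}}
```

With a stub such as `lambda name: ["Acacia longifolia"]`, the input "Acacia longifloia" yields AMENDED, while a stub returning several distinct candidates yields NOT_CHANGED, mirroring the two examples in the table.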
Comment by Paul Morris (@chicoreus) migrated from spreadsheet: The ability to assert a correction to a scientific name string is almost always restricted to proposed corrections to the authorship portion of the string. It is much more effective to supply a link to a taxonID found in a nomenclator or taxonomic authority when an unambiguous match can be found than to attempt to alter the string value found in scientificName. An amendment affecting dwc:scientificNameAuthorship, on the other hand, is highly valuable, as authorship strings tend to be highly variable in construction.
See also #46, which seems to be paired with this test and to have the same issues (should it be AMENDMENT_SCIENTIFICNAME_STANDARDIZED?). See also #101, which does seem a legitimate "polynomial" test.
I have changed the wording of the Notes
FROM: This test is not intended to make alterations of a taxonomic nature. The intent of this test is not to fix errors or inconsistencies in the format of the dwc:scientificNameAuthorship. For the purpose of this amendment, if the genus in the dwc:genus field does not match the genus of the polynomial, the genus of the polynomial takes precedence for standardization.
TO: The purpose of this Amendment is to correct errors in spelling and typography only. It is not intended to make changes of a taxonomic nature or to deal with errors or inconsistencies in the format of the Authorship. For the purpose of this amendment, if the genus in the dwc:genus field does not match the genus of the polynomial, the genus of the polynomial takes precedence for standardization.
@ArthurChapman improvement in expressing an intent, though a problematic one. Also, "Polynomial" is still problematic. There is no dwc:polynomial. dwc:scientificName can contain either a uninomial or a polynomial, depending on the rank of the identification. A polynomial (with danger, as Darwin Core defines genus as the current classification of the scientific name, not the generic part of the dwc:scientificName) can be built from dwc:genus plus dwc:specificEpithet plus dwc:infraspecificEpithet if dwc:specificEpithet is populated, but the specification is mute about what is meant by polynomial in the notes, and the specification does not appear to include a need for terms other than dwc:scientificName, with, according to the notes, some unspecified magic removing the authorship from consideration in that value. The specification is currently mute on authorship, so an implementor's presumption would be that what is to be compared is the entire value found in dwc:scientificName as compared with the best match in the specified source authority. If there is a desire to not include authorship, then there must be an unambiguous specification as to how this is to be done (either with a (defined) parser, or by removing the value found in dwc:scientificNameAuthorship from the end of the value found in dwc:scientificName, or by using a defined beginning-of-string-only matching method on the source authority side). As currently phrased, the notes still represent magical thinking about the ability to detect which part of dwc:scientificName is the authorship and which parts are not for the wide range of names of all ranks, hybrids, and complex authorship strings under each of the codes, including the presence of initial capital letters in specific, subspecific, and infraspecific epithets in historical names, authorship strings embedded within name strings for hybrids, trinomials, and quadrinomials, and all sorts of interesting common cases.
After a fun discussion with @ArthurChapman, I think this boils down to how I responded to @chicoreus via email: POLYNOMIAL entails parsing on our end, but we assume parsing within the bdq:sourceAuthority as in the case of #57, don't we? My feeling is we remove #46 and #45 because @chicoreus informs us it is complex?
My point is we throw whatever is in dwc:scientificName at bdq:sourceAuthority with #57.
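As a rough illustration of sending the unparsed dwc:scientificName value to the default source authority, the sketch below only builds the lookup URL from the API endpoint given in the Source Authority row. The function name `build_lookup_url` and the constant names are hypothetical, and no request is actually issued; how the response would be interpreted is outside what this thread specifies.

```python
from urllib.parse import urlencode

# Endpoint and dataset key are taken from the Source Authority row above.
GBIF_SPECIES_ENDPOINT = "https://api.gbif.org/v1/species"
GBIF_BACKBONE_DATASET_KEY = "d7dddbf4-2cf0-4f39-9b2a-bb099caae36c"


def build_lookup_url(scientific_name: str) -> str:
    """Build a query URL that passes the whole name string, unparsed."""
    query = urlencode(
        {
            "datasetKey": GBIF_BACKBONE_DATASET_KEY,
            "name": scientific_name.strip(),
        }
    )
    return f"{GBIF_SPECIES_ENDPOINT}?{query}"


if __name__ == "__main__":
    print(build_lookup_url("Acacia longifolia"))
```

Because the whole string is URL-encoded as given, any authorship or rank marker present in dwc:scientificName is passed along too, which is exactly the dependence on the "smarts" of the source authority discussed in this thread.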
The original idea for Tests #45 and #46 was to fix minor spelling errors in the names (i.e. smithi versus smithii, litoralis versus littoralis etc.). This is something that CRIA does very well with its tests. There were other tests that involve the Taxon, TaxonID, and Scientific Name (+others). If we included Authorship and rank (var., ssp.) in these tests, then we are basically making these tests a duplication of other tests we already have (i.e. those dealing with combinations of TAXONID, TAXON and SCIENTIFICNAME). Given that, and the difficulty that @chicoreus mentions with parsing out the polynomial components from dwc:scientificName, etc., I see little value in continuing with these two tests (#45 and #46). I thus suggest that we simplify the process and change these two tests to SUPPLEMENTARY.
An alternative to moving this test to supplementary would be to specify an explicit means of handling the authorship in this test, for example:
change name from amendment polynomial standardized to amendment namestring standardized.
information elements: dwc:scientificName, dwc:scientificNameAuthorship
specification: EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority service was not available; INTERNAL_PREREQUISITES_NOT_MET if either dwc:scientificName or dwc:scientificNameAuthorship is EMPTY; AMENDED if the text string represented in dwc:scientificName, with the text string present in dwc:scientificNameAuthorship removed from its end, is not a match for a scientific name string in the bdq:sourceAuthority and it can be unambiguously corrected to a known scientific name string consisting of the same number of words (here we could specify a maximum string distance for transformation) according to the bdq:sourceAuthority; otherwise NOT_CHANGED
A similar test with consideration of authorship could be included as supplemental.
In the notes, note that #70 identifies whether the specified source authority has an unambiguous single record for the taxon, including the higher classification and authorship string, that #101 identifies inconsistencies between the scientific name and the atomic fields, and that #57 is the key amendment to propose a taxon id given the textual terms, including authorship.
That might work @chicoreus - it still has the problem of rank (i.e. straight trinomial, trinomial with var., ssp., subsp., forma, f., etc.)
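The proposed "namestring" specification above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (`strip_authorship`, `edit_distance`, `propose_correction`), the use of plain Levenshtein distance, the default maximum distance of 2, and the in-memory list standing in for the bdq:sourceAuthority are all hypothetical choices, none of which the thread fixes.

```python
from typing import Collection, Optional


def strip_authorship(scientific_name: str, authorship: str) -> Optional[str]:
    """Remove the authorship string from the end of the name, if present there."""
    name, author = scientific_name.strip(), authorship.strip()
    if not author or not name.endswith(author):
        return None  # authorship is not a suffix; nothing can be removed safely
    return name[: len(name) - len(author)].strip()


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def propose_correction(
    name_string: str, authority_names: Collection[str], max_distance: int = 2
) -> Optional[str]:
    """Return a single unambiguous correction, or None if there is none."""
    known = set(authority_names)
    if name_string in known:
        return None  # already an exact match, nothing to amend
    word_count = len(name_string.split())
    close = {
        candidate
        for candidate in known
        if len(candidate.split()) == word_count
        and edit_distance(name_string, candidate) <= max_distance
    }
    return close.pop() if len(close) == 1 else None
```

For example, stripping a placeholder authorship "Auth." from "Acacia longifloia Auth." leaves "Acacia longifloia", which is two edits away from "Acacia longifolia". Note that this sketch does nothing about rank markers or authorship embedded in the middle of a name string.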
I tend to agree with @ArthurChapman. Once we open the Pandora's Box of parsing dwc:scientificName, don't we need specific rules based upon a vocabulary that can assure us of a high probability of success? Flagging a potential issue as in the VALIDATION #46 is an equal challenge, but a safer test than this AMENDMENT.
We also have the following tests that seem to me to have similar problems (as noted by @chicoreus):
My inclination is to mirror the "GENUS_NOTFOUND", "FAMILY_NOTFOUND", "ORDER_NOTFOUND", "CLASS_NOTFOUND", "KINGDOM_NOTFOUND" tests with "(VALIDATION)_SCIENTIFICNAME_NOTFOUND" by sending whatever is in dwc:scientificName to the bdq:sourceAuthority, and not to have an equivalent amendment. I understand that a) it depends on the smarts of the bdq:sourceAuthority (which has to increase quickly) and b) we would be accepting that we may get many false positives. But one of the criteria for accepting a high number of false positives is that it highlights a significant issue. I'd still get rid of #46 and #45.
I am in accord with the conclusions of Lee's final paragraph.
On Tue, Jul 14, 2020 at 8:59 PM Lee Belbin wrote:
I tend to agree with @ArthurChapman. Once we open the Pandora's Box of parsing dwc:scientificName, don't we need specific rules based upon a vocabulary that can assure us of a high probability of success? Flagging a potential issue as in the VALIDATION #46 (https://github.com/tdwg/bdq/issues/46) is an equal challenge, but a safer test than this AMENDMENT.
We also have the following tests that seem to me to have similar problems (as noted by @chicoreus):
#101 (https://github.com/tdwg/bdq/issues/101): "COMPLIANT if the polynomial, as represented in dwc:scientificName, is consistent with the atomic parts dwc:genus, dwc:specificEpithet, dwc:infraspecificEpithet;..."
#46 (https://github.com/tdwg/bdq/issues/46): "COMPLIANT if there are no nomenclatural errors (e.g. typographical errors and misspellings) of a polynomial, as represented in dwc:scientificName according to the bdq:sourceAuthority service; ..."
#70 (https://github.com/tdwg/bdq/issues/70): "COMPLIANT if the combination of values of dwc:Taxon terms (dwc:scientificName, dwc:scientificNameAuthorship, dwc:subgenus, dwc:genus, dwc:family, dwc:order, dwc:class, dwc:phylum, dwc:kingdom, dwc:taxonRank) can be unambiguously resolved by the specified source authority service; ..."
#57 (https://github.com/tdwg/bdq/issues/57): "AMENDED if a value for dwc:taxonID is unique and resolvable on the basis of the value of the lowest ranking not EMPTY taxon classification terms dwc:scientificName, dwc:scientificNameAuthorship, dwc:kingdom, dwc:phylum, dwc:class, etc.; ..." (and I will change "etc." as this doesn't look good).
My inclination is to mirror the "GENUS_NOTFOUND", "FAMILY_NOTFOUND", "ORDER_NOTFOUND", "CLASS_NOTFOUND", "KINGDOM_NOTFOUND" tests with "(VALIDATION)_SCIENTIFICNAME_NOTFOUND" by sending whatever is in dwc:scientificName to the bdq:sourceAuthority, and not to have an equivalent amendment. I understand that a) it depends on the smarts of the bdq:sourceAuthority (which has to increase quickly) and b) we would be accepting that we may get many false positives. But one of the criteria for accepting a high number of false positives is that it highlights a significant issue. I'd still get rid of #46 (https://github.com/tdwg/bdq/issues/46) and #45 (https://github.com/tdwg/bdq/issues/45).
In agreement with the quorum from the email responses on July 15, 2020, this amendment was considered too difficult to implement with confidence, for the present.
From the discussion, this is still immature and needs substantive further consideration. Removing from supplementary and tagging as immature.
Updated the markdown to reflect current practice, added a source authority in current form.
Since this was written, dwc:genericName has come into use, so replacing dwc:genus (the classification term) with dwc:genericName (the atomic generic part of the scientific name).
Additional terms (dwc:subgenus, dwc:infragenericEpithet, dwc:cultivarEpithet) might be appropriate to include as information elements acted upon.
One point for further consideration is whether this test should operate on just dwc:scientificName, or on that term and all the atomic component terms (dwc:genericName, dwc:specificEpithet, etc.). This test might also consider dwc:scientificNameID as an information element consulted. Substantial thought and testing are needed to bring this test to maturity.
@chicoreus - you missed adding "a source authority in current form."
@ArthurChapman fixed.
Examples edited to conform with current practice of providing both a pass and fail example.
Aligned parameters to current template
Fixed typos/errors in specifications to align with current template
Standardized reference to "EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available" in Expected Response.