tdwg / bdq

Biodiversity Data Quality (BDQ) Interest Group
https://github.com/tdwg/bdq

TG2-AMENDMENT_SCIENTIFICNAMEID_FROM_TAXON #57

Open iDigBioBot opened 6 years ago

iDigBioBot commented 6 years ago
TestField Value
GUID 431467d6-9b4b-48fa-a197-cd5379f5e889
Label AMENDMENT_SCIENTIFICNAMEID_FROM_TAXON
Description Proposes an amendment to the value of dwc:scientificNameID if it can be unambiguously resolved from bdq:sourceAuthority using the available taxon terms.
TestType Amendment
Darwin Core Class dwc:Taxon
Information Elements ActedUpon dwc:scientificNameID
Information Elements Consulted dwc:taxonID
dwc:acceptedNameUsageID
dwc:originalNameUsageID
dwc:taxonConceptID
dwc:scientificName
dwc:higherClassification
dwc:kingdom
dwc:phylum
dwc:class
dwc:order
dwc:superfamily
dwc:family
dwc:subfamily
dwc:tribe
dwc:subtribe
dwc:genus
dwc:genericName
dwc:subgenus
dwc:specificEpithet
dwc:infraspecificEpithet
dwc:cultivarEpithet
dwc:vernacularName
dwc:scientificNameAuthorship
dwc:taxonRank
Expected Response EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificNameID is bdq:NotEmpty, or if all of dwc:scientificName, dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:scientificNameAuthorship, and dwc:cultivarEpithet are bdq:Empty; FILLED_IN the value of dwc:scientificNameID for an unambiguously resolved single taxon record in the bdq:sourceAuthority through (1) the value of dwc:scientificName or (2) if dwc:scientificName is bdq:Empty, through values of the terms dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:scientificNameAuthorship and dwc:cultivarEpithet, or (3) if ambiguity produced by multiple matches in (1) or (2) can be disambiguated to a single Taxon using the values of dwc:subtribe, dwc:tribe, dwc:subgenus, dwc:genus, dwc:subfamily, dwc:family, dwc:superfamily, dwc:order, dwc:class, dwc:phylum, dwc:kingdom, dwc:higherClassification, dwc:taxonID, dwc:acceptedNameUsageID, dwc:originalNameUsageID, dwc:taxonConceptID, dwc:taxonRank, and dwc:vernacularName; otherwise NOT_AMENDED
Data Quality Dimension Conformance
Term-Actions TAXONID_FROM_TAXON
Parameter(s) bdq:sourceAuthority
Source Authority bdq:sourceAuthority default = "GBIF Backbone Taxonomy" {[https://doi.org/10.15468/39omei]} {API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]}
Specification Last Updated 2023-09-17
Examples [dwc:taxonID="", dwc:scientificNameID="", dwc:acceptedNameUsageID="", dwc:originalNameUsageID="", dwc:taxonConceptID="", dwc:scientificName="Chicoreus palmarosae (Lamarck, 1822)", dwc:higherClassification="", dwc:kingdom="Animalia", dwc:phylum="Mollusca", dwc:class="Gastropoda", dwc:order="", dwc:family="Muricidae", dwc:subfamily="", dwc:genus="Chicoreus", dwc:genericName="Chicoreus", dwc:subgenus="", dwc:infragenericEpithet="", dwc:specificEpithet="palmarosae", dwc:infraspecificEpithet="", dwc:cultivarEpithet="", dwc:vernacularName="", dwc:scientificNameAuthorship="(Lamarck, 1822)", dwc:taxonRank="", bdq:sourceAuthority="marinespecies.org": Response.status=FILLED_IN, Response.result=dwc:scientificNameID="urn:lsid:marinespecies.org:taxname:208134", Response.comment="dwc:scientificName matched to a unique taxon record in WoRMS, exact match on name and authorship. Resolvable at https://marinespecies.org/aphia.php?p=taxdetails&id=208134"]
[dwc:scientificNameID="", dwc:taxonID="", dwc:acceptedNameUsageID="", dwc:originalNameUsageID="", dwc:taxonConceptID="", dwc:scientificName="Graphis", dwc:higherClassification="", dwc:kingdom="", dwc:phylum="", dwc:class="", dwc:order="", dwc:family="", dwc:subfamily="", dwc:genus="", dwc:genericName="", dwc:subgenus="", dwc:infragenericEpithet="", dwc:specificEpithet="", dwc:infraspecificEpithet="", dwc:cultivarEpithet="", dwc:vernacularName="", dwc:scientificNameAuthorship="", dwc:taxonRank="": Response.status=NOT_AMENDED, Response.result=, Response.comment="dwc:scientificName 'Graphis' is ambiguous, as it could be either a lichen or a gastropod."]
Source FP-Akka
References
Example Implementations (Mechanisms) Kurator/FilteredPush sci_name_qc Library, FP-KurationServices, Arctos, MCZbase, Symbiota
Link to Specification Source Code https://github.com/FilteredPush/sci_name_qc/blob/v1.1.2/src/main/java/org/filteredpush/qc/sciname/DwCSciNameDQ.java#L397 https://github.com/FilteredPush/sci_name_qc/blob/v1.1.2/src/main/java/org/filteredpush/qc/sciname/DwCSciNameDQ.java#L476
Notes Return a result with no value and a Response.status of NOT_AMENDED with a Response.comment of ambiguous if the information provided does not resolve to a unique result (e.g. if homonyms exist and there is insufficient information in the provided data, for example using the lowest ranking taxa in conjunction with dwc:scientificNameAuthorship, to resolve them). When referencing a GBIF taxon by GBIF's identifier for that taxon, use the pseudo-namespace "gbif:" and the form "gbif:{integer}" as the value for dwc:scientificNameID.
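The Expected Response reads as an ordered decision procedure. The Python below is a minimal sketch of one possible reading of it, not the FilteredPush sci_name_qc implementation linked above. Assumptions: `lookup` is a hypothetical callable standing in for a query against bdq:sourceAuthority, candidate records are dicts keyed by Darwin Core term names, bdq:Empty is approximated as "missing or whitespace only", internal prerequisites are checked before the service is contacted, and step (3) discards only candidates that contradict a non-empty consulted term.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]
# (Response.status, Response.result, Response.comment)
Response = Tuple[str, Optional[str], str]

# Terms whose values may identify the name itself (steps 1 and 2).
NAME_TERMS = [
    "dwc:scientificName", "dwc:genericName", "dwc:specificEpithet",
    "dwc:infraspecificEpithet", "dwc:scientificNameAuthorship",
    "dwc:cultivarEpithet",
]

# Terms used only to narrow multiple matches (step 3).
DISAMBIGUATION_TERMS = [
    "dwc:subtribe", "dwc:tribe", "dwc:subgenus", "dwc:genus",
    "dwc:subfamily", "dwc:family", "dwc:superfamily", "dwc:order",
    "dwc:class", "dwc:phylum", "dwc:kingdom", "dwc:higherClassification",
    "dwc:taxonID", "dwc:acceptedNameUsageID", "dwc:originalNameUsageID",
    "dwc:taxonConceptID", "dwc:taxonRank", "dwc:vernacularName",
]


def is_empty(value: Optional[str]) -> bool:
    """Approximation of bdq:Empty: missing, or only whitespace."""
    return value is None or value.strip() == ""


def amend_scientific_name_id(
    record: Record, lookup: Callable[[Record], List[Record]]
) -> Response:
    """Sketch of AMENDMENT_SCIENTIFICNAMEID_FROM_TAXON.

    `lookup` (hypothetical) receives the non-empty query terms and returns
    candidate records, each carrying "dwc:scientificNameID"; it raises
    ConnectionError when the source authority is unavailable.
    """
    if not is_empty(record.get("dwc:scientificNameID")):
        return ("INTERNAL_PREREQUISITES_NOT_MET", None,
                "dwc:scientificNameID already has a value")
    if all(is_empty(record.get(term)) for term in NAME_TERMS):
        return ("INTERNAL_PREREQUISITES_NOT_MET", None,
                "all name terms are empty")

    # Step (1): the whole scientificName; step (2): the atomic name parts.
    if not is_empty(record.get("dwc:scientificName")):
        query = {"dwc:scientificName": record["dwc:scientificName"]}
    else:
        query = {term: record[term] for term in NAME_TERMS[1:]
                 if not is_empty(record.get(term))}

    try:
        candidates = lookup(query)
    except ConnectionError:
        return ("EXTERNAL_PREREQUISITES_NOT_MET", None,
                "bdq:sourceAuthority is not available")

    # Step (3): drop candidates that contradict a non-empty consulted term.
    if len(candidates) > 1:
        for term in DISAMBIGUATION_TERMS:
            value = record.get(term)
            if is_empty(value):
                continue
            candidates = [c for c in candidates
                          if is_empty(c.get(term)) or c.get(term) == value]

    if len(candidates) == 1:
        return ("FILLED_IN", candidates[0]["dwc:scientificNameID"],
                "unique match in bdq:sourceAuthority")
    if not candidates:
        return ("NOT_AMENDED", None, "no matching record")
    return ("NOT_AMENDED", None,
            f"ambiguous: {len(candidates)} candidate records")
```

For instance, with a stub lookup returning two made-up candidates for "Graphis" that differ only in dwc:kingdom, the second example above comes back NOT_AMENDED as ambiguous, whereas adding dwc:kingdom="Fungi" to the record narrows it to one candidate.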
iDigBioBot commented 6 years ago

Comment by Paul Morris (@chicoreus) migrated from spreadsheet: Moving from scientificName as a string to a link to a guid in a taxonomic or nomenclatural authority is key for moving towards linked open data and other semantic delivery of biodiversity data. There is almost never enough data in flat Darwin Core to fill in any of the other ID terms in the Taxon class, but it is often possible to link scientific name strings to nomenclatural or taxonomic records.

godfoder commented 6 years ago

We should add taxonRank to the list of fields for this and #70. It is especially important for the interpretation of monomials in scientificName absent other supporting data.

chicoreus commented 6 years ago

@godfoder I concur. Do we need to specify a more complex set of prerequisites?

chicoreus commented 2 years ago

A couple of issues for implementation:

Action to take when taxonID is NOT_EMPTY: The specification is silent on what action to take when dwc:taxonID has a value. Since other tests specify CHANGED only if the term that is proposed to be amended is NOT_EMPTY, the implication is that an amendment is to be proposed, for purposes such as conforming taxonID values to a national authority. This should probably be spelled out in the Notes section.

Extraneous terms in the list of Information Elements: The specification states that a proposed amendment should be made "on the basis of the value of the lowest ranking not EMPTY taxon classification terms dwc:scientificName, dwc:scientificNameAuthorship, dwc:kingdom, dwc:phylum, dwc:class, etc.", with @godfoder's comment clearly indicating that taxonRank should be included in this list. The notes imply that none of the other ID terms (dwc:scientificNameID, dwc:acceptedNameUsageID, dwc:originalNameUsageID, dwc:taxonConceptID) should be included in this analysis, so it seems that they shouldn't be included in the Information Elements unless there is a clear specification of how to use them to infer a value of taxonID. Also, neither dwc:higherClassification nor dwc:vernacularName is included in the specification, and thus they don't seem to fit in the list of Information Elements.

ArthurChapman commented 2 years ago

Further to the logic of @chicoreus - I don't understand the inclusion of dwc:scientificNameAuthorship, as it isn't a taxon classification term in the hierarchy, and that field alone could not supply a taxonID.

chicoreus commented 2 years ago

@ArthurChapman I see scientificNameAuthorship as an essential term for identifying which taxonID to use. It can often disambiguate homonyms, and if the authorship string associated with the source record for taxonID isn't the same as the authorship string in the record under consideration, then something likely isn't correct and an assertion of a taxonID match is not a good one to make.

ArthurChapman commented 2 years ago

@chicoreus - that is correct, but it may need us to reword the test, because as written, I don't see how that field could work, as dwc:scientificNameAuthorship is not strictly a classification term. dwc:scientificName should include the authorship and thus could be used to resolve the taxonID. Using dwc:scientificNameAuthorship to fix dwc:scientificName probably belongs in a different test, but as this test is written, dwc:scientificNameAuthorship on its own doesn't work. I don't see that term belonging in this test.

chicoreus commented 2 years ago

@ArthurChapman Yes, rewording would be good. Point well taken that the information should be in scientificName and that scientificNameAuthorship should be a parse of that rather than a classification term. Pragmatically, scientificNameAuthorship makes it easier to separate the canonical name and the authorship parts of scientificName when it contains both (and often, despite the definitions, it doesn't), and services tend to return better results when queried on just the canonical name, with the results then examined for similarity of authorship strings. A huge amount of the variability in the wild is in the authorship strings: people's names abbreviated or not, punctuation variability, presence and absence of prefixes, suffixes, and honorifics, and, in animal names, the presence or absence of years, etc. Implementation logic needs to deal with this in a consistent way, not farm it off to whatever may or may not be returned from a particular service when given a value found in dwc:scientificName.
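Consistent authorship handling of the kind described above could start from a normalization like the Python sketch below. It is an illustration under stated assumptions, not code from any implementation mentioned in this thread: it ignores diacritics, case, periods, commas, and whitespace, treats "and", "et", and "&" as equivalent, tolerates a year that is present on only one side, and keeps parentheses, because in zoological authorship they indicate a change of genus since the original description. It does not expand abbreviations (e.g. "L." versus "Linnaeus"), which would need a table of author names.

```python
import re
import unicodedata

# Four-digit years from 1500 to 2099 (an assumption about what counts as a year).
YEAR = re.compile(r"\b(1[5-9][0-9]{2}|20[0-9]{2})\b")


def normalize_authorship(authorship: str) -> str:
    """Reduce an authorship string to a comparison key, with years removed."""
    text = unicodedata.normalize("NFKD", authorship)
    # Drop combining marks so that "Müll." and "Mull." compare equal.
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    text = YEAR.sub("", text)
    # Treat the words "and" / "et" as "&".
    text = re.sub(r"\b(?:and|et)\b", "&", text)
    # Remove periods, commas and whitespace; parentheses are kept.
    return re.sub(r"[.,\s]+", "", text)


def authorships_compatible(a: str, b: str) -> bool:
    """True unless the two authorship strings clearly disagree."""
    if not a.strip() or not b.strip():
        return True  # nothing to compare against
    years_a, years_b = YEAR.findall(a), YEAR.findall(b)
    if years_a and years_b and years_a != years_b:
        return False  # both give years, and they differ
    return normalize_authorship(a) == normalize_authorship(b)
```

A matcher could then query the service on the canonical name alone and keep only candidates whose authorship is compatible with dwc:scientificNameAuthorship.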

Tasilee commented 2 years ago

I defer to those far more expert on names to reword the Expected Response and tune the Information Elements accordingly. I do however offer two comments on general issues relating to this AMENDMENT-

While the precursor VALIDATION #105 tests for dwc:taxonID EMPTY, as I remember it, the 'tests' should be 'stand alone' so we should be explicit here.

I dislike the use of "etc" in the current Expected Response as these provide explicit rules for implementation. If we need to refer to a dwc term, then we MUST specify it.

Tasilee commented 2 years ago

I still greatly dislike the "etc" in the Expected Response. Related: I also don't like Information Elements that are missing from the Expected Response. We must provide concise, unambiguous instructions on this test (and in the test data, where I revisited this), and as I am unsure how, for example, dwc:infraspecificEpithet comes into this... I'll leave it to the NAME gurus.

ArthurChapman commented 2 years ago

This is a very difficult one. Suggested change from (may need tweaking):

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority was not available; INTERNAL_PREREQUISITES_NOT_MET if all of dwc:kingdom, dwc:phylum, dwc:class, dwc:order, dwc:family, dwc:genus, and dwc:scientificName are EMPTY; AMENDED if a value for dwc:taxonID is unique and resolvable on the basis of the value of the lowest ranking not EMPTY taxon classification terms dwc:scientificName, dwc:scientificNameAuthorship, dwc:kingdom, dwc:phylum, dwc:class, etc.; otherwise NOT_AMENDED

to

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority was not available; INTERNAL_PREREQUISITES_NOT_MET if all of dwc:kingdom, dwc:phylum, dwc:class, dwc:order, dwc:family, dwc:genus, dwc:genericName, dwc:specificEpithet and dwc:scientificName are EMPTY or if dwc:taxonID is not EMPTY; AMENDED if a value for dwc:taxonID is unique and resolvable on the basis of the value of dwc:scientificName, dwc:acceptedNameUsageID, or dwc:originalNameUsageID, or an unambiguous combination of any of the lowest ranking not EMPTY taxon classification terms dwc:infraspecificEpithet, dwc:cultivarEpithet, dwc:specificEpithet, dwc:infragenericEpithet, dwc:genus, dwc:genericName, dwc:subfamily, dwc:family, dwc:class, dwc:order, dwc:phylum, dwc:kingdom, dwc:higherClassification, dwc:vernacularName, in conjunction with dwc:scientificNameAuthorship and dwc:taxonRank; otherwise NOT_AMENDED

Some of my thinking:

  1. In the INTERNAL_PREREQUISITES_NOT_MET - dwc:subfamily, dwc:subgenericEpithet, dwc:infraspecificEpithet, etc. cannot on their own give you the TAXONID. With dwc:infraspecificEpithet, for example, you'd need to have at least the species.
  2. I have added dwc:genericName and dwc:specificEpithet: the former is new, the latter was missing in the previous version.
  3. I have added "dwc:taxonID is not EMPTY" (it was missing in the previous version, but you aren't going to change it if it already has a TAXONID).
  4. The taxonID could be obtained from dwc:scientificName, dwc:acceptedNameUsageID, or dwc:originalNameUsageID without any other information (they all include the authorship information, etc.). All the others require a combination of two or more terms, in conjunction with dwc:scientificNameAuthorship if there are homonyms to separate them.
  5. I am not sure where dwc:taxonRank should fit within that formula. I don't think it is correct where it is, and it is not a "classification term", so it doesn't belong with the others.

Good luck with getting your heads around this one.

@Tasilee - if accepted it will need some work for the examples.

ArthurChapman commented 2 years ago

@chicoreus @tucotuco - does dwc:taxonConceptID come into this anywhere, i.e. to help distinguish the taxonID? It would possibly be beside dwc:scientificNameAuthorship and dwc:taxonRank in the "in conjunction with" area. I'd prefer to leave it out.

Tasilee commented 2 years ago

Thanks @ArthurChapman. My eyes glaze over when it comes to names. I have enough understanding to be dangerous. I therefore defer to @ArthurChapman, @chicoreus, @tucotuco and hopefully a few more watching on for advice, or thumbs up etc.

As it stands, your Expected Response is at least explicit. My only quibble may be the use of "in conjunction with ...". How?

ArthurChapman commented 2 years ago

@Tasilee. We suggested "in conjunction with" in the ZOOM call. It means that if there is a homonym (possibly in different Kingdoms, or even within a genus) - then you would need "the lowest ranking taxon term" in conjunction with "dwc:scientificNameAuthorship" to separate the homonyms to get a TAXONID. This is not the case with dwc:scientificName for example, as this contains the Authorship by definition.

Tasilee commented 2 years ago

This would at least need to be in the Notes then?

ArthurChapman commented 2 years ago

@Tasilee - sounds reasonable - I suggest we will do Notes after we agree on the wording of the Expected Response.

Tasilee commented 2 years ago

In going through the data, there is an anomaly that is fixed if we accept the last suggested Expected Response from @chicoreus. I have just moved the "if dwc:taxonID is not EMPTY" to the start of the INTERNAL_PREREQUISITES_NOT_MET list as it makes parsing simpler.

ArthurChapman commented 2 years ago

@Tasilee I can't see the suggested Expected Response from @chicoreus to which you refer. You seem to have added my suggestion - but I am still not sure about the placement of dwc:taxonRank in the Response.

dwc:subfamily, dwc:genericName, dwc:infragenericEpithet and dwc:cultivarEpithet need to be added to the Information Elements and we should add something to the Notes as suggested by you above. Suggest instead of

[bdq:sourceAuthority default = GBIF Backbone Taxonomy]. (Currently found at: https://www.gbif.org/en/developer/species). This is the taxonID inferred from the Darwin Core Taxon class, not from any other sense of Taxon. Return a result with no value and a result state of ambiguous if the information provided does not resolve to a unique result (e.g. if homonyms exist and there is insufficient information in the provided data to resolve them)

we use

[bdq:sourceAuthority default = GBIF Backbone Taxonomy]. (Currently found at: https://www.gbif.org/en/developer/species). This is the taxonID inferred from the Darwin Core Taxon class, not from any other sense of Taxon. Return a result with no value and a result state of ambiguous if the information provided does not resolve to a unique result (e.g. if homonyms exist and there is insufficient information in the provided data, for example using the lowest ranking taxa in conjunction with dwc:scientificNameAuthorship, to resolve them).

chicoreus commented 2 years ago

@ArthurChapman for "Return a result with no value and a result state of ambiguous", we've moved away from using ambiguous as a Response.status, so this would be a Response.status of NOT_AMENDED with ambiguous noted in the Response.comment or in the proposed Response.qualifier extension (this is a good example of where Response.qualifier=AMBIGUOUS would serve as a structured qualifier for the Response, with more details in the Response.comment). (Also noting, we seem to be settling on Response.result instead of Response.value or result value.)
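One way to model the Response described here (Response.status, Response.result, Response.comment, and the proposed Response.qualifier extension) is a small record type. The Python below is a hypothetical sketch: the class, field, and function names are illustrative and not part of any published bdq or FilteredPush API, and it enforces the assumed convention that only a FILLED_IN response carries a Response.result.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    FILLED_IN = "FILLED_IN"
    NOT_AMENDED = "NOT_AMENDED"
    INTERNAL_PREREQUISITES_NOT_MET = "INTERNAL_PREREQUISITES_NOT_MET"
    EXTERNAL_PREREQUISITES_NOT_MET = "EXTERNAL_PREREQUISITES_NOT_MET"


@dataclass(frozen=True)
class AmendmentResponse:
    status: Status
    result: Optional[str] = None      # Response.result
    comment: str = ""                 # Response.comment
    qualifier: Optional[str] = None   # proposed Response.qualifier extension

    def __post_init__(self) -> None:
        # Only a FILLED_IN response proposes a value.
        if (self.result is not None) != (self.status is Status.FILLED_IN):
            raise ValueError("result must be set exactly when status is FILLED_IN")


def ambiguous(candidate_count: int) -> AmendmentResponse:
    """NOT_AMENDED with a structured AMBIGUOUS qualifier and details in the comment."""
    return AmendmentResponse(
        status=Status.NOT_AMENDED,
        comment=f"ambiguous: {candidate_count} candidate taxon records matched",
        qualifier="AMBIGUOUS",
    )
```

Keeping the qualifier as a separate field lets consumers filter on ambiguity without parsing free-text comments.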

@ArthurChapman I'm not seeing it either (checked recent email threads), @Tasilee was the latest expected response from me something you wrote down in the last call?

ArthurChapman commented 2 years ago

OK - I will make a change in the Notes to:

[bdq:sourceAuthority default = GBIF Backbone Taxonomy]. (Currently found at: https://www.gbif.org/en/developer/species). This is the taxonID inferred from the Darwin Core Taxon class, not from any other sense of Taxon. Return a result with no value and a Response.status of NOT_AMENDED with a Response.comment of ambiguous if the information provided does not resolve to a unique result (e.g. if homonyms exist and there is insufficient information in the provided data, for example using the lowest ranking taxa in conjunction with dwc:scientificNameAuthorship, to resolve them).

@chicoreus - the Expected Response to which @tasilee was referring was apparently mine - not yours (I just spoke to him by phone).

chicoreus commented 2 years ago

Aligning with #70, the specification could read:

INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is not EMPTY or if all of dwc:scientificName, dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:taxonRank, dwc:scientificNameAuthorship, and dwc:cultivarEpithet are EMPTY; AMENDED to the value of dwc:taxonID for an unambiguously resolved single taxon record in the specified source authority service through (1) the values of dwc:scientificName and dwc:cultivarEpithet, or (2) if dwc:scientificName is EMPTY, through the values of the terms dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:taxonRank, dwc:scientificNameAuthorship (and, if not EMPTY, dwc:cultivarEpithet), or (3) if ambiguity produced by multiple matches in (1) or (2) can be disambiguated to a single Taxon using the values of dwc:subgenus, dwc:genus, dwc:family, dwc:order, dwc:class, dwc:phylum, dwc:kingdom, dwc:higherClassification, dwc:scientificNameID, dwc:acceptedNameUsageID, dwc:originalNameUsageID, dwc:taxonConceptID, and dwc:vernacularName; otherwise NOT_AMENDED

chicoreus commented 2 years ago

Key piece (from #70) to add to the notes: The terms dwc:subgenus, dwc:genus, dwc:family, dwc:order, dwc:class, dwc:phylum, dwc:kingdom, dwc:higherClassification, dwc:scientificNameID, dwc:acceptedNameUsageID, dwc:originalNameUsageID, dwc:taxonConceptID should not be used to make a match if dwc:taxonID and dwc:scientificName or dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:taxonRank, dwc:scientificNameAuthorship are empty.

This expresses the assertion that if only dwc:genus is populated, the taxonID for that genus should not be filled in as the amendment, as the dwc:genus (and dwc:family and up) is a classification term for the Taxon, not necessarily a constituent part of the name of the Taxon.

ArthurChapman commented 2 years ago

I don't think this is correct and I find this wording totally confusing

  1. "or if all of, dwc:scientificName, dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:taxonRank, dwc:scientificNameAuthorship, dwc:cultivarEpithet are EMPTY" - if you only have dwc:infraspecificEpithet, dwc:taxonRank, dwc:scientificNameAuthorship or dwc:cultivarEpithet with nothing else, there is no way you can get a TAXONID, so I don't think these should be in this part.

    I am not familiar enough with TAXONIDs, but if you only have a Family, doesn't that have a TAXONID? When I previously worded #57, I assumed that all names in the hierarchy would have a TAXONID. Am I wrong?

  2. dwc:cultivarEpithet shouldn't be treated any differently to dwc:infraspecificEpithet, so why would it be in (1)?

  3. I can't get my head around the rest - sorry. Needs another attempt.

See the thinking behind my current version under my comment of 5 days ago.

chicoreus commented 2 years ago

The definition for dwc:family is "The full scientific name of the family in which the taxon is classified." The set of higher taxonomy terms from dwc:genus on up (and this is why dwc:genericName, "The genus part of the scientificName without authorship.", was needed distinct from dwc:genus) are classifiers for the taxon, not the taxon. If only dwc:family is supplied, we have no idea which taxon within that family is being referenced. This is distinctly different from the case where dwc:family and dwc:scientificName contain the same value. There it is unambiguously clear that the Taxon in question is the Family, but when Family is populated, and scientificName and taxonID are empty, we have no way of telling which taxon is being referenced: it might be the Family, or it might be any taxon that can be placed within that Family.

Yes, totally confusing....

ArthurChapman commented 2 years ago

Interesting - In the Botanical Code a taxon is described as "Taxonomic groups at any rank will, in this Code, be referred to as taxa (singular: taxon)."

Darwin Core Definition is "A group of organisms (sensu http://purl.obolibrary.org/obo/OBI_0100026) considered by taxonomists to form a homogeneous unit."

As a taxonomist - I have always regarded a Family name as a taxonomic name at the rank of Family. If we are using the term taxon in a different way - then we need to be clear and define it differently

chicoreus commented 2 years ago

That's exactly it: in Darwin Core, dwc:family is already defined differently. In that context it means the family into which the dwc:Taxon record is currently classified. As I read that definition, a dwc:Taxon record where only dwc:family is populated is one for which you do not know what the Taxon is. Again, the case where dwc:scientificName or dwc:taxonID (or both) contain values and dwc:family contains a value is totally different: even if the value in dwc:scientificName is identical to that in dwc:family, in that case the dwc:Taxon record is for the family, and that botanical code definition applies. When only dwc:family is populated, it is a reference to a Taxon, but as dwc:family, it is only a reference to that Taxon, not an unambiguous indicator of what taxon the current dwc:Taxon instance is referring to. The issue isn't in the definition of Taxon; that is clear. The issue is that dwc:family is an explicit reference to another Taxon into which the current dwc:Taxon record is classified. Consider:

9fd1e0b1-5e62-49eb-8bdd-4aee2c58e568 is a dwc:Taxon
9fd1e0b1-5e62-49eb-8bdd-4aee2c58e568 has dwc:family Muricidae Rafinesque, 1815

ea6ff6cf-c21e-476f-854c-1c8cf4a3cd74 is a dwc:Taxon
ea6ff6cf-c21e-476f-854c-1c8cf4a3cd74 has dwc:family Muricidae Rafinesque, 1815
ea6ff6cf-c21e-476f-854c-1c8cf4a3cd74 has dwc:scientificName Muricidae Rafinesque, 1815
ea6ff6cf-c21e-476f-854c-1c8cf4a3cd74 has dwc:taxonID urn:lsid:marinespecies.org:taxname:148

4ecedbbd-6524-4b6f-a2f7-d3b61eda252a is a dwc:Taxon
4ecedbbd-6524-4b6f-a2f7-d3b61eda252a has dwc:family Muricidae Rafinesque, 1815
4ecedbbd-6524-4b6f-a2f7-d3b61eda252a has dwc:scientificName Chicoreus brevifrons (Lamarck, 1822)
4ecedbbd-6524-4b6f-a2f7-d3b61eda252a has dwc:taxonID urn:lsid:marinespecies.org:taxname:558803

Can you tell, from the information given, what either the dwc:scientificName or the dwc:taxonID of 9fd1e0b1-5e62-49eb-8bdd-4aee2c58e568 is? I can't. There is a dwc:Taxon in this data set for Muricidae, but it is ea6ff6cf-c21e-476f-854c-1c8cf4a3cd74, and from the information given, it isn't possible to tell if 9fd1e0b1-5e62-49eb-8bdd-4aee2c58e568 is the same taxon as ea6ff6cf-c21e-476f-854c-1c8cf4a3cd74.

ArthurChapman commented 2 years ago

OK - but still, if you only have a record with a Family i.e. Muricidae Rafinesque, 1815 and no information below - i.e. I have only been able to identify this taxon to Family - shouldn't we then add the Taxon ID if the sourceAuthority provides a TAXONID for a family name [I am not familiar enough to know if they do - if they don't then OK - I defer to what you propose]

9fd1e0b1-5e62-49eb-8bdd-4aee2c58e568 is a dwc:Taxon
9fd1e0b1-5e62-49eb-8bdd-4aee2c58e568 has dwc:family Muricidae Rafinesque, 1815
xxxxxx-xxxx-xxxx-xxx-xxxxxxx has dwc:taxonID urn:lsid:marinespecies.org:taxname: xxx

Tasilee commented 2 years ago

Is there consensus on the Expected Response?

ArthurChapman commented 2 years ago

Not yet! There are some questions still to be answered (the email I sent around on #57 and #70) - for example

  1. on treatment of dwc:cultivarEpithet different to dwc:infraspecificEpithet (I believe they shouldn't be differently treated - @tucotuco to clarify use of dwc:cultivarEpithet) - see my comment of two days ago

  2. and whether or not the higher categories have TAXONIDs. From my email "I am still not fully convinced re TAXONID and higher level taxa. Does the sourceAuthority (GBIF?) give a TAXONID for a family name? I am not familiar enough with TAXONIDs to know. If it doesn't, then I accept @chicoreus's arguments. But if it does, and a record has only a name at the Family level with no information at a lower level (i.e. I have only been able to identify this record to Family), and the sourceAuthority gives a TAXONID for the Family, then why would we not use that TAXONID for the record?

    This is particularly relevant as the Botanical Code defines taxa as "Taxonomic groups at any rank will, in this Code, be referred to as taxa (singular: taxon)." In the Zoological Code: "A taxonomic unit, whether named or not: i.e. a population, or group of populations of organisms which are usually inferred to be phylogenetically related and which have characters in common which differentiate (q.v.) the unit (e.g. a geographic population, a genus, a family, an order) from other such units. A taxon encompasses all included taxa of lower rank (q.v.) and individual organisms. The Code fully regulates the names of taxa only between and including the ranks of superfamily and subspecies." The Zoological Code treats a family name as a taxon:

    "family name or name of a family A scientific name of a taxon at the rank of family."

    Darwin Core definition "A group of organisms (sensu http://purl.obolibrary.org/obo/OBI_0100026) considered by taxonomists to form a homogeneous unit." It gives the example of "The genus Truncorotaloides as published by Brönnimann et al. in 1953 in the Journal of Paleontology Vol. 27(6) p. 817-820."

tucotuco commented 2 years ago

  1. on treatment of dwc:cultivarEpithet different to dwc:infraspecificEpithet (I believe they shouldn't be differently treated - @tucotuco to clarify use of dwc:cultivarEpithet) - see my comment of two days ago

My understanding is that a cultivarEpithet should be as determinant of a Taxon as an infraspecificEpithet is and treated in the same way.

  2. and whether or not the higher categories have TAXONIDs.

I agree with @chicoreus about the case where dwc:family (and no lower rank) is populated and dwc:scientificName is not, for the simple fact that the Taxon is ambiguous. Specifically, it MIGHT be the family, but it might be something in the family. Probably way too subtle for most people to worry about, but I think it's correct.

ArthurChapman commented 2 years ago

OK - if we accept the reasoning of @chicoreus and @tucotuco:

INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is not EMPTY or if all of dwc:scientificName, dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:scientificNameAuthorship, and dwc:cultivarEpithet are EMPTY; AMENDED the value of dwc:taxonID for an unambiguously resolved single taxon record in the specified source authority service through (1) the value of dwc:scientificName or (2) if dwc:scientificName is EMPTY, through values of the terms dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:scientificNameAuthorship and dwc:cultivarEpithet, or (3) if ambiguity produced by multiple matches in (1) or (2) can be disambiguated to a single Taxon using the values of dwc:subgenus, dwc:genus, dwc:subfamily, dwc:family, dwc:order, dwc:class, dwc:phylum, dwc:kingdom, dwc:higherClassification, dwc:scientificNameID, dwc:acceptedNameUsageID, dwc:originalNameUsageID, dwc:taxonConceptID, dwc:taxonRank, and dwc:vernacularName; otherwise NOT_AMENDED

If accepted it appears that we can take dwc:genericName and dwc:infragenericEpithet out of Information Elements

Note:

  1. I have taken dwc:cultivarEpithet and dwc:taxonRank out of the INTERNAL_PREREQUISITES_NOT_MET
  2. I have changed the wording of (1) and (2) around dwc:cultivarEpithet in the AMENDED area to treat it the same as dwc:infraspecificEpithet
  3. I have taken dwc:taxonRank out of (2)
  4. I have added dwc:subfamily to (3)
  5. Some other minor wording and spelling corrections

Tasilee commented 2 years ago

I have to defer to @chicoreus, @ArthurChapman and @tucotuco on this. I will apply @ArthurChapman's latest Expected Response, with a few more tweaks.

Tasilee commented 2 years ago

Are we all happy with the specifications on this one now?

Tasilee commented 2 years ago

Changed "AMENDED" to "FILLED_IN" in accordance with discussions April 16.

Tasilee commented 2 years ago

Amended Example to align with @chicoreus comments in email 17th June 2022.

chicoreus commented 2 years ago

So the text of cultivarEpithet should also be found in scientificName?

tucotuco commented 2 years ago

Yes, I think it should. But for a definitive answer it is best to ask someone such as @mdoering and @ nielsklazenga.

ArthurChapman commented 2 years ago

@nielsklazenga - any comments? [space inadvertently included in last post by @tucotuco]

nielsklazenga commented 2 years ago

Regarding cultivarEpithet, yes, that is part of the scientificName string.
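Since dwc:cultivarEpithet is part of the dwc:scientificName string, matching on atomic terms when dwc:scientificName is empty needs those parts assembled into a name. The Python below is a hypothetical helper, sketched under three assumptions: cultivar epithets follow the cultivated-plant convention of single quotation marks, authorship is left out so it can be compared separately, and any infraspecific rank marker (e.g. "subsp." or "var.") must be passed in, since dwc:infraspecificEpithet carries no marker of its own.

```python
from typing import Optional


def assemble_canonical_name(
    generic_name: str,
    specific_epithet: str = "",
    infraspecific_epithet: str = "",
    cultivar_epithet: str = "",
    infraspecific_marker: Optional[str] = None,
) -> str:
    """Join atomic name terms into a name string without authorship."""
    parts = [generic_name.strip(), specific_epithet.strip()]
    if infraspecific_epithet.strip():
        if infraspecific_marker:
            parts.append(infraspecific_marker.strip())
        parts.append(infraspecific_epithet.strip())
    if cultivar_epithet.strip():
        # Remove any quotes already present, then add straight single quotes.
        epithet = cultivar_epithet.strip().strip("'\u2018\u2019")
        parts.append(f"'{epithet}'")
    # Skip empty parts so missing terms do not leave double spaces.
    return " ".join(part for part in parts if part)
```

Normalizing the quoting here avoids a mismatch between a cultivar epithet supplied with quotes and one supplied without.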

Tasilee commented 1 year ago

Why don't we have an "EXTERNAL_PREREQUISITES_NOT_MET" if we reference bdq:sourceAuthority?!

I've added it as otherwise it will stuff up the test data work.

Tasilee commented 1 year ago

Changed Parameter(s) to "bdq:sourceAuthority" as per discussions 12th June 2023

ArthurChapman commented 1 year ago

I have added to the Notes to be consistent with https://github.com/tdwg/bdq/issues/71:

"When referencing a GBIF taxon by GBIF's identifier for that taxon, use the pseudo-namespace "gbif:" and the form "gbif:{integer}" as the value for dwc:taxonID."
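The "gbif:{integer}" form in this note is simple enough to generate and check mechanically. A minimal Python sketch, assuming the integer is the key GBIF assigns to the taxon and that such keys are non-negative; the function names are hypothetical:

```python
import re
from typing import Optional, Union

# "gbif:" followed by one or more ASCII digits, and nothing else.
GBIF_ID = re.compile(r"gbif:([0-9]+)")


def to_gbif_id(key: Union[int, str]) -> str:
    """Format a GBIF taxon key as a gbif:{integer} value."""
    number = int(key)  # raises ValueError for non-numeric strings
    if number < 0:
        raise ValueError(f"GBIF taxon keys are assumed non-negative: {key!r}")
    return f"gbif:{number}"


def parse_gbif_id(value: str) -> Optional[int]:
    """Return the integer from a gbif:{integer} value, or None if the form differs."""
    match = GBIF_ID.fullmatch(value.strip())
    return int(match.group(1)) if match else None
```

Values in other namespaces, such as WoRMS LSIDs, simply return None from the parser rather than raising.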

chicoreus commented 1 year ago

Will need to include the new terms dwc:superfamily, dwc:tribe, dwc:subtribe https://github.com/tdwg/dwc/issues/65 https://github.com/tdwg/dwc/issues/45 https://github.com/tdwg/dwc/issues/46

Tasilee commented 1 year ago

Added the terms dwc:superfamily, dwc:tribe, dwc:subtribe to the Information elements and Expected response, and updated Specification Last Updated.

On this one, please check my Expected response.

Tasilee commented 1 year ago

Amended Source Authority values to align with @chicoreus syntax

From

bdq:sourceAuthority default = "GBIF Backbone Taxonomy" [https://doi.org/10.15468/39omei] | | | API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]

to

bdq:sourceAuthority default = "GBIF Backbone Taxonomy" {[https://doi.org/10.15468/39omei]} {API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]}
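The {API endpoint [...]} part of this value is a URL prefix to which a name is appended. A minimal Python sketch of that use, assuming the name should be percent-encoded and that the JSON reply holds its matches in a "results" list (an assumption about the GBIF response shape, to be checked against the GBIF API documentation); the function names are hypothetical:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Endpoint copied from the default bdq:sourceAuthority value above;
# the name to look up is appended after "name=".
GBIF_BACKBONE_ENDPOINT = (
    "https://api.gbif.org/v1/species?"
    "datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name="
)


def build_lookup_url(name: str, endpoint: str = GBIF_BACKBONE_ENDPOINT) -> str:
    """Percent-encode the name (spaces, parentheses, commas) and append it."""
    return endpoint + quote(name.strip(), safe="")


def fetch_candidates(name: str, timeout: float = 10.0) -> list:
    """Query the endpoint and return the parsed matches.

    Assumes the JSON body holds its matches in a "results" list; network
    failures propagate as urllib.error.URLError (an OSError subclass).
    """
    with urlopen(build_lookup_url(name), timeout=timeout) as response:
        return json.load(response).get("results", [])
```

An implementation could map the OSError raised by an unreachable service to EXTERNAL_PREREQUISITES_NOT_MET.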

Tasilee commented 1 year ago

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted". Also changed "Field" to "TestField" and "Output Type" to "TestType".