tdwg / dwc

Darwin Core standard for sharing of information about biological diversity.
https://dwc.tdwg.org
Creative Commons Attribution 4.0 International

Change terms - acceptedNameUsageID, parentNameUsageID, originalNameUsageID #105

Closed mdoering closed 3 years ago

mdoering commented 9 years ago

Change term

Current Term definition: https://dwc.tdwg.org/terms/#dwc:acceptedNameUsageID

Proposed new attributes of the term:

Change term

Current Term definition: https://dwc.tdwg.org/terms/#dwc:parentNameUsageID

Proposed new attributes of the term:

Change term

Current Term definition: https://dwc.tdwg.org/terms/#dwc:originalNameUsageID

Original comment from Markus Döring (@mdoering): The definition of dwc:acceptedNameUsageID does not clearly specify what values it expects and how they relate to other dwc terms. Currently: "An identifier for the name usage (documented meaning of the name according to a source) of the currently valid (zoological) or accepted (botanical) taxon."

In order to correctly establish relation to the accepted usage it should be clear to which property this id relates. In GBIF and all implementations I am aware of this is the dwc:taxonID property. For example see this issue from the dwca validator: https://github.com/gbif/dwca-validator/issues/3

The definition or at least comments of acceptedNameUsageID (and similar parentNameUsageID and originalNameUsageID) should be changed to mention this. This is a common cause of referential integrity problems for taxonomic datasets we see at GBIF.

tucotuco commented 4 years ago

Now that the commentaries are separate from the normative definitions of the terms, this recommendation can be added without the full public review process. @mdoering What exactly would you propose for the comment to say?

mdoering commented 4 years ago

A first suggestion would be:

acceptedNameUsageID should be used for synonyms or misapplied names to refer to the taxonID of an existing Darwin Core Taxon record that represents the accepted (botanical) or valid (zoological) name.

Similarly, a comment should exist for parentNameUsageID:

parentNameUsageID should be used for accepted names to refer to the taxonID of an existing Darwin Core Taxon record that represents the next higher taxon of their taxonomic classification.

and originalNameUsageID

originalNameUsageID should be used to refer to the taxonID of an existing Darwin Core Taxon record that represents the original combination, i.e. protonym (zoology) or basionym (botany) of the name.
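As a minimal sketch of how the three ID terms would point at taxonID under these proposed comments, consider the Python snippet below. The taxonID values (`t1` to `t3`), the `taxonomicStatus` values, and the `resolve` helper are invented for illustration and are not part of Darwin Core or any GBIF tooling; only the names Picea abies (L.) H.Karst. and Pinus abies L. are taken from later in this discussion.

```python
# Hypothetical Taxon records; the taxonID values are made up for this sketch.
records = [
    {"taxonID": "t1", "scientificName": "Picea", "taxonomicStatus": "accepted"},
    {
        "taxonID": "t2",
        "scientificName": "Picea abies (L.) H.Karst.",
        "taxonomicStatus": "accepted",
        "parentNameUsageID": "t1",    # next higher taxon in the classification
        "originalNameUsageID": "t3",  # record of the original combination
    },
    {
        "taxonID": "t3",
        "scientificName": "Pinus abies L.",
        "taxonomicStatus": "synonym",
        "acceptedNameUsageID": "t2",  # record of the accepted name
    },
]

# Index the records by taxonID, the property all three ID terms refer to.
by_taxon_id = {r["taxonID"]: r for r in records}


def resolve(record, term):
    """Return the Taxon record that `term` points to, or None if empty or unresolvable."""
    target_id = record.get(term)
    return by_taxon_id.get(target_id) if target_id else None


spruce = by_taxon_id["t2"]
print(resolve(spruce, "parentNameUsageID")["scientificName"])
print(resolve(spruce, "originalNameUsageID")["scientificName"])
print(resolve(by_taxon_id["t3"], "acceptedNameUsageID")["scientificName"])
```

In this sketch every reference resolves through the same `by_taxon_id` index, which is the point of the proposal: the values of the three terms are taxonID values, not names.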

nielsklazenga commented 4 years ago

@mdoering are these definitions or usage notes?

mdoering commented 4 years ago

proposed comments, not normative definitions

tucotuco commented 3 years ago

I have added the templated term change proposal for all three terms in the first comment in this issue. I have added examples from ITIS. It would be good to have equivalents from Catalogue of Life and to be sure that the GBIF examples are apt to include.

deepreef commented 3 years ago

I strongly support this clarification. Note that the "Usage Comments" for parentNameUsageID and originalNameUsageID were reversed in the original proposal, but correctly represented in the subsequent comment by @mdoering.

baskaufs commented 3 years ago

All of these are "ID" terms without dwciri: namespace analogs.

tucotuco commented 3 years ago

The outcome of this change proposal should also inform Issues #350, #351 and #352.

qgroom commented 3 years ago

acceptedNameUsageID should be used for synonyms or misapplied names to refer to the taxonID of an existing Darwin Core Taxon record that represents the accepted (botanical) or valid (zoological) name.

This is confusing because it makes no mention of scientificName, which I assume is the field where the synonym or misapplied name is that it is referring to.

qgroom commented 3 years ago

Shouldn't parentNameUsageID be related to the higher taxon rank and originalNameUsageID to the basionym? It is the other way around at the moment.

tucotuco commented 3 years ago

Fixed. Thank you!

mdoering commented 3 years ago

acceptedNameUsageID should be used for synonyms or misapplied names to refer to the taxonID of an existing Darwin Core Taxon record that represents the accepted (botanical) or valid (zoological) name.

This is confusing because it makes no mention of scientificName, which I assume is the field where the synonym or misapplied name is that it is referring to.

The existing Darwin Core Taxon record does not even have to have a scientificName. A parsed name with genericName and specificEpithet would work too.

qgroom commented 3 years ago

acceptedNameUsageID should be used for synonyms or misapplied names to refer to the taxonID of an existing Darwin Core Taxon record that represents the accepted (botanical) or valid (zoological) name.

This is confusing because it makes no mention of scientificName, which I assume is the field where the synonym or misapplied name is that it is referring to.

The existing Darwin Core Taxon record does not even have to have a scientificName. A parsed name with genericName and specificEpithet would work too.

Ah! I was under the impression that these atomized name terms were just for convenience. You can see why a user might be confused. The atomized rank terms do specifically refer to the scientificName, which is why I would have thought scientificName comes first.

Furthermore, shouldn't it be explicit whether nameAccordingTo refers to acceptedNameUsage or to scientificName? I can see good reasons for either, but not both.

nielsklazenga commented 3 years ago

Ah! I was under the impression that these atomized name terms were just for convenience. You can see why a user might be confused.

This is not in the standard, but in the implementation. When delivering to ALA, you need a scientificName, as ALA will ignore the parsed name terms.

Furthermore, shouldn't it be explicit whether nameAccordingTo refers to acceptedNameUsage or to scientificName? I can see good reasons for either, but not both.

scientificName and nameAccordingTo together form the taxon concept label (sensu Franz). I think the definition is already pretty good, but we could do something in the usage comments. I will volunteer for that (after the weekend), but I think that should be a new 'Change term' GH issue.

tucotuco commented 3 years ago

I see no remaining problems in the discussion for the proposed changes in this issue, but rather clarifications that ought to be made in scientificName and nameAccordingTo. If that is not the case, please let me know. The ideas for clarifying scientificName and nameAccordingTo can be made in this same public review without holding anything up if they are non-normative change requests (comments and/or examples).

bpescador commented 3 years ago

I have added the templated term change proposal for all three terms in the first comment in this issue. I have added examples from ITIS. It would be good to have equivalents from Catalogue of Life and to be sure that the GBIF examples are apt to include.

I don't see the link to the examples. I am looking for clarity on what is expected for originalNameUsage (original combination or protonym). Should the name be the original spelling even if incorrect, or the corrected original spelling? Should the name include joining words such as "var.", or should all joining words be cleaned from the protonym?

mdoering commented 3 years ago

This issue is not about originalNameUsage, but originalNameUsageID and the other 2 ID terms.

Still the question then concerns the scientificName of the related Taxon record. In botany a basionym must be a legitimate name. I would expect the protonym in zoology to be the original spelling, but maybe not for incorrect names? @deepreef any guidance here?

In botany with various infraspecific ranks you clearly include rank markers as var. with the name. I would recommend to do the same in zoology if it is not considered a subspecies. The term definition of scientificName already gives some examples.

nielsklazenga commented 3 years ago

In botany, originalNameUsage is either the basionym or the replaced synonym, which is to an avowed substitute (nom. nov.) what a basionym is to a new combination. The term 'protonym' is not used in the Zoological Code and it would be best if it did not end up in any Darwin Core definitions, as it has been used in mycology in a different sense and that is the sense in which it is defined in David Hawksworth's 'Terms used in Bionomenclature'.

The rank prefix for infraspecific epithets in botanical names is compulsory under the Botanical Code, so that does not need to be in Darwin Core. Zoological names only have one infraspecific rank, so do not use a prefix.

mdoering commented 3 years ago

The term 'protonym' is not used in the Zoological Code and it would be best if it did not end up in any Darwin Core definitions, as it has been used in mycology in a different sense and that is the sense in which it is defined in David Hawksworth's 'Terms used in Bionomenclature'.

Yes, that's one of the reasons why we decided to use the not overloaded and code agnostic originalName in Darwin Core.

Zoological names only have one infraspecific rank, so do not use a prefix.

Only in recent editions of ICZN so not necessarily in original publications of older names. So the question whether the original spelling should be used or some emendation still stands. Personally I prefer the original spelling, but I guess this does not need to be prescribed by Darwin Core at all.

deepreef commented 3 years ago

I would expect the protonym in zoology to be the original spelling, but maybe not for incorrect names? @deepreef any guidance here?

As noted by @nielsklazenga, the word "Protonym" has no official definition in zoology. It actually has several general definitions in the context of taxonomy/nomenclature, but I imagine the one you are referring to is the one that I use (defined here). The definition of protonym in this sense is somewhat different from basionym. First, basionyms apply to names below the rank of genus, whereas protonym (in my sense) applies to all ranks, up to kingdom/domain, etc.. Second, original combinations are not "born" as basionyms -- they only become basionyms when a subsequent publication establishes a new combination (Protonyms are born upon first usage, regardless of whether there are any subsequent usages). Third, I believe (as you note) that basionyms represent the first "legitimate" (sensu Code) use/spelling/combination of the name, whereas Protonyms are the first chronological usage of a name, independent of Code compliance. Protonyms have a literal spelling as a property, but they are not defined by this spelling.

In botany with various infraspecific ranks you clearly include rank markers as var. with the name. I would recommend to do the same in zoology if it is not considered a subspecies. The term definition of scientificName already gives some examples.

Again, as noted by @nielsklazenga, only subspecies are allowed under the ICZN Code (no other infraspecific ranks). Historically they have been used; some converted to subspecies; others rejected as infrasubspecific. Thus, in zoology, trinominals are assumed to be subspecies, and an indicator of such (e.g., "ssp." or "subsp.") is not encouraged.

I think originalNameUsage is probably the best candidate (not originalName, which implies a text string, rather than a usage instance), and I'd be willing to abandon "protonym" for that term. However, that would require a revised definition for those terms (originalNameUsage/originalNameUsageID), to eliminate the "when first established under the rules of the associated nomenclaturalCode" part. I suspect such a change in definition would be met with substantial resistance, however. If the DwC definition cannot be changed to liberate the concept from "rules of associated nomenclaturalCode", then we'll need a new term, and I suspect in that case that a carefully defined term protonym would then become the next best candidate.

I would like to see these technical issues sorted out within the TCS-2 task group, then framed as the next iteration of terms organized within the Taxon class of DwC. The most important thing, I think, will be to lock in the definition of the instance of a DwC Taxon class as a usage instance (not a text string, and not a taxon concept per se).

timrobertson100 commented 3 years ago

@mdoering - I recall that in a DwC-A the implementations all expect these IDs to be local foreign keys and will fail with referential integrity checks if not present.

If that is the case, can you please state this in the usage comments when making the change? I am aware of at least one recent example where an external ID was used without carrying the record within the DwC-A.

mdoering commented 3 years ago

Yes, that makes sense. I would propose an addition to the current proposal:

Comments: This term should be used for accepted names to refer to the taxonID of a Taxon record that represents the next higher taxon rank in the same taxonomic classification.

Suggested update: Comments: This term should be used for accepted names to refer to the taxonID of a Taxon record that represents the next higher taxon rank in the same taxonomic classification. For Darwin Core Archives the related record should be present locally in the same archive.
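The "present locally in the same archive" requirement can be sketched as a simple referential integrity check. This is only an illustration in Python, not the logic of the GBIF dwca-validator or of any other implementation: the function name `find_dangling_references`, the short taxonID values, and the in-memory list of dicts standing in for the rows of an archive's Taxon core file are all assumptions.

```python
ID_TERMS = ("acceptedNameUsageID", "parentNameUsageID", "originalNameUsageID")


def find_dangling_references(records):
    """Return (taxonID, term, value) for every ID value with no matching local taxonID."""
    local_ids = {r["taxonID"] for r in records}
    dangling = []
    for r in records:
        for term in ID_TERMS:
            value = r.get(term)
            # Empty or missing values are allowed; only unresolved ones are reported.
            if value and value not in local_ids:
                dangling.append((r["taxonID"], term, value))
    return dangling


# Hypothetical rows of one archive; the taxonID values "1" to "3" are invented.
archive_rows = [
    {"taxonID": "1", "scientificName": "Poa"},
    {"taxonID": "2", "scientificName": "Poa annua L.", "parentNameUsageID": "1"},
    # Points at an external identifier whose record is not carried in the archive.
    {
        "taxonID": "3",
        "scientificName": "Poa pratensis L.",
        "parentNameUsageID": "urn:lsid:ipni.org:names:30001404-2",
    },
]

for taxon_id, term, value in find_dangling_references(archive_rows):
    print(f"{taxon_id}: {term}={value} does not match any taxonID in this archive")
```

Here the third row is flagged because its parentNameUsageID is a valid-looking external identifier that does not appear as a taxonID in the same set of rows, which is the situation described above.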

deepreef commented 3 years ago

That wording seems OK to me for parentNameUsageID, but would need to be tweaked for the other two:

originalNameUsageID: Comments: This term should be used to refer to the taxonID of a Taxon record that represents the usage of the terminal element of the scientificName as originally established under the rules of the associated nomenclaturalCode. For Darwin Core Archives the related record should be present locally in the same archive. [*Note: this wording is modified from the proposed text to reflect the recent discussion on basionym/protonym.]

acceptedNameUsageID: Comments: This term should be used for synonyms or misapplied names to refer to the taxonID of a Taxon record that represents the accepted (botanical) or valid (zoological) name. For Darwin Core Archives the related record should be present locally in the same archive.

mdoering commented 3 years ago

I have added examples from ITIS. It would be good to have equivalents from Catalogue of Life and to be sure that the GBIF examples are apt to include.

Indeed the GBIF species URLs are not used as identifiers at GBIF. It is the integer only that is the taxonID. I would propose to change that, but it would be good to also include a true global URI identifier as one of the examples. I suggest to add IPNI LSID examples for this and update ITIS, GBIF & COL to all point to Poa (parentNameUsageID) or Poa annua L. (acceptedNameUsageID & originalNameUsageID):

mdoering commented 3 years ago

Summary of the current proposal of term changes, listing changes only:

parentNameUsageID

Comments: This term should be used for accepted names to refer to the taxonID of a Taxon record that represents the next higher taxon rank in the same taxonomic classification. For Darwin Core Archives the related record should be present locally in the same archive. Examples: tsn:41074 (ITIS), urn:lsid:ipni.org:names:30001404-2 (IPNI), 2704173 (GBIF), 6T8N (COL)

acceptedNameUsageID

Comments: This term should be used for synonyms or misapplied names to refer to the taxonID of a Taxon record that represents the accepted (botanical) or valid (zoological) name. For Darwin Core Archives the related record should be present locally in the same archive. Examples: tsn:41107 (ITIS), urn:lsid:ipni.org:names:320035-2 (IPNI), 2704179 (GBIF), 6W3C4 (COL)

originalNameUsageID

Comments: This term should be used to refer to the taxonID of a Taxon record that represents the usage of the terminal element of the scientificName as originally established under the rules of the associated nomenclaturalCode. For Darwin Core Archives the related record should be present locally in the same archive. Examples: tsn:41107 (ITIS), urn:lsid:ipni.org:names:320035-2 (IPNI), 2704179 (GBIF), 6W3C4 (COL)

mdoering commented 3 years ago

I would think for originalNameUsageID it is wise though to have also the nomenclatural terms basionym and protonym in the usage comments, as this otherwise just repeats the definition and is a bit hard to understand for casual users.

How about we change the comments for originalNameUsageID to be more like the one for acceptedNameUsageID?

Comments: This term should be used to refer to the taxonID of a Taxon record that represents the usage of the protonym (zoology) or basionym (botany).

deepreef commented 3 years ago

@mdoering :

Please note that "Protonym" is not the zoological counterpart to "Basionym". These two terms have two different meanings, and both terms have relevance to all names under all Codes. The concept of "basionym" (the relationship between a subsequent combination and its Code-compliant original combination) exists in Zoology just as it does in Botany, but that term is not used in zoology or defined in the ICZN Code because new combinations are not considered as Code-governed nomenclatural acts. But it still has the same conceptual meaning in zoology.

Similarly, the concept of "Protonym" as you use it here (it has been defined in at least three different ways in the context of taxonomy & nomenclature) applies equally to names governed by zoological and botanical and bacteriological Codes.

In summary: Basionym:

Protonym:

Therefore, I would recommend:

"Comments: This term should be used to refer to the taxonID of a Taxon record that represents the taxonomic name usage that first established a name in compliance with the relevant Code of nomenclature. For example, for names governed by the ICNafp, this term would establish the relationship between a record representing a subsequent combination and the record for its corresponding basionym. Unlike basionyms, however, this term can apply to scientific names at all ranks."

I think it's important to clarify that originalNameUsageID refers to the first Code-compliant usage ("...the usage of the terminal element of the scientificName as originally established under the rules of the associated nomenclaturalCode."). This is not always the same as Protonym, so perhaps best not to confuse the issue by using that term here.

nielsklazenga commented 3 years ago

@deepreef's usage comment goes in the right direction, but it is important to note that a botanical name cannot be its own basionym (I think that is why basionym cannot be used at all ranks), while originalNameUsage can. Also, in botany we have the 'replaced synonym', which is to an avowed substitute (nom. nov.) what a basionym is to a new combination (comb. nov.). So originalNameUsage can be (a use of) the basionym, the replaced synonym, or the scientific name itself. originalNameUsage just combines a few different types of relationships; it does not establish them.

mdoering commented 3 years ago

I am happy to use Rich's comment as it clearly mentions basionyms which is my main point that we should use common terminology in the usage comments.

If replacement names are actually covered by originalNameUsage I am in doubt though. I had thought so, but if I read the definition I think they are not, because the terminal epithet is actually a different one. It has been replaced:

An identifier for the name usage (documented meaning of the name according to a source) in which the terminal element of the scientificName was originally established under the rules of the associated nomenclaturalCode.

I would even think a replacement name is a protonym itself and does not have the replaced name as its protonym. Clarifying this in the comments might also be a good thing.

nielsklazenga commented 3 years ago

@mdoering, you are right, that definition (which is the definition of originalNameUsageID) indeed excludes replaced synonyms. That is a terrible definition and completely different from that for originalNameUsage, which already has 'basionym' in it.

A replacement name can be an originalNameUsage if the name it replaces is illegitimate. If the replaced name is legitimate, however, the replaced synonym will be the originalNameUsage and the basionym for future combinations (if the combination is available). Protonyms as defined in Hawksworth (2017) are invalid, so cannot be originalNameUsages. As @deepreef suggested, it is better to avoid that term.

Go with @deepreef's usage comment.

mdoering commented 3 years ago

Wow, I only now realize that the two definitions do not match up at all:

originalNameUsage

The taxon name, with authorship and date information if known, as it originally appeared when first established under the rules of the associated nomenclaturalCode. The basionym (botany) or basonym (bacteriology) of the scientificName or the senior/earlier homonym for replaced names.

originalNameUsageID

An identifier for the name usage (documented meaning of the name according to a source) in which the terminal element of the scientificName was originally established under the rules of the associated nomenclaturalCode.

I guess we cannot change the definition of originalNameUsageID anymore in this round to be in line with the one for originalNameUsage?

mdoering commented 3 years ago

The taxon name ... as it originally appeared when first established...

That definition is less clear than the one for the ID:

name usage in which the terminal element of the scientificName was originally established

It depends on what you consider a name, which we don't all agree on. As a botanist I would not say Pinus abies L. is the same name as Picea abies (L.) H.Karst. Therefore I would prefer to change the definition of originalNameUsage instead of the one for originalNameUsageID. This would work:

The name, with authorship and date information if known, of the name usage (documented meaning of the name according to a source) in which the terminal element of the scientificName was originally established under the rules of the associated nomenclaturalCode.

But then we exclude replacement names, which I don't mind as long as it's clear what an originalNameUsage is and the definition of originalNameUsageID matches up.

nielsklazenga commented 3 years ago

@mdoering , yes, I guess we should try to do what we can with the usage comments for now and deal with the definitions later.

I am actually more comfortable with the definition for originalNameUsage than that for originalNameUsageID, as I think names are established as a whole, not as individual parts. I guess that is a botanist thing (I had to look up what a replacement name is too). The definition of originalNameUsageID looks very zoological to me.

mdoering commented 3 years ago

I agree. It is not simple to provide a definition that reads well in the eye of zoologists and botanists. And a name being just the epithet for (most) zoologists and the entire combination in the mind of a botanist is the biggest challenge. originalNameUsage I still think was a good term name for that purpose.

On the other hand dwc:scientificName is not supposed to be just the epithet alone, but the full "name" even including authorship. So in that regard the originalNameUsage definition fits better into the DwC universe.

Shall we create a new issue to redefine the definition some other time so we don't lose it?

nielsklazenga commented 3 years ago

Yes, maybe just a place holder and then sort it out after TCS 2.

deepreef commented 3 years ago

I would not use originalNameUsage / originalNameUsageID to establish relationships between replacement names and the names they replace. My perspective on this is in line with what @mdoering wrote above.

Hawksworth's definition of Protonym is not the same as my definition (Taxonomer data model), or Dubois' definition. Hawksworth is actually repeating an earlier definition used in fungi, which Paul Kirk encouraged me to ignore. These are the three definitions of "Protonym" (Mycological, Taxonomer, Dubois), and I believe @mdoering was referring to the Taxonomer definition. In any case, best to avoid this term until we have a single agreed-upon definition.

The terms originalNameUsage / originalNameUsageID originally emerged from a series of GNA NOMINA meetings, which had as much representation from the botanical Code (Paul Kirk, Greg Whitbread, Nicky Nicolson) as from the Zoological Code (me, Paddy Patterson, Stan Blum); so it was framed in a way that would work for both Codes equally.

The reason for the "Usage" qualifiers on these terms was specifically to liberate the terms from the botanical vs. zoological biases for what is meant by "name". We can all agree on what a usage is, regardless of whether a "name" represents a full combination, or the individual elements of a combination.

I agree with @nielsklazenga that we should sort this out when reconciling the outcomes of TCS 2 with dwc:Taxon terms.

tucotuco commented 3 years ago

Does this mean we should table all the usage comments suggested here for a future release, or is there consensus that we can move forward with these? If the latter, I would appreciate it if that can be stated so I can proceed with the pursuit of ratification.

mdoering commented 3 years ago

I have created a new issue #360 to discuss the future change of definitions.

@tucotuco I think we can all agree to change the usage comments to the following, complying with their current definitions:

parentNameUsageID

Comments: This term should be used for accepted names to refer to the taxonID of a Taxon record that represents the next higher taxon rank in the same taxonomic classification. For Darwin Core Archives the related record should be present locally in the same archive. Examples: tsn:41074 (ITIS), urn:lsid:ipni.org:names:30001404-2 (IPNI), 2704173 (GBIF), 6T8N (COL)

acceptedNameUsageID

Comments: This term should be used for synonyms or misapplied names to refer to the taxonID of a Taxon record that represents the accepted (botanical) or valid (zoological) name. For Darwin Core Archives the related record should be present locally in the same archive. Examples: tsn:41107 (ITIS), urn:lsid:ipni.org:names:320035-2 (IPNI), 2704179 (GBIF), 6W3C4 (COL)

originalNameUsageID

Comments: This term should be used to refer to the taxonID of a Taxon record that represents the usage of the terminal element of the scientificName as originally established under the rules of the associated nomenclaturalCode. For example, for names governed by the ICNafp, this term would establish the relationship between a record representing a subsequent combination and the record for its corresponding basionym. Unlike basionyms, however, this term can apply to scientific names at all ranks. For Darwin Core Archives the related record should be present locally in the same archive. Examples: tsn:41107 (ITIS), urn:lsid:ipni.org:names:320035-2 (IPNI), 2704179 (GBIF), 6W3C4 (COL)
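Because each parentNameUsageID points to the taxonID of the next higher taxon in the same classification, a consumer could in principle rebuild the classification of a record by following that chain. The Python sketch below illustrates the idea under stated assumptions: the function `classification`, the taxonID values (`f1`, `g1`, `s1`), and the three-record dataset are hypothetical, and the cycle guard is an added precaution rather than anything the usage comments prescribe.

```python
def classification(taxon_id, by_taxon_id):
    """Follow parentNameUsageID links upward and return the names, lowest taxon first."""
    names, seen = [], set()
    current = by_taxon_id.get(taxon_id)
    while current is not None:
        if current["taxonID"] in seen:
            # A record that is its own ancestor would otherwise loop forever.
            raise ValueError(f"cycle in parentNameUsageID chain at {current['taxonID']}")
        seen.add(current["taxonID"])
        names.append(current["scientificName"])
        # A missing or unresolvable parent ends the walk.
        current = by_taxon_id.get(current.get("parentNameUsageID"))
    return names


# Hypothetical records; the taxonID values are invented for this sketch.
taxa = {
    "f1": {"taxonID": "f1", "scientificName": "Poaceae"},
    "g1": {"taxonID": "g1", "scientificName": "Poa", "parentNameUsageID": "f1"},
    "s1": {"taxonID": "s1", "scientificName": "Poa annua L.", "parentNameUsageID": "g1"},
}

print(classification("s1", taxa))
```

The walk only works when every parent record is carried in the same dataset, which is one practical reason for the recommendation that related records be present locally in a Darwin Core Archive.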

tucotuco commented 3 years ago

Excellent, thank you.

On Wed, Jun 16, 2021 at 7:01 PM Markus Döring @.***> wrote:

I have created a new issue #360 https://github.com/tdwg/dwc/issues/360 to discuss the future change of definitions.


tucotuco commented 3 years ago

Done.