tdwg / tnc

Taxonomic Names and Concepts Interest Group

Merge TaxonomicName into TaxonomicNameUsage #34

Closed nielsklazenga closed 4 years ago

nielsklazenga commented 5 years ago

In the discussion around the Darwin Core RDF Guide that @baskaufs summarised, @deepreef stated that a Taxonomic Name is a special form of a Taxonomic Name Usage (using slightly different terms). At the time I read this (only a few months ago) my first thought was 'Oh, rubbish!', but with all the discussions we have had in the TNC in the last six months, I have slowly moved to @deepreef's position.

We have had difficulty deciding which property goes in which class (or I have, as we haven't had any strong disagreements) and there is the outstanding issue of defining the classes and deciding where one class ends and the other begins.

I think (at the moment...) that we can do with one class what we can do with the two classes, by adding a 'primary taxonomic name usage' property (cf. #32), which would then replace the taxonomicName property (and be equivalent with dwc:scientificNameID). I think we could even keep calling the property taxonomicName.

deepreef commented 5 years ago

Ha! Maybe your initial reaction was the correct one? :-) Seriously, though -- that's not exactly how I would phrase it. We can think of a "Taxonomic Name" as either a literal text string (sequence of UTF-8 encoded characters), or we can think of it as a "data object" (class) with properties. We've pretty much worked the literal string angle to death. The problem comes when people try to use the literal text string as a proxy for the data object. That's when we get into problems -- for a number of key issues (which I can elaborate on if needed).

What I think you're referring to is my assertion that a subset of all TNUs are what I call "Protonyms" (i.e., TNUs that happen to also establish new Taxonomic Names), and that these Protonyms (a subclass of TNUs) serve as extremely powerful proxies for Taxonomic Names as "data objects". In other words, Protonyms represent an elegant way of building links between and among other TNUs in ways that allow us to perform magical informatic tricks (inferences).

Over several decades of working with real data (through ZooBank and far beyond), I've concluded that all of the properties and inferences about nomenclature and taxonomy can be represented through TNUs, their core properties, and their relationships to each other. The two most important properties of a TNU are the literal text string for the name-label (only the uninomial component, as literal text strings representing combinations are derived from sets of related TNUs), and the link to a Reference. The three most important relationships between TNUs are the link to the Protonym (self-referential for TNUs that are themselves Protonyms), the link to the Valid TNU (self-referential for names treated as valid/accepted), and the link to the Parent TNU (e.g., a species TNU linked to a parent TNU). Implicit in all three kinds of TNU-TNU relationships is that all are connections between TNUs anchored to the same Reference.
These five components (two properties and three intra-Reference TNU-TNU relationships) form the kernel around which almost everything we do or care about in taxonomy and nomenclature can either be directly inferred, or captured in additional layers that cross-link inter-reference TNUs to each other. As David Remsen has described it, TNUs are the root "currency" of taxonomic information.
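
As a rough illustration only, the five-part kernel could be sketched as a Python dataclass. The class name `TNU`, the field names, and the identifiers below are hypothetical and are not taken from any TDWG specification; self-reference is modelled here by storing the record's own id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TNU:
    """Hypothetical Taxonomic Name Usage record (all names are illustrative)."""
    tnu_id: str
    # The two core properties
    name_label: str       # uninomial component only, e.g. "bus"
    reference_id: str     # the Reference in which this usage appears
    # The three TNU-TNU relationships
    protonym_id: str      # equals tnu_id when this TNU is itself a Protonym
    valid_tnu_id: str     # equals tnu_id when the name is treated as valid
    parent_tnu_id: Optional[str] = None  # e.g. the genus TNU of a species TNU

    def is_protonym(self) -> bool:
        return self.protonym_id == self.tnu_id

    def is_valid(self) -> bool:
        return self.valid_tnu_id == self.tnu_id


# Made-up example: a genus and a species established in the same Reference.
aus = TNU("tnu:1", "Aus", "ref:A", protonym_id="tnu:1", valid_tnu_id="tnu:1")
bus = TNU("tnu:2", "bus", "ref:A", protonym_id="tnu:2", valid_tnu_id="tnu:2",
          parent_tnu_id="tnu:1")
```

In this sketch the full combination "Aus bus" is never stored; it would be derived by following `parent_tnu_id` from the species TNU to the genus TNU.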

More generally, think about it this way: Taxon names and taxon concepts do not exist by themselves -- they only exist through the existence of individual usages. A name does not exist until someone establishes it (Protonym). Synonymies and classifications do not exist until someone asserts them. Almost everything we do nomenclaturally (typification, emendation, etc.) or taxonomically (synonymies, classifications, circumscription definitions, etc.) happens through References, and thus TNUs.

This is why I see one class of taxonomic entities (TNUs). ANYTHING nomenclatural or taxonomic can be anchored to a TNU, and thus TNUs can serve as proxies for nomenclatural actions and taxonomic circumscription assertions. Therefore, we really need only one class of object, and one set of properties and defined relationships, to track almost everything we're interested in.

My only suggestion would be to replace the term "primary taxonomic name usage" with "Protonym", and then I think we're in a really good place.

I do need to clarify something, however... dwc:scientificName refers to a literal text string (full combination), and so the associated dwc:scientificNameID is NOT the same thing as a Protonym (or "primary taxonomic name usage"). The dwc term that represents the Protonym is dwc:originalNameUsageID. In fact, all the terms we need for representing and mapping TNUs are already included in dwc (Taxon class). I will illustrate this when I put my examples together.

nielsklazenga commented 5 years ago

Assuming you are not talking about the 'Oh, rubbish!' bit, this is what you said (https://groups.google.com/forum/#!msg/tdwg-rdf/CMfwf10Ozpo/GW2GaWW8C78J):

In my world, name-objects are the subset of TNUs that represent protonyms. Those are the things I think we need to use as anchor-points for "taxon names".

...so yes, slightly different, but pretty close.

I think we are in full agreement. Only:

I do need to clarify something, however... dwc:scientificName refers to a literal text string (full combination), and so the associated dwc:scientificNameID is NOT the same thing as a Protonym (or "primary taxonomic name usage"). The dwc term that represents the Protonym is dwc:originalNameUsageID. In fact, all the terms we need for representing and mapping TNUs are already included in dwc (Taxon class). I will illustrate this when I put my examples together.

You forget that I am a botanist. In botany, new combinations and new status are nomenclatural acts. I was going to ask you if in zoology a protonym is the same as a basionym. For botanists, originalNameUsage is the basionym, while scientificNameID is the closest thing to a protonym in Darwin Core.

Happy to call the property 'protonym'. 'Protonym' was a new word for me a few months ago. I don't think it has a meaning in botany, so we (botanists) could use the more general Oxford dictionary definition:

The first person or thing of a certain name; something from which another person or thing takes its name.

...or we could do what zoologists do. Will have to think about the ramifications of that (would make it really easy to group nomenclatural synonyms).

deepreef commented 5 years ago

Thanks for sharing. Yup, that's what I said (and meant). The key, as noted elsewhere, is distinguishing "Taxonomic Name" as a literal text string from "Taxonomic Name" as a conceptual data object (with properties and relationships). The latter, I believe, is most logically/elegantly represented as the subset of TNUs that are Protonyms.

I've addressed the botany/zoology differences elsewhere (email) which can be copied here if deemed useful. But there are a few points to clarify:

nielsklazenga commented 5 years ago

I am totally on board with that, but I think it will be a hard sell to many botanists.

I think we (botanists) would want the dwc:scientificNameID (or the object it identifies) to be a TNU, so we have an object that can be shared between TNUs, rather than an identifier that is based on a string. I think this difference is trivial, as you said earlier that you wouldn't even use this identifier.

deepreef commented 5 years ago

As long as botanists understand that any TNU (including new combinations) can be branded as a nomenclatural act under a particular Code, and that how a name is rendered can conform to botanical conventions, I don't see why there should be any resistance. The data model/exchange standard should be based on underlying informatics, not presentation-layer conventions. In fact, those presentation-layer conventions have been a major (and unnecessary) impediment to progress for many years (this applies to all Codes). Moreover, the information model was hashed out and re-hashed out over more than a dozen meetings spanning multiple years including myself, @ghwhitbread, Paul Kirk, Nicki Nicholson, @stanblum, Dima Mozherrin, @mdoering and many others (which includes contributions from folks familiar with the botanical/fungal Code and practice) through a series of "NOMINA" meetings. It is very intentionally (and carefully) Code-agnostic -- not biased towards one Code or another in any way.

As for scientificName/ID, it's clear from the dwc definition that this term is intended to represent a text string (and associated ID for such), not a TNU. By contrast, dwc:originalNameUsageID was created specifically to represent the TNU that in the vast majority of cases we would now refer to as a "Protonym". There is actually a subtle difference between the definition of dwc:originalNameUsageID and the definition of a "Protonym", so we may need to introduce a new term for "protonymID". In 99%+ of cases, dwc:originalNameUsageID and protonymID would be the same thing, but 1% of millions of names is a lot of names -- so it may be worth documenting the subtle but important distinction between dwc:originalNameUsageID and a new term "protonymID".

nielsklazenga commented 5 years ago

Our biggest difference might be in the definition of 'botanist'. Those people you mention are not the people I was worried about.

My field mappings were more analogues than exact matches. I don't really care if the identifier represents a text string or a TNU, as long as I can use the identifier, or even the name string (incl. authors) itself, to find all TNUs for the same combination. I would be interested to know what the difference is between originalNameUsageID and protonymID (not right now though). It would be good if originalNameUsageID were the more general term, and then we can split that in the Name standard, if we decide to do so.

deepreef commented 5 years ago

I suppose you could use scientificName/ID to find all TNUs for the same combination, but it would be MUCH easier to find them with originalNameUsageID -- especially if that leads to a Protonym. For example, suppose you are interested in all TNUs for Xus bus (L.) Smith. There would be one TNU for Xus bus (L.) Smith sec. Smith, which would include the Protonym (originalNameUsageID) link back to Aus bus L., as well as a link to the parentNameUsageID, itself leading us to the originalNameUsageID for the genus Xus. From there, it's an extremely simple matter to filter to all TNUs for "bus L.", wherein the species was placed within the genus "Xus" -- and thus you would have all TNUs (including all spelling variants) for Xus bus (L.) Smith:

(Pyle never was very good at spelling his names correctly...)

You could probably get the same results by submitting the scientificNameID for "Xus bus (L.) Smith", but it would be a much more fragile query, relying on parsing the name-string, possibly recognizing that "L." is the same as "Linnaeus", and accounting for homonyms and any other variations in orthography. In all likelihood, a service that consumed a value for scientificNameID would go through a number of machinations to derive originalNameUsageID (or ProtonymID) values for "Xus" and "bus L.", then redirect to the process described above.
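
The ID-based lookup described above might be sketched as follows. This is only an illustration under stated assumptions: the records are made up, `taxonID`, `originalNameUsageID` and `parentNameUsageID` are used as Darwin Core-style keys, and `nameLabel`, `ref` and the function `same_combination` are hypothetical names introduced here.

```python
# Made-up TNU records; "buss" stands in for a spelling variant.
RECORDS = [
    {"taxonID": "t1", "nameLabel": "Aus", "ref": "Linnaeus",
     "originalNameUsageID": "t1", "parentNameUsageID": None},
    {"taxonID": "t2", "nameLabel": "bus", "ref": "Linnaeus",
     "originalNameUsageID": "t2", "parentNameUsageID": "t1"},
    {"taxonID": "t3", "nameLabel": "Xus", "ref": "Smith",
     "originalNameUsageID": "t3", "parentNameUsageID": None},
    {"taxonID": "t4", "nameLabel": "bus", "ref": "Smith",
     "originalNameUsageID": "t2", "parentNameUsageID": "t3"},
    {"taxonID": "t5", "nameLabel": "Xus", "ref": "Pyle",
     "originalNameUsageID": "t3", "parentNameUsageID": None},
    {"taxonID": "t6", "nameLabel": "buss", "ref": "Pyle",
     "originalNameUsageID": "t2", "parentNameUsageID": "t5"},
]


def same_combination(records, taxon_id):
    """Return taxonIDs of all TNUs for the same species-in-genus combination.

    Matching uses protonym links only, so spelling variants are included
    and no name-string parsing is needed.
    """
    by_id = {r["taxonID"]: r for r in records}
    target = by_id[taxon_id]
    species_protonym = target["originalNameUsageID"]
    genus_protonym = by_id[target["parentNameUsageID"]]["originalNameUsageID"]

    matches = []
    for r in records:
        parent_id = r["parentNameUsageID"]
        if r["originalNameUsageID"] != species_protonym or parent_id is None:
            continue
        if by_id[parent_id]["originalNameUsageID"] == genus_protonym:
            matches.append(r["taxonID"])
    return matches


result = same_combination(RECORDS, "t4")
```

Here `t2` (the species under its original genus) is excluded because its parent resolves to the protonym of "Aus", while the misspelled usage `t6` is included because both of its links resolve to the same protonyms as `t4`.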

Of course the bottleneck is in parsing out our taxonomic data and assigning all these ID values to allow this "magic" to happen. At the moment, we have ~700K TNUs linked to ~300K Protonyms, which is FAR from the total, but is a very good start. We have a very clear roadmap how to capture millions more from sources such as Sherborn, BHL, CoL, and various other databases... now all we need is a bit of funding! :-)

nielsklazenga commented 5 years ago

Works for me.

mdoering commented 5 years ago

Catching up on the latest conversation, a short comment on DwC terms. originalNameUsage and originalNameUsageID are the Code-agnostic terms to refer to a basionym or protonym. scientificNameID is meant to refer to some external nomenclatural authority record that treats the exact combination as given by scientificName. It is not meant to hold the basionym/protonym.

Regarding the difference between basionym and protonym, I disagree with @deepreef that basionym is a relation. It is a TNU just like the protonym, but it requires another combination to exist before it comes into existence. A name that has never been recombined into a new genus or rank does not have a basionym and is not yet considered to be a basionym, while it is already a protonym.

deepreef commented 5 years ago

Thanks, @mdoering.

I had many long conversations with Botanical "Code Warriors" on this topic, and the consensus was that the term "Basionym" is better thought of as a relationship between a subsequent combination and the original combination, because as you note, a "Basionym" does not exist until a new combination is created, so it only exists in the context of a relationship between a subsequent combination and its original combination.

But I agree that many people use the term "Basionym" in a similar way to "Protonym", as you note (i.e., as a TNU). Even if we defined Basionym to be a TNU (instead of a relationship between a new combination and the original combination), it is different from Protonym because Protonym refers to the first TNU for a name (chronologically), whereas Basionym is the first "legitimate" (Code-compliant) TNU for a name. In that sense, Basionym is much more similar to originalNameUsage/ID, as defined (which explicitly refers to Code-compliance).
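
To make that distinction concrete, here is a small sketch that assumes each usage carries a publication year and a Code-compliance flag. The `Usage` class, its fields, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Usage:
    """Hypothetical usage of a single name (fields are illustrative)."""
    usage_id: str
    year: int
    code_compliant: bool


def first_usage(usages: List[Usage]) -> Usage:
    """Protonym-like anchor: the chronologically first usage."""
    return min(usages, key=lambda u: u.year)


def first_compliant_usage(usages: List[Usage]) -> Optional[Usage]:
    """Basionym-like anchor: the first Code-compliant usage, if any."""
    compliant = [u for u in usages if u.code_compliant]
    return min(compliant, key=lambda u: u.year) if compliant else None


# Made-up data: the earliest usage is not Code-compliant.
usages = [
    Usage("u1", 1900, code_compliant=False),
    Usage("u2", 1905, code_compliant=True),
    Usage("u3", 1950, code_compliant=True),
]
```

With this data the two anchors differ (`u1` versus `u2`); when the earliest usage is itself Code-compliant, both functions return the same record, which corresponds to the common case.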

Another problem with "Basionym" is that it only applies to species-group names (species, subspecies, varieties, etc.), whereas both Protonym and originalNameUsage apply to names all the way up to Kingdom/Domain/etc.

nielsklazenga commented 5 years ago

Great to have you back, @mdoering. We will start talking about relationships at the next meeting. Note, just because we discuss things as relationships does not mean they have to end up in the specification as relationships.

The Botanical Code defines 'basionym' as a synonym:

basionym. A previously published legitimate name-bringing or epithet-bringing synonym from which a new name is formed for a taxon of different rank or position.

There is also 'replaced synonym', which is the counterpart of 'basionym', but for replacement names (nomina nova):

replaced synonym. The name replaced by an avowed substitute (nomen novum, replacement name).

@deepreef and I had a long email exchange about homotypic and heterotypic synonyms and whether they constituted TNUs, which I have dumped in https://github.com/tdwg/tnc/blob/master/examples/homotypic-heterotypic-examples.md. Please bear with us; @deepreef has a significant head start and it took me a while to catch up.

Basionyms and replaced synonyms are necessarily homotypic synonyms and we concluded that for homotypic synonyms the relationship didn't need to be made explicit (as it is already in the relationship with the protonym), but I think we should make an exception for basionyms and replaced synonyms, as they do provide additional data.