Document use of NomenclaturalNoteType properties

nielsklazenga commented 5 years ago

Properties in TCS that have the NomenclaturalNoteType as their range include spellingCorrectionOf, basionym, basedOn, conservedAgainst, laterHomonymOf, sanctioned, replacementNameFor and publicationStatus. Most of these look like relationships between primary name usages, but, since they do not require taxonomic opinion, they might, if only for practical purposes, be treated as properties on a TaxonomicName object. In today's meeting it was agreed to document on the Wiki how these elements are used in current implementations, and what an ideal implementation would look like.
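To make the "properties on a TaxonomicName object" option concrete, here is a minimal Python sketch in which each name-to-name property from the list above becomes a directed relation stored on a name object. The class and method names (TaxonName, NameRelation, relate, related) are hypothetical and are not the TCS schema; sanctioned and publicationStatus are kept as plain attributes, which is an assumption about how they might be handled rather than how TCS types them.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class NameRelation(Enum):
    # TCS property names that link one name to another (taken from the list above).
    SPELLING_CORRECTION_OF = "spellingCorrectionOf"
    BASIONYM = "basionym"
    BASED_ON = "basedOn"
    CONSERVED_AGAINST = "conservedAgainst"
    LATER_HOMONYM_OF = "laterHomonymOf"
    REPLACEMENT_NAME_FOR = "replacementNameFor"


@dataclass(eq=False)  # compare name objects by identity, not by field values
class TaxonName:
    """Hypothetical name object carrying nomenclatural relations (not the TCS schema)."""
    id: str
    full_name: str
    # Kept as simple attributes here; an assumption, not how TCS models them.
    publication_status: Optional[str] = None
    sanctioned: bool = False
    relations: list[tuple[NameRelation, "TaxonName"]] = field(default_factory=list)

    def relate(self, relation: NameRelation, target: "TaxonName") -> None:
        """Record a directed relation, e.g. self --basionym--> target."""
        self.relations.append((relation, target))

    def related(self, relation: NameRelation) -> list["TaxonName"]:
        """Return every name this name points to via the given relation."""
        return [target for rel, target in self.relations if rel is relation]


# Placeholder names in the "Aus bus" style used later in this thread.
original = TaxonName("n1", "Aus bus")
combination = TaxonName("n2", "Cus bus")
combination.relate(NameRelation.BASIONYM, original)
```

Because the relations live on the name object, no taxonomic opinion (and no usage) is needed to record that one name is the basionym of another.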

mdoering commented 5 years ago

In TCS, NomenclaturalNoteType properties are found on TaxonName, and I hope we stick to having a name object with (nomenclatural) name relations instead of moving all relations to the usage level.

For documenting usage, the TCS guide has many examples, which I extended in this CoL+ Names document; it might be helpful.

deepreef commented 5 years ago

Can you elaborate on what the "essence" of this name object might be? Here are some options:

1) One instance for each Protonym
2) One instance for each Protonym or set of 2-3 Protonyms (species & subspecies)
3) One instance for each Protonym or set of 2-4 Protonyms (species & subspecies, with subgenera)
4) One instance for each Protonym or set of n Protonyms (any combination of multiple parts within a rank)
5) One instance for each unique canonical name-string (all misspellings/combinations, but no qualifiers)
6) One instance for each unique plain name-string (all misspellings/combinations, all variations of qualifiers included)
7) One instance for each unique plain name-string with authorships (same as above, but with authorship text included)

Something else?

This is why we have so much trouble with "name objects": there are so many ways to define them.
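The string-based options (5, 6 and 7) differ only in which identity key is used, and a small Python sketch shows how the same records collapse into different numbers of name objects under each. The qualifier list, the helper names and the authorship strings ("Smith", "Jones") are made-up placeholders for illustration. Options 1 to 4 cannot be computed this way, because telling protonyms apart needs information beyond the string itself.

```python
from collections import defaultdict

# Hypothetical set of qualifier/rank-marker tokens; a real list would be much longer.
QUALIFIERS = {"subsp.", "ssp.", "var.", "f.", "cf.", "aff."}


def plain_key(name: str) -> str:
    """Option 6: the plain name-string, qualifiers kept, whitespace normalised."""
    return " ".join(name.split())


def canonical_key(name: str) -> str:
    """Option 5: the canonical name-string, qualifier tokens dropped."""
    return " ".join(t for t in name.split() if t not in QUALIFIERS)


def authored_key(name: str, authorship: str) -> str:
    """Option 7: the plain name-string with the authorship text appended."""
    return f"{plain_key(name)} {authorship}".strip()


def count_objects(records, key) -> int:
    """How many name objects a list of (name, authorship) records collapses to."""
    groups = defaultdict(list)
    for name, authorship in records:
        groups[key(name, authorship)].append((name, authorship))
    return len(groups)


# Made-up records: one trinomial written with and without a rank marker,
# and with two different placeholder authorship strings.
records = [
    ("Aus bus subsp. cus", "Smith"),
    ("Aus bus cus", "Smith"),
    ("Aus bus subsp. cus", "Jones"),
]

print(count_objects(records, lambda n, a: canonical_key(n)))  # 1
print(count_objects(records, lambda n, a: plain_key(n)))      # 2
print(count_objects(records, authored_key))                   # 3
```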

Personally, I think the best/most practical path forward is to go with either 1, 2, or 6. Each approach has different strengths and weaknesses, and the devil always lies in the details. 1 and 2 require usages in the sense that the original usage instance is involved, and there needs to be homonym disambiguation as part of the process. 6 is much easier for implementation purposes, as Protonym anchoring can be done after the fact.

In the context of TCS NomenclaturalNoteType and the (excellent!) CoL+ Names document, most of the relationships apply to 1, but a few apply to 2. Therefore, 6 (~= GlobalNames work by Dima) is valuable as a separate object to serve as a powerful tool for disambiguation/reconciliation; but 1 or 2 are much better for getting us closer to the "clean bucket", and for doing certain informatics "magic" in other ways.

Note: There is no need to flesh out full literature citation details when minting a Protonym instance -- just some degree of confidence that a Protonym instance hasn't already been minted for it. Although Protonyms are rooted in usages (i.e., the original usages), they can be created even when full details of the original usages have not been fleshed out, and relationships among them do exist independently of Usages, so I think this approach (1 or 2) can be a viable pathway to solving several problems (if I understand Markus' needs correctly).

mdoering commented 5 years ago

I am not convinced it is really impossible, or even that difficult. Maybe it's a zoological vs. botanical way of thinking. Although I hope we can reach a shared understanding of what a Name is, in the end it may not matter for the standard and could be left to the user to decide?

The ICZN glossary defines it as:

name, n. (1) (general) A word, or ordered sequence of words, conventionally used to denote and identify a particular entity (e.g. a person, place, object, concept). (2) Equivalent to scientific name (q.v.). (3) An element of the name of a species-group taxon: see generic name, subgeneric name, specific name, subspecific name.

scientific name Of a taxon: a name that conforms to Article 1, as opposed to a vernacular name. The scientific name of a taxon at any rank above the species group consists of one name; that of a species, two names (a binomen); and that of a subspecies, three names (a trinomen) [Arts. 4 and 5]. A scientific name is not necessarily available.

It's a bit recursive: for a subspecies, a scientific name consists of three names. But essentially it appears to be just a string made up of 1-3 tokens? If I understand correctly, that would be number 2 in your list, the set of 1-3 protonyms?
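Read literally, the glossary definition reduces to counting the name tokens of a canonical name (without authorship). The Python sketch below does exactly that; the function name is hypothetical. It skips a parenthesised subgenus, since under the ICZN a subgeneric name in parentheses is not counted as one of the words of a binomen or trinomen. Anything with a different token count, such as a hybrid formula, falls outside the definition and returns None.

```python
from typing import Optional

# ICZN terms for scientific names of one, two and three words (Arts. 4 and 5).
FORMS = {1: "uninominal name", 2: "binomen", 3: "trinomen"}


def iczn_form(canonical: str) -> Optional[str]:
    """Classify a canonical name (no authorship) by its number of name tokens.

    A parenthesised subgenus, e.g. "(Bus)" in "Aus (Bus) cus", is not counted.
    Hybrid formulas, OTU labels and other informal names return None.
    """
    tokens = [t for t in canonical.split() if not (t.startswith("(") and t.endswith(")"))]
    return FORMS.get(len(tokens))


print(iczn_form("Aus"))                # uninominal name
print(iczn_form("Aus (Bus) cus dus"))  # trinomen
print(iczn_form("Aus bus × Aus cus"))  # None
```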

Generally, people include the original publication as a property of the name, and IPNI, for example, tracks duplicate name strings if their original publications differ. That leads us into the world of usages, but restricts them to usages intended to describe new scientific names (as in 1-3 tokens). In the vast majority of data dealing with names, we have no idea about the publication and are lucky to have an authorship. I am not sure how far you can go with a usage-centric model if we lack references.

We surely have a much wider playground than just scientific names with 1-3 tokens. We are dealing with hybrid formulas, virus names, bacterial strains and candidate names, OTUs and various other "informal" names.

nielsklazenga commented 5 years ago

@deepreef Could you give a definition of 'protonym'? It's not a term botanists (at least this one) are familiar with and it is playing quite an important role in the discussions, so it would be good to pin down its meaning.

deepreef commented 5 years ago

Protonym is defined and explained here. There are a few slightly ambiguous aspects that need clarification, but these are edge cases. Technically, a Protonym is a usage, but as the original usage, it also serves as a surrogate for a "name". However, "name" in this sense is the epithet. For example, "Aus bus" is composed of two names: the genus "Aus" and the species "bus". Each has an original description. Thus, "Aus bus" is an array of two protonyms (one for "Aus", and one for "bus"). Protonyms are useful for two reasons: 1) they immediately disambiguate homonyms; and 2) they serve as the "name identifier" that enables many powerful informatics "tricks".
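The "array of protonyms" idea, and why it disambiguates homonyms, can be sketched in a few lines of Python. The class names, identifiers and ranks below are hypothetical placeholders, not the data model referred to above. Following the earlier point that full citation details are not required when minting a Protonym, the original reference is optional.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Protonym:
    """Hypothetical record for the original usage of a single name, e.g. "Aus" or "bus"."""
    id: str                                   # minted once per original usage
    name: str
    rank: str
    original_reference: Optional[str] = None  # may stay unknown when minted


@dataclass(frozen=True)
class CombinedName:
    """A name such as "Aus bus", represented as an ordered tuple of protonyms."""
    parts: tuple[Protonym, ...]

    def text(self) -> str:
        return " ".join(p.name for p in self.parts)


aus = Protonym("p1", "Aus", "genus")
bus_1 = Protonym("p2", "bus", "species")
bus_2 = Protonym("p3", "bus", "species")  # a homonym: same spelling, separate original usage

first = CombinedName((aus, bus_1))
second = CombinedName((aus, bus_2))

print(first.text() == second.text())  # True: the strings collide
print(first == second)                # False: the protonyms tell them apart
```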

The publication in the link above is old, and there are some changes -- both in the terminology (e.g., what the article refers to as "Assertion" is what we now call "Taxon Name Usage"), and the data model (it has evolved somewhat, though the core entities of Agent, Reference, Assertion[TNU], Protonym all remain the same).

I will elaborate further when I document the data model.

nielsklazenga commented 5 years ago

@mdoering Yes, I think a lot of this is in the implementation (or up to the user), and it is probably more important to get the properties right than to decide which classes they go in.

Really liked the CoL+ NAME document you supplied the link for. The last two IBCs made some changes to how names of Fungi are handled (i.e. the sanctioning). Should I comment in the CoL+ repo?

nielsklazenga commented 5 years ago

Thanks @deepreef. That is sort of what I had inferred from earlier discussions, but it is good to confirm that we are talking about the same thing.

mdoering commented 5 years ago

Thanks @nielsklazenga, please comment in the CoL+ repo: open an issue there or just send me an email. Any feedback is highly appreciated!

tdwg / tnc

Document use of NomenclaturalNoteType properties #23