Name variants and the definition of tnu:TaxonomicName

camwebb commented 5 years ago

Following on from the conference call, and @deepreef’s comment that perhaps we still don’t have a precise definition for tnu:TaxonomicName, and that the issue has always been “kicked down the road”, and that we would do well to sort this out now, I’ll open this thread! Let’s also refer to @baskaufs’s figure which is getting some editing and comments.

Take the question: are “Delphinium brownii J. Smith” and “Delphinium browni J. Smith” one instance of tnu:TaxonomicName or two?

One opinion seemed to be that our common understanding of a name was of an abstract entity, and that a single name (with or without a protonym) might have a number of variants that are not of the same nature as the base name itself. We might model this as:

:TN1   :hasOrthographicVariant      "Delphinium brownii J. Smith" ,
                                    "Delphinium browni J. Smith" .

with usages perhaps modeled:

:TNU1  tnu:taxonomicNameUsageLabel  "Delphinium brownii J. Smith" ;
       tnu:taxonomicName            :TN1 .
:TNU2  tnu:taxonomicNameUsageLabel  "Delphinium browni J. Smith"  ;
       tnu:taxonomicName            :TN1 .

Another view is that since we often do not have a protonym, and we encounter every name via a name usage, we are better off modeling name variants thus:

:TNU3  tnu:taxonomicNameUsageLabel  "Delphinium brownii J. Smith" .
:TNU4  tnu:taxonomicNameUsageLabel  "Delphinium browni J. Smith" .

and (I think I heard this correctly):

:TNU3  tnu:taxonomicName            :TN3 .
:TNU4  tnu:taxonomicName            :TN4 .
:TN3   tnu:taxonomicNameStringWithAuthor "Delphinium brownii J. Smith" .
:TN4   tnu:taxonomicNameStringWithAuthor "Delphinium browni J. Smith" .
:TN3   owl:differentFrom            :TN4 .

Perhaps I misunderstood though, and perhaps there are other ways people see this? I know this has probably been talked about, around and around for many years.

I’m particularly interested as I’m currently in the process of reconciling names between different DBs (IPNI, Tropicos, ThePlantList, Flora of North America; see here and here), and I often want to say that two variants refer to the “same” name, but don’t quite know how to define that “one name.” Having a crystal clear definition of tnu:TaxonomicName would be helpful.

mdoering commented 5 years ago

Getting a definition of a TaxonomicName would be great and is indeed highly needed. Some food for thought at CoL+.

The simplest solution, and what I believe most people think of as a name, is just the canonical string label: the uni/bi/trinomial. To distinguish homonyms you then have to start dealing with authorship. Then there are even fully duplicated name strings with the same authorship and publication year, but in different journals (see Pedicularis inconspicua example). One could argue that only one of the superfluous names in the P. inconspicua example is nomenclaturally relevant, so we do not need to distinguish them.

I doubt we want to treat all possible lexical variations (author spelling, transliteration, epithet gender, additional infrageneric or infraspecific classifications) as unique names. Isn't it just the subset of name usages that are nomenclatural acts that we are interested in identifying and linking to? Such an act can be an orthographic variation or a corrected gender, but these are published as corrections of some other name. As we want to track those relations, they should probably be separate name instances.

I also wonder if we should look more closely at existing nomenclatural sources and see if we can extract a definition from them. IPNI, ZooBank, Index Fungorum, Systema Dipterorum, Sherborn's Index Animalium, the name lists in the Appendices of the nomenclatural codes (e.g. those that conserve names): do they share a common understanding of what a name is, or at least one per code?

A difficult task that seems simple at first, but I strongly believe we need a rigid definition, at least to share data. Maybe not so much to establish a data standard, which could be used in different ways depending on the community.

nielsklazenga commented 5 years ago

I agree with most of what @mdoering writes, but we are not talking about definition of TaxonomicName here, but maybe a little about circumscription of its instances and mostly about matching and disambiguation of name data.

Variation in strings that are the same TaxonomicName and identical strings that are different TaxonomicNames, as well as the fact that very similar name strings can be different legitimate names, are of course problematic for data sharing and aggregation, but I don't think you can do anything about that by making the definition of TaxonomicName more rigid. Asking people to spell correctly and to provide sufficient disambiguating information (the nomenclatural code, for example) will be more useful (us asking them not so much as them actually doing what we ask).

Taking @camwebb's example (and conveniently ignoring that Cam said the two name strings are variations of the same name), if I were to assume that 'Delphinium brownii J.Sm.' is a botanical name – IPNI has Delphinium brownii Rydb. and I don't know the thing, so I can't be sure – I would know that 'Delphinium browni' is a correctable spelling error. So, if I wanted to do something with 'Delphinium browni' and not just consider it a typo, I would have it as another label for the TaxonomicName instance of which Delphinium brownii is the scientific name.

If I did not want to make that assumption, or if I were a computer and couldn't make it, or if, still being a computer, I could match 'Delphinium brownii J.Sm.' to a botanical name but didn't know that 'browni' is a correctable error, I would have to treat 'Delphinium brownii J. Smith' and 'Delphinium browni J. Smith' as different TaxonomicName instances.

I think that, when provided with sufficient data, we would probably agree which name strings belong to the same TaxonomicName instance and which ones are different TaxonomicName instances (so go with @camwebb's first model, replacing tnu:taxonomicNameUsageLabel with tnu:nameString). However, I also think that, given two different name strings and nothing else, we'll have to treat them as different TaxonomicName instances. This has got nothing to do with the definition of a TaxonomicName and everything to do with data quality. There might also be scenarios where the serialisation only allows a one-to-one relationship between a name string and a TaxonomicName instance and people still want to account for lexical variation in names. And then there are scenarios where identical name strings need to be different TaxonomicName instances, because they have a different status (e.g. isonyms, or names that are first invalidly published and validated in a later publication). Not everybody may be interested in those, but some people are, and a standard needs to accommodate them.

Therefore, I think we should have a rather flexible definition of TaxonomicName and leave stricter definition, when required, to application profiles. See also the TDWG Vocabulary Maintenance Specification.

What I think we should get out of this though is that the standard does need to accommodate lexical variation within a TaxonomicName instance. This can be achieved by using SKOS Extended Labels, as @baskaufs has suggested (this can also deal with variations, or errors, in authorship strings). Also, a verbatim name string in the TaxonomicNameUsage class would be useful. The nameString property in the TDWG Taxon LSID Ontology could be adopted for that.

mdoering commented 5 years ago

Is a plain label good enough, or do the variations need detailed name properties, in which case we must treat them as individual name instances? How can you tell which label is preferred?

ghwhitbread commented 5 years ago

+1 individual name instances. They do have properties (nomenclaturalStatus at least), and they are referenced by taxonomicNameUsages, which in turn participate in relationships (orthographic-variant-of). Our editors even like to assign authorship to variants where provenance is known: e.g. 'Delphinium browni S. Other orth. var.' Their/our job is to document factual, nomenclatural and taxonomic usage ... building fundamental, scientific infrastructure.

baskaufs commented 5 years ago

One of the advantages of the SKOS-XL system is that it creates a way to assign identifiers to text strings that are labels. It also specifically states that even if two label instances have the same literal form, they are not necessarily the same individual. So two taxonomicName instances could have the same literal form (string), but be identified as being distinct labels, be assigned properties, and be designated as the same or different from other labels that have exactly the same literal form.

The other thing is that the label instances could have an existence of their own, separate from the taxonomicName they serve as labels for. Thus, their status as preferred, alternate, or hidden labels for particular taxonomicNames could change. They could also begin their lives unassigned to a taxonomicName, be connected to or disconnected from a particular taxonomicName, or be moved from one taxonomicName to another without losing their provenance and other metadata. I am not saying whether that would be a good idea or not, just saying that it would be possible. Instantiating label entities as separate things from taxonomicNames gives options for dealing with the kinds of variation that we are talking about here without the necessity of generating new taxonomicName instances for every variation of strings.

Of course, adding another layer to the model (a label layer) introduces more complexity to the model. But we are talking about how best to handle situations that are complicated. So it might be an appropriate solution.
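
To make the label-layer idea concrete, here is a minimal sketch in plain Python rather than RDF. Everything in it is hypothetical: the `Label` class, its field names, the `attach`/`detach` helpers and the role strings "pref", "alt" and "hidden" are illustrative stand-ins for the SKOS-XL label relations, not part of any TDWG vocabulary. It only shows that two labels with the same literal form can remain distinct individuals, and that a label can be moved between names without losing its own metadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # eq=False: two labels are equal only if they are the same object
class Label:
    label_id: str                  # identifier of the label itself (hypothetical)
    literal_form: str              # the text string
    provenance: str                # where the string was encountered
    name_id: Optional[str] = None  # TaxonomicName the label is attached to, if any
    role: Optional[str] = None     # "pref", "alt" or "hidden"; None while unassigned


def attach(label: Label, name_id: str, role: str) -> None:
    """Attach (or move) a label to a TaxonomicName; its own metadata is untouched."""
    if role not in {"pref", "alt", "hidden"}:
        raise ValueError(f"unknown role: {role}")
    label.name_id = name_id
    label.role = role


def detach(label: Label) -> None:
    """Return a label to the unassigned state."""
    label.name_id = None
    label.role = None


# Two labels with an identical literal form, encountered in different places.
a = Label("label-1", "Delphinium browni J. Smith", provenance="dataset A")
b = Label("label-2", "Delphinium browni J. Smith", provenance="dataset B")

same_string = a.literal_form == b.literal_form  # the strings are identical
same_label = a == b                             # but the labels are distinct individuals

attach(a, "TN1", "alt")     # start as an alternate label of TN1
attach(a, "TN2", "hidden")  # later moved to TN2 with a different status
```

After the two `attach` calls, `a` belongs to "TN2" but still carries its original provenance, while `b` has never been assigned to any name.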

nielsklazenga commented 5 years ago

@ghwhitbread, so clearly a matter of personal preference and not definition. I think I am one of those editors you are talking about and, while I can get very anal about certain examples people provide (sorry @camwebb), I am a lot more pragmatic as an editor than you suggest your editors are. By the way, when you say 'separate name instances', are you talking TaxonomicName instances or NSL Instances (which are closest to TaxonomicNameUsage instances)? If the latter, I agree; if the former, I don't think you can do that in the NSL. The NSL has a verbatim_name_string in the Instance table, which I think is the easiest and mostly preferable way to deal with lexical variation.

It's important to distinguish between orthographic variants that have a status in the relevant code – and therefore have to be separate TaxonomicName instances – and spelling and other orthographic errors. The only difference between 'Delphinium brownii' and 'Delphinium browni' is in the termination, so 'Delphinium browni' is not an orth. var., but at most a correctable error (it could also just be a printing error or typo). Chances are pretty good that 'Delphinium browni' is the original spelling and 'Delphinium brownii' the corrected spelling. If you recognise 'Delphinium browni J.Sm.' as a separate TaxonomicName instance (or Name record in the NSL), there will be no primary TaxonomicNameUsage for Delphinium brownii, as it is based on a rule in a nomenclatural code, so there is no reference or attribution (or it is irrelevant), and you've got no way to link the corrected and original spelling (and defeat the purpose of having a TaxonomicName object).

nielsklazenga commented 5 years ago

@mdoering Adding to what @baskaufs said: the SKOS-XL labels would be in addition to what we've already got, so, besides using the prefLabel property to indicate which label is preferred, you'll also still have dwc:scientificName and dwc:scientificNameAuthorship to do that. For example, if I were using this for name matching (I do some of it for the maps in VicFlora, which use AVH/ALA data), I would have the scientific name from my taxonomic backbone and a Label for every unique name string provided with the occurrence records that I have matched to it. Apart from an id and a literalForm, the Label would also have a matchType. I don't care so much whether the name string is correct or original, an orthographic variant or an error, or how the authorship is composed (e.g. whether or not ex-authors are included or the IPNI standard form has been used), so I would hang every Label from the TaxonomicName instance with the altLabel property. This way, I only have to parse a name string and match it to a TaxonomicName once. Also, if a human were to go in and find a match to be wrong, it is easy to change and, since each unique string is matched to a TaxonomicName only once, future matches will be correct without having to change the matching algorithm.
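
The match-once workflow described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the VicFlora implementation: `LabelIndex`, `toy_matcher`, the two-name `BACKBONE` and the matchType values "exact", "fuzzy" and "manual" are all made up for the example, and the matcher ignores authorship entirely. The fuzzy step uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib
from dataclasses import dataclass

# Hypothetical taxonomic backbone: scientific name -> TaxonomicName id.
BACKBONE = {"Dicranoloma billarderii": 1, "Dicranoloma blumei": 2}


@dataclass
class Label:
    label_id: int
    literal_form: str  # the verbatim name string from the occurrence record
    match_type: str    # "exact", "fuzzy" or "manual" (illustrative values)
    name_id: int       # the TaxonomicName this label hangs from


def toy_matcher(name_string):
    """Stand-in matcher: exact lookup first, then a close-string match."""
    if name_string in BACKBONE:
        return BACKBONE[name_string], "exact"
    close = difflib.get_close_matches(name_string, list(BACKBONE), n=1, cutoff=0.9)
    if close:
        return BACKBONE[close[0]], "fuzzy"
    return None


class LabelIndex:
    """Keeps one Label per unique name string, so each string is matched only once."""

    def __init__(self, matcher):
        self._matcher = matcher
        self._by_string = {}
        self.matcher_calls = 0  # counts how often the matcher actually ran

    def match(self, name_string):
        label = self._by_string.get(name_string)
        if label is None:
            self.matcher_calls += 1
            result = self._matcher(name_string)
            if result is None:
                return None  # unmatched strings are not stored
            name_id, match_type = result
            label = Label(len(self._by_string) + 1, name_string, match_type, name_id)
            self._by_string[name_string] = label
        return label.name_id

    def correct(self, name_string, name_id):
        """Manual fix of a wrong match; later lookups reuse the corrected label."""
        label = self._by_string[name_string]
        label.name_id = name_id
        label.match_type = "manual"


index = LabelIndex(toy_matcher)
first = index.match("Dicranoloma billardieri")   # runs the matcher
second = index.match("Dicranoloma billardieri")  # served from the stored label
```

Because a correction is stored on the label, the matching algorithm itself never has to change when a human overrides a match.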

nielsklazenga commented 5 years ago

Linking this to #25.

nielsklazenga commented 5 years ago

Example from my own work, in table form:

TaxonomicName

id	scientificName	scientificNameAuthorship
1	Dicranoloma billarderii	(Brid.) Paris
2	Dicranoloma blumei	(Nees) Ren.
3	Dicranum billarderii	Brid.
4	Dicranum blumei	Nees

TaxonomicNameUsage

id	hasName	nameString	accordingToString	primary
5	3	Dicranum billarderii	Anon 1802:214	true
6	4	Dicranum blumii	Nees in Blume 1823: 131	true
7	1	Dicranoloma billardieri	Paris 1904: 24	true
8	2	Dicranoloma blumii	Renauld 1901: 69	true
9	1	Dicranoloma billardierei	Klazenga 1999: 60	false
10	1	Dicranoloma billarderi	Klazenga 2003: 435	false
11	2	Dicranoloma blumei	Klazenga 1999: 65	false

Darwin Core Taxon

taxonID	scientificNameID	scientificName	scientificNameAuthorship	namePublishedIn	nameAccordingTo	originalNameUsageID	originalNameUsage
9	1	Dicranoloma billarderii	(Brid.) Paris	Paris 1904: 24	Klazenga 1999: 60	5	Dicranum billarderii
10	1	Dicranoloma billarderii	(Brid.) Paris	Paris 1904: 24	Klazenga 2003: 435	5	Dicranum billarderii
11	2	Dicranoloma blumei	(Nees) Ren.	Renauld 1901: 69	Klazenga 1999: 65	6	Dicranum blumei

Note: Dicranoloma billarderii is also an interesting case to consider when we get to Taxonomic Name Usage Relationships, as Dicranoloma billarderii sec. Klazenga 1999 and Dicranoloma billarderii sec. Klazenga 2003 have different circumscriptions. In my treatment of the Australian and New Zealand species of Dicranoloma (the 2003 publication) I discovered that the Malesian specimens (which were the topic of my 1999 publication) and New Caledonian specimens do not belong to Dicranoloma billarderii, but to a distinct taxonomic entity, which I now call Dicranoloma deplanchei.
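
As a rough illustration of how the three tables above relate, the Darwin Core Taxon rows can be derived from the TaxonomicName and TaxonomicNameUsage tables by a simple join. The sketch below copies the table data verbatim; the `BASIONYM_OF` mapping is an assumption, since the tables do not state that link explicitly (it is inferred from the originalNameUsageID column), and the helper names `primary_usage` and `dwc_taxon` are made up for the example.

```python
# TaxonomicName table: id -> (scientificName, scientificNameAuthorship)
NAMES = {
    1: ("Dicranoloma billarderii", "(Brid.) Paris"),
    2: ("Dicranoloma blumei", "(Nees) Ren."),
    3: ("Dicranum billarderii", "Brid."),
    4: ("Dicranum blumei", "Nees"),
}

# TaxonomicNameUsage table: (id, hasName, nameString, accordingToString, primary)
USAGES = [
    (5, 3, "Dicranum billarderii", "Anon 1802:214", True),
    (6, 4, "Dicranum blumii", "Nees in Blume 1823: 131", True),
    (7, 1, "Dicranoloma billardieri", "Paris 1904: 24", True),
    (8, 2, "Dicranoloma blumii", "Renauld 1901: 69", True),
    (9, 1, "Dicranoloma billardierei", "Klazenga 1999: 60", False),
    (10, 1, "Dicranoloma billarderi", "Klazenga 2003: 435", False),
    (11, 2, "Dicranoloma blumei", "Klazenga 1999: 65", False),
]

# Assumption: combination -> basionym, not given explicitly in the tables.
BASIONYM_OF = {1: 3, 2: 4}


def primary_usage(name_id):
    """Return the primary usage (the original publication) of a TaxonomicName."""
    return next(u for u in USAGES if u[1] == name_id and u[4])


def dwc_taxon(usage):
    """Build one Darwin Core Taxon row from a non-primary usage."""
    usage_id, name_id, _name_string, according_to, _primary = usage
    scientific_name, authorship = NAMES[name_id]
    basionym_id = BASIONYM_OF[name_id]
    return {
        "taxonID": usage_id,
        "scientificNameID": name_id,
        "scientificName": scientific_name,  # from the name, not the verbatim string
        "scientificNameAuthorship": authorship,
        "namePublishedIn": primary_usage(name_id)[3],
        "nameAccordingTo": according_to,
        "originalNameUsageID": primary_usage(basionym_id)[0],
        "originalNameUsage": NAMES[basionym_id][0],
    }


rows = [dwc_taxon(u) for u in USAGES if not u[4]]
```

Note that the verbatim nameString of each usage never reaches the Darwin Core row: the lexical variants collapse onto the scientificName of the TaxonomicName instance they point to.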
