ghwhitbread commented 6 years ago

You write: A taxonomic concept is a taxonomic name instance establishing or circumscribing a taxonomic entity - often linking synonymic inclusions and adding annotations, description…

I think it's cleaner to say that the taxonomic concept is a theory of a certain taxonomy identity. And then "taxonomic concept label" (name sec. source) is the "name" for that theory.

More or less like here: https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-017-0174-5 ...

Best, Nico

jgerbracht commented 6 years ago

I come at the definition of Taxonomic Concept from a different direction, which is from population perspectives, i.e. one or more population(s) of taxonomically related individuals. As opposed to approaching this from the taxonomic names direction. I see taxonomic names as labels to taxonomic concepts and while the names will change from author to author and taxonomy to taxonomy, the underlying concept must be unchanging through time (though, of course, individuals within population(s) are born, reproduce and die). 'A rose by any other name ...'

Denis Lepage et al. in https://dx.doi.org/10.3897/zookeys.420.7089 describe concepts and the issues around them, though I don't immediately see an actual definition in the paper.

I do believe that our first task is to coin a definition that includes the immutability of a concept and its disassociation from the specific scientific name, author and publication of each concept. I can mostly agree with this definition: "A taxonomic concept is the underlying meaning, or referential extension, of a scientific name as stated by a particular author in a particular publication. It represents the author’s full-blown view of how the name is supposed to reach out to objects in nature."
I'm not suggesting this as the definition we adopt, but it comes close to what I mean when I say 'Taxonomic Concept'. One key point is that I see this as describing only the 'original definition' of the concept; when future authors apply different names to the exact same view of the 'objects in nature', the same taxonomic concept should be utilized.
Cheers, Jeff

jgerbracht commented 6 years ago

One thought that will certainly cause discussion and controversy is that, from my perspective, a taxonomic concept doesn't actually require a name. At the most basic, it requires some form of reference (even unpublished) and a description of the associated population(s). There are cases that eBird must manage every year where an unpublished species is still a valid taxonomic concept: it doesn't have a name, author or citation yet, but it is being observed and recorded in the natural world and therefore must have a taxonomic concept ID to be useful in the eBird world. Thoughts?

deepreef commented 6 years ago

I actually don't find that controversial at all. When a tree falls in the woods and nobody is there to hear it, it still creates sound waves. Likewise, when groups of organisms are known to exist in nature but have not been formally assigned a Linnean-style scientific name, they still exist as taxon concepts.

As for the actual definition of "taxon concept", I would go with something simpler: "A circumscribed set of organisms asserted to represent a taxon". It's not circular, because the word "concept" is what we're trying to define here. A slightly more elaborated version might qualify "organisms" as "inclusive of individuals living, recently dead, and yet to be born".

Technically, Code-governed Linnean-style names are labels attached to name-bearing type specimens. To use a Linnean-style name as a label for a concept, it's necessary to include a "sensu" or "SEC" qualifier, which by convention is some form of Reference citation (typically author + year). I agree that names shouldn't be part of the definition of a taxon/concept, and I wouldn't include this convention for labeling concepts as part of the definition of a concept either. Instead, I would keep the definition of "taxon"/"concept" as the simple version above, then strongly support a standard human-friendly concept labeling format along the lines of:

[LinneanStyleName] [NomenclaturalAuthority] SEC [ReferenceAssertingOrDefiningConcept]
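A minimal sketch of how labels in that format could be assembled mechanically, assuming each part is already available as a plain string. The function name `format_concept_label`, its parameters, and the example values (`Aus bus`, `Smith, 1950`, `Jones 1980`) are hypothetical placeholders, not part of any agreed standard.

```python
from typing import Optional


def format_concept_label(name: str, sec_reference: str,
                         authority: Optional[str] = None) -> str:
    """Assemble a label of the form
    [LinneanStyleName] [NomenclaturalAuthority] SEC [ReferenceAssertingOrDefiningConcept].
    The authority is optional; the SEC reference is what makes it a concept label."""
    if not name.strip():
        raise ValueError("a name is required for this label format")
    if not sec_reference.strip():
        raise ValueError("a concept label needs a SEC reference")
    parts = [name.strip()]
    if authority and authority.strip():
        parts.append(authority.strip())
    parts += ["SEC", sec_reference.strip()]
    return " ".join(parts)


print(format_concept_label("Aus bus", "Jones 1980", authority="Smith, 1950"))
# Aus bus Smith, 1950 SEC Jones 1980
```

Note that the label is only a human-friendly rendering; it is not itself an identifier for the concept.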

As for identifiers, applied to names and concepts, my own views are well-documented: we need a system of persistent, shared identifiers for taxon-name usage instances, and then apply those identifiers in context as proxy anchor-points for both taxon names and taxon concepts. But I imagine that would be a different thread...

Aloha, Rich

mdoering commented 6 years ago

I would also go for a short definition with the focus on defining a set of organisms. "A circumscribed set of organisms asserted to represent a taxon" is pretty good. It makes me wonder what a taxon exactly is though. Does that need another definition? For example is a classification essential for a taxon and does a change in the classification change the concept? I would think it does not, but I know some think differently.

Markus

deepreef commented 6 years ago

Thanks, @mdoering . Yeah... I was a little queasy about that as well. My hope is that "Taxon" is reasonably well understood, given that it is the basis of an entire field of study (i.e., taxonomy). But you're right -- while my proposed definition may not be circular per se, it does somewhat dodge and obfuscate a clean definition by leaning too heavily on an equally abstract and vague term.

I suppose the definition could simply be "A circumscribed set of organisms", but there are other reasons for circumscribing sets of organisms that are non-taxonomic (e.g., "marine organisms", "organisms in Hawaii", etc.). That's why I felt the definition needed the additional refinement of "asserted to represent a taxon". I think as Jeff and others have said, the "asserted" part is key, because any taxon concept really inherits its meaning from an assertion put forth by taxonomists (or non-taxonomists). My sense of "taxon" is that the word implies a set of organisms that more or less share an evolutionary history. I wanted to avoid such specifics, however, to sidestep the whole monophyletic/holophyletic/paraphyletic issue which, while interesting in its own right, is outside the scope of what we're trying to achieve here.

So... my feeling is that the definition I proposed stands as it is even without a clear/agreed definition of what a "taxon" is. Different people may agree or disagree on what is implied by "taxon", but what matters to the definition is that someone asserted a set of organisms to represent a taxon -- by whatever notion of "taxon" that someone had in mind. Linnaeus predated Darwin by a century and was himself a creationist; but I think it's fair to say that he asserted circumscribed sets of organisms to represent taxa. In his mind, taxa were created by God, which is not consistent with the view of modern evolutionary biologists; yet despite this fundamental gap in the essence of a "taxon", both Linnaeus and modern evolutionary biologists still assert circumscribed sets of organisms to represent taxa in ways that are fundamentally comparable, and that fall within the scope of what I think we're circling around for defining what we mean by "taxon concept" in this context.

Sorry for the ramblings....

Aloha, Rich

P.S. Sorry, I accidentally clicked the wrong button....

nielsklazenga commented 6 years ago

I like Rich's definition. We need to work out how Taxon, Taxon Concept, Name Usage and Instance relate to each other (I'll create a new issue for that tomorrow; it's in the discussion document that Greg and I wrote), but I would say that the Taxon is the actual group of organisms that is out there (or we think is out there), while the Taxon Concept is the abstraction, or what is in our heads.

deepreef commented 6 years ago

Thanks @nielsklazenga -- I agree with your distinction between "taxon" as being the actual set of organisms, and "concept" as being our abstract human interpretation of it. In that context, I would probably apply my proposed definition to "taxon", and parse out the other terms as follows:

Taxon: "A circumscribed set of organisms, inclusive of individuals living, recently dead, and yet to be born, asserted to represent a natural cohesive biological unit" [This may need some elaboration on "natural cohesive biological unit", but again the key is that in order to exist, it must be asserted to be such.]

Taxon Concept: "A set of physical, genealogical, phylogenetic or other biological properties or characters of organisms used to define the abstract boundaries of a taxon circumscription that collectively distinguish it from other taxa." [What I'm trying to suggest here is that the "concept" is derived from the actual properties used to describe the abstract boundaries of taxon circumscriptions, which is the way that taxonomists determine whether any particular organism/individual is or is not an instance of an asserted Taxon.]

For my own understanding of "Taxon Name Usage" and associated terms (e.g., "Reference", "Name-String", "Appearance", etc.), see: Taxonomic name usage files.

I'm not a big fan of defining the term "Instance" by itself within this context, because that word is so broad and vague that we shouldn't try to co-opt it to have a more specific meaning.

nielsklazenga commented 6 years ago

Awesome.

@deepreef, in terms of the relationship between Taxon Concept and Taxon Name Usage, would you agree that Taxon Name Usage can be an operationalisation of Taxon Concept?

deepreef commented 6 years ago

I guess my answer to that depends on what you mean by "operationalisation".

The way I have characterized it in the past is that a "Taxon Name Usage" (TNU) encompasses all of the text, numbers, figures, data, etc. associated with the implied taxon concept asserted within a Reference. An identifier assigned to that TNU includes all of that associated information collectively as the "thing" that is identified. Thus, I guess I would say that the TNU identifier implies the full set of information used in asserting a Taxon Concept. In this sense, I think it's fine and appropriate to regard the TNU as the "operationalisation" of the Taxon Concept, in the sense that it encompasses all of the documented information used in the Reference to define the boundaries of that Taxon Concept.

One of the caveats, however, is that I think that a TNU can be used to operationalise more than just the Taxon Concept. For example, a subset of TNUs are Protonyms (i.e., those that create new scientific names, or "nomenclatural novelties"). In some contexts, the TNU (=Protonym) can also simultaneously be the operationalisation of the "taxon name" entity (important for nomenclators, but devoid of any connection to taxon concepts other than the name-bearing type specimen), as well as the operationalisation of the implied taxon concept associated with that name within that Reference (no different from any non-Protonym TNU).

I personally don't see a problem with that, because the distinction of whether or not a particular TNU identifier implies (or serves as proxy for) the nomenclatural bits of the TNU or the taxon concept bits of the TNU depends on the context in which the identifier is cited. The identifier identifies the TNU (i.e., the collective set of text, numbers, figures, data, etc. associated with the implied taxon concept asserted within a Reference); but the TNU serves as a very useful proxy for both nomenclatural actions, and taxon concept definitions.

Man, this stuff is hard enough to think about, let alone write about! And for those who argue that these sorts of discussions are too deep into the weeds to be useful in this context; I would counter that the reason we've been unable to solve these issues after decades of discussing and debating them is because we have thus far failed, as a community, to dive this deep into the weeds previously.

nielsklazenga commented 6 years ago

You are very good at writing about it though. I agree with all that. At a later stage we can probably come up with a list of types of Taxon Name Usages and how they relate to Taxon Names and Taxon Concepts.

I agree that it is important to have these discussions, as I think that, once we've nailed down the core concepts, the rest will become more straightforward.

jgerbracht commented 6 years ago

If we use Taxon as being "A circumscribed set of organisms, inclusive of individuals living, recently dead, and yet to be born, asserted to represent a natural cohesive biological unit", then a taxon_identifier would be an identifier that is persistent and always means the same 'circumscribed set of organisms', regardless of what taxonomic name, taxon authority or taxonomic level is applied. Isn't a taxonomic ID already used with, and generally closely tied to, a name, as opposed to a 'set of organisms'? Maybe I have a basic misunderstanding that can be corrected.

deepreef commented 6 years ago

Yes, that is my understanding conceptually. However, for practical purposes, I'm not sure how one would ever know that two circumscribed sets of organisms asserted by two different authorities (accordingTo), with the same or different names, and the same or different taxonomic levels, represent the same taxon concept (at least with enough confidence to utilize the same taxon_identifier). An example we wrestled with in the early days of discussing this: suppose you have Smith 1950 asserting a taxon concept, with various information delimiting the boundaries of that concept (e.g., characters, junior synonyms, geographic distributions, etc.). Then Jones 1980 uses the same name, same synonymy, but adds some additional characters (not mentioned by Smith), and perhaps adds a geographic range extension. Can we confidently assume that both are the same taxon concept, and therefore both can utilize or reference the same taxon_identifier? That would require expert knowledge of the group to assert, and even then, what would be required for Smith herself and Jones himself to mutually agree that they are referring to the same implied circumscribed set of organisms?

This is why I never felt there was much practical value in creating taxon_identifiers that are independent of the underlying TNU(s) that assert the taxon concept(s). It's also why TCS went with the notion of "TaxonRelationshipAssertions". That is to say, while we may be able to confidently document that Brown 2000 asserted that taxonConcept sensu Smith 1950 is congruent with taxonConcept sensu Jones 1980, we cannot "know" they actually are congruent with enough confidence that we can share the same identifiers for both concepts.

This is why I think anchoring everything to TNUs (rather than taxon_identifiers of some sort) is more practical, and instead of asserting concept congruence via shared taxon_identifiers, we assert some sort of set-theory relationship between the concepts represented by two TNUs (e.g., as congruent, or includes, or overlaps or whatever). Sure there may be some cases where we can universally accept congruence in taxon concept from separate TNUs with enough confidence that we could anchor both to the same taxon_identifier; but I wager such cases would represent the vast (VAST) minority, and in that context does it really make sense to define and maintain and utilize yet ANOTHER class of identifiers (in a domain that is already overflowing with subtly different classes of identifiers)?
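A minimal sketch of the TNU-anchored approach, assuming an in-memory model: every concept keeps its own TNU identifier, and congruence (or inclusion, overlap, etc.) is recorded as an assertion attributed to the usage that makes it. The class names, the `tnu:` identifiers and the relation vocabulary are hypothetical simplifications for illustration; this is not the actual TCS schema.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    # Simplified set-theory relationships between the concepts implied by two TNUs
    CONGRUENT = "congruent"
    INCLUDES = "includes"
    INCLUDED_IN = "included in"
    OVERLAPS = "overlaps"
    DISJOINT = "disjoint"


@dataclass(frozen=True)
class TNU:
    tnu_id: str      # hypothetical identifier for the name-usage instance
    name: str
    reference: str


@dataclass(frozen=True)
class RelationshipAssertion:
    subject: TNU
    relation: Relation
    obj: TNU
    asserted_in: TNU  # the usage (e.g., Brown 2000) that makes the assertion


smith = TNU("tnu:1", "Aus bus", "Smith 1950")
jones = TNU("tnu:2", "Aus bus", "Jones 1980")
brown = TNU("tnu:3", "Aus bus", "Brown 2000")

assertions = [RelationshipAssertion(smith, Relation.CONGRUENT, jones, asserted_in=brown)]


def assertions_about(tnu, store):
    """Return every recorded assertion in which the given TNU takes part."""
    return [a for a in store if tnu in (a.subject, a.obj)]


for a in assertions_about(smith, assertions):
    print(f"{a.subject.name} SEC {a.subject.reference} {a.relation.value} "
          f"{a.obj.name} SEC {a.obj.reference} (according to {a.asserted_in.reference})")
# Aus bus SEC Smith 1950 congruent Aus bus SEC Jones 1980 (according to Brown 2000)
```

The design point is that congruence stays a claim with a source, so a later, conflicting assertion can be added alongside it rather than forcing two concepts to share or split an identifier.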

On the other hand, if we lower the "bar" for what we accept as "congruent" concepts (e.g., sets of distinct name-bearing type specimens -- aka heterotypic/subjective synonymies), then we're in a much better place to aggregate sets of TNUs into congruent taxon concepts more objectively, in which case a dedicated class of taxon_identifier might well be useful.
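Under that lowered bar, aggregation becomes a simple grouping step. A minimal sketch, assuming each TNU has already been reduced to the set of name-bearing types it includes (its accepted name's type plus the types of all heterotypic synonyms it lists); the `tnu:` and `type:` identifiers are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: TNU -> set of name-bearing types included in its synonymy
usage_types = {
    "tnu:Smith1950": {"type:A", "type:B"},
    "tnu:Jones1980": {"type:B", "type:A"},
    "tnu:Lee1995": {"type:A"},
}


def group_by_type_set(usages):
    """Cluster TNUs whose asserted sets of name-bearing types are identical."""
    groups = defaultdict(list)
    for tnu_id, types in usages.items():
        groups[frozenset(types)].append(tnu_id)  # frozenset is hashable, order-free
    return dict(groups)


for types, tnus in group_by_type_set(usage_types).items():
    print(sorted(types), sorted(tnus))
# ['type:A', 'type:B'] ['tnu:Jones1980', 'tnu:Smith1950']
# ['type:A'] ['tnu:Lee1995']
```

Each resulting cluster is a candidate for a shared taxon_identifier, with the caveat that identical type sets only approximate identical circumscriptions.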

Sorry for the extended ramblings...

mdoering commented 6 years ago

Thanks for raising this. For a long time I have wondered whether we should distinguish between a NameUsage and a TaxonConcept.

In most cases when we talk about concepts we refer to a specific, published usage of a name - NAME sec. REFERENCE. What exactly the concept is, is not expressed at all, and it is going to be hard to find properties that describe it. Is it worthwhile to distinguish between the attempt to list defined (and unique?) concepts and the simple act of referring to a name used in some publication? If it is only about the latter, I much prefer the term NameUsage, which does not pretend to be more than just that.

Markus

jgerbracht commented 6 years ago

In the world of birds, this happens very frequently, i.e. where different authorities and even different versions within an authority have different name usages that apply to the exact same taxon concept and we can be very certain that they do in fact refer to the same concept.

It often seems that when a species is described, the concept exists (as discussed earlier) but the description of the concept does not always exist. A later authority will come along and describe the concept in more detail (maybe adding a geographic range), but I would argue that doesn't change the concept, only clarifies it.

I do recognize that we are fortunate in the bird world because concepts do not change very often and are fairly well known/agreed upon, though there have certainly been some surprises which require a new concept (even when names don't change).

I manage eBird and several online taxonomic monographs, and having a taxonomic concept identifier that is static through time (as long as it refers to the same set of organisms) is very important, as we manage 500 million observations and 4000+ species pages. Each observation or species page is keyed in the database to a taxon concept ID. And when a name changes, I can simply apply a new name to that concept as opposed to changing the impacted concepts.
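A minimal sketch of why keying records to a concept ID makes renames cheap, assuming a simplified in-memory model. The table layout, `concept:` identifiers and function names are hypothetical; eBird's actual schema is not described in this thread.

```python
# Hypothetical tables: the concept ID is the stable key, the name is an attribute
concept_names = {"concept:0001": "Aus bus"}        # concept ID -> current name
observations = [                                   # records keyed to concept ID only
    {"obs_id": 1, "concept_id": "concept:0001"},
    {"obs_id": 2, "concept_id": "concept:0001"},
]


def rename_concept(concept_id, new_name):
    """Apply a new name to an existing concept; observations are untouched."""
    if concept_id not in concept_names:
        raise KeyError(concept_id)
    concept_names[concept_id] = new_name


def display_name(observation):
    # The name is looked up at display time, never stored on the observation
    return concept_names[observation["concept_id"]]


rename_concept("concept:0001", "Aus cus")
print([display_name(o) for o in observations])  # ['Aus cus', 'Aus cus']
```

A name change touches one row; only a genuine change in circumscription would require re-keying observations.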

Jeff

-- Jeff Gerbracht Lead Application Developer Neotropical Birds, eBird, Birds of North America Cornell Lab of Ornithology 607-254-2117

From: Richard L. Pyle notifications@github.com Sent: Friday, August 31, 2018 3:52:30 AM To: tdwg/tnc Cc: Jeff A. Gerbracht; Comment Subject: Re: [tdwg/tnc] Taxon, Taxon Concept and Taxon Name Usage: definitions and relationships (#1)


deepreef commented 6 years ago

In reply to @mdoering: "Is it worthwhile to distinguish between the attempt to list defined (and unique?) concepts and the simple act of referring to a name used in some publication? If it is only about the latter, I much prefer the term NameUsage, which does not pretend to be more than just that."

Short reply: I agree with your second sentence!

Longer reply: The way I look at it, NameUsage instances come in many flavors -- ranging from a mere mention of a name within a Reference to full-blown treatments with full synonymies, robust material examined and character descriptions, phylogenetic analyses, geographic distributions, etc. The degree to which one can divine the boundaries of a taxon concept circumscription will likewise vary tremendously. There may be some value in drawing a line between NameUsage instances that include a full heterotypic synonymy, and those that do not. The former can be used to algorithmically compare NameUsage instances as to the sets of type specimens they include and determine them to be congruent, includes, included in, etc. (per our discussions in Dave Remsen's house a couple of years ago). While circumscription boundaries drawn using collective sets of type specimens (i.e., complete asserted heterotypic synonymies) are not as granular as those marked by character states and/or enumerated specimens & populations, they are FAR more practical in terms of determining (approximate) concept congruity. As such, we can do all the reasoning we need using only NameUsage instances, without the need to separately mint identifiers for Taxon Concepts as entities that exist independently of the individual usage instances.
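The algorithmic comparison described above can be sketched as plain set comparison, assuming each NameUsage has been reduced to the set of name-bearing types covered by its full heterotypic synonymy. The function name and the `type:` identifiers are hypothetical, and the result is only as granular as those type sets.

```python
def compare_type_sets(a, b):
    """Approximate the relationship between two concepts from the sets of
    name-bearing types their usages include.

    Identical type sets do not guarantee identical circumscriptions; usages may
    still differ in characters, specimens examined, or range."""
    a, b = set(a), set(b)
    if not a or not b:
        raise ValueError("each usage needs at least one name-bearing type")
    if a == b:
        return "congruent"
    if a > b:          # proper superset
        return "includes"
    if a < b:          # proper subset
        return "included in"
    if a & b:          # shared types, but each has types the other lacks
        return "overlaps"
    return "disjoint"


print(compare_type_sets({"type:A", "type:B"}, {"type:A"}))  # includes
```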

I agree with @jgerbracht that the concept exists (at least in the abstract) independently of the extent to which it is described or fleshed out within the documented Name-Usage instance; but the problem as I mentioned before is that, beyond comparing heterotypic synonymies, expert knowledge is necessary to assert the congruency (or not) in concept circumscription between any two given name-usage instances. In cases where that expert knowledge is available, I think it's better to capture something along the lines of a TaxonRelationshipAssertion (sensu TCS) to map the relationship between two Name-Usage instances, rather than mint some sort of identifier for the abstract concept itself, then link both name usages to it.

Also, the precision and granularity of what the concept boundaries are will vary, and as such the decision to regard them as the "same" concept or "different" concepts will vary too. In some cases range extensions do not represent a change in concept, but in other cases they do. Take for example that Species A SEC Ref1 is described from specimens in Hawaii. Then someone finds a population in the Marshall Islands (range extension; Species A SEC Ref2). Because it's a range extension, there is no change in concept. However, later genetic data and other evidence convince someone else that they're actually different species, so we have Species A SEC Ref3 from Hawaii, and Species B SEC Ref3 from the Marshall Islands. Now... what is the relationship between Species A SEC Ref1 and Species A SEC Ref3? If the author of Ref1 (who was unaware of the Marshalls population) was a splitter, her concept might be sensu stricto and hence the same as Ref3. Or she might have been a lumper, in which case her concept would be sensu lato and congruent with Ref2.

I think it's much better to anchor our "concepts" as 1:1 with individual name-usage instances, then add a separate layer for assertions about how those concepts relate to each other in terms of congruency/etc.

@jgerbracht: one possible solution for what you describe is to establish a system analogous to type specimens, but for Name-Usage instances that define taxon concepts. Instead of minting a new taxon_identifier to represent the concept (independent of the individual name usages that collectively define it) and linking all relevant TNUs to that separate identifier, you (or eBird) could have a system that picks one TNU among several that relate to the same Concept, brands it the "type TNU" for the concept, and links the other TNUs to it. This "Type TNU" effectively serves the same role as a taxon_identifier would, but without needing to deal with a new class of identifiers.
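A minimal sketch of the "type TNU" idea, assuming expert judgements about shared concepts are stored as simple links from each TNU to the TNU chosen as its type. The link table, `tnu:` identifiers and the function name are hypothetical. Resolving any usage to its type TNU gives the stable anchor that a separate taxon_identifier would otherwise provide.

```python
# Hypothetical links: each usage judged to share a concept points to the
# usage chosen as that concept's "type TNU".
type_tnu_links = {
    "tnu:Ref2": "tnu:Ref1",
    "tnu:Ref4": "tnu:Ref1",
}


def concept_anchor(tnu_id, links):
    """Follow links until reaching a TNU that is not linked onward.

    A TNU with no outgoing link is its own anchor; a cycle means the links
    are inconsistent and is reported as an error."""
    seen = set()
    while tnu_id in links:
        if tnu_id in seen:
            raise ValueError(f"cycle in type-TNU links at {tnu_id}")
        seen.add(tnu_id)
        tnu_id = links[tnu_id]
    return tnu_id


print(concept_anchor("tnu:Ref4", type_tnu_links))  # tnu:Ref1
```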

Think about it this way: even if we do mint taxon_identifiers to represent abstract concepts independent of the name usages, then you still need some reference point for that concept instance. Suppose there are four TNUs linked to the same concept instance, but then later someone realizes that a mistake was made, and that two of the TNUs refer to a slightly different concept than the other two. What happens to the concept instance? Does it disappear and two new ones are minted? Or does the original concept instance stay with two of the TNUs, and another concept instance is minted to represent the other two? What if three go with one concept, and one with the other? What if 49 go with one concept and 1 goes with the other? If we mint two new ones and "retire" the original concept, what happens to all the external data linked to that "retired" concept? If we maintain the original concept with one subset of TNUs and mint only one new one for the other, then there will need to be some mechanism for deciding which set of TNUs the original concept remains with (e.g., a "type usage" instance, analogous to a type specimen).
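One way to resolve the split problem described above is to let the original concept identifier stay with whichever group of TNUs contains the designated type usage, and mint new identifiers only for the other groups. A minimal sketch under that assumption; the function, its parameters and the `concept:`/`tnu:` identifiers are hypothetical, and `mint_id` stands in for whatever identifier service would actually be used.

```python
def split_concept(concept_id, type_tnu, groups, mint_id):
    """Reassign TNUs after a concept turns out to mix two or more concepts.

    The original concept_id stays with the group containing the type TNU, so
    external data already linked to concept_id keeps pointing at that group;
    every other group gets a freshly minted identifier from mint_id()."""
    all_tnus = [t for g in groups for t in g]
    if len(all_tnus) != len(set(all_tnus)):
        raise ValueError("a TNU cannot belong to more than one group")
    if sum(type_tnu in g for g in groups) != 1:
        raise ValueError("the type TNU must appear in exactly one group")
    assignment = {}
    for group in groups:
        target = concept_id if type_tnu in group else mint_id()
        for tnu in group:
            assignment[tnu] = target
    return assignment


new_ids = iter(["concept:0002"])
result = split_concept(
    "concept:0001", "tnu:1",
    [{"tnu:1", "tnu:2", "tnu:3"}, {"tnu:4"}],
    mint_id=lambda: next(new_ids),
)
print(sorted(result.items()))
# [('tnu:1', 'concept:0001'), ('tnu:2', 'concept:0001'), ('tnu:3', 'concept:0001'), ('tnu:4', 'concept:0002')]
```

The rule works the same whether the split is 3/1 or 49/1: group size plays no role, only the position of the type usage.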

Again, I apologize for the long post here; but there's a reason we've never quite sorted all this stuff out before. The good news is that this conversation seems genuinely fresh to me, and I honestly think we're making good progress!

baskaufs commented 6 years ago

I'm a bit behind on this thread due to traveling at the end of the TDWG meeting. But I had several items that I wanted to add for the record.

Several years ago, there was a complaint that extensive, substantive conversations happen on email lists and that what comes out of those conversations does not get captured - causing the conversations to happen over and over. So I actually took the time to record a summary of the exhaustive TCS-related thread that started on 2012-11-01. Since we seem to be starting in on this subject all over again, with some of the same participants, perhaps we could start by reviewing the previous conversation and refer to the URLs of relevant posts there rather than writing them all over again. The page I've linked also refers to an earlier thread in 2009 that also repeats some of the same conversation about taxon concepts.
Niels posted info from an email I sent as Issue #3, so I won't repeat that here. However, I'd like to include it in this conversation by reference. What I wanted to note was that the graph diagram it includes came about during the creation of the Darwin Core RDF Guide. In writing the guide, we considered it out of scope to thrash out the issue of "taxonomic entities", assuming that such thrashing would be handled by a future TCS 2.0 task group (which I guess is pretty much this group). Nevertheless, Section 2.7.4 of the guide was written with the recognition that the dwc:Taxon class "convenience terms" effectively describe some kind of entity (an instance of the dwc:Taxon class that might be a taxon, taxon concept, or TNU). The RDF guide mints the object property dwciri:toTaxon to enable linking from a determination (dwc:Identification instance) to that entity at such future time when the nature of that "taxonomic entity" got fleshed out. I recommend reading section 2.7.4 if you want to understand how the RDF Guide sees the relationship between the DwC Taxon class terms and the dwc:Identification and dwc:Taxon classes themselves.
I understand the desire to clearly define what a taxon/taxon concept/TNU is. However, this discussion is reminding me of the very long discussion that took place when we tried to come up with a definition for the dwc:Organism class. Although it seems like it should be easy to define an organism, we ended up with a definition that may seem strange at first, since it included not only individual biological organisms, but also things like clones, colonies, and packs of animals. The reason that we ended up with such an odd definition is that we ended up defining the class in a way so that it "did" what we wanted it to do, rather than defining it to "be" what we thought it should be. Let me explain what I mean by that. In that long and painful discussion, the need for even having an organism class was questioned because there were very few properties that we actually wanted to assign to instances of the class. The epiphany came when it was suggested that the real purpose of the organism class was not to be a thing onto which we attached properties, but rather to be a thing to connect one-to-many determinations to one-to-many occurrences. In database terms, it was like a join table. In graph language, it served as a node to link multiple other nodes. Once it was clear that this was the function of the class, then defining dwc:Organism was easier: it was defined to include all things that can have one-to-many occurrences and to which we would like to assign one-to-many determinations. That's how weird stuff like wolf packs got included in the definition. I think the situation of taxon/taxon concepts/TNUs is similar. What we need is a "thing" that connects identification instances to names and references. In graph terms, this thing is a node that connects a determination to zero-to-one names and zero-to-one references. Anything that we can imagine to fulfill that role (taxon concepts, TNUs or whatever) can be included in the definition of that thing.
Once we have established the "thing", we can assert additional properties to flesh out the meaning of the thing - taxon concepts might have properties that TNUs don't and vice versa, just as a wolf pack might have different properties than an aspen clone or an individual elephant. But the basic linking function will be there regardless. Given our previous experience, I highly recommend starting with a functional definition (we want this "thing" to connect references to names), rather than starting off by getting hung up on a conceptual definition.
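A minimal sketch of that functional definition, assuming a simple in-memory model. The class names `TaxonomicEntity` and `Identification` and the `to_taxon` field are hypothetical; the field only loosely mirrors the linking role the RDF Guide gives `dwciri:toTaxon`. The node carries zero-or-one name and zero-or-one reference and otherwise just links determinations, so an unpublished taxon can take part with no name at all.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaxonomicEntity:
    """Hypothetical linking node: exists to connect determinations to
    zero-or-one name and zero-or-one reference; other properties are optional."""
    entity_id: str
    name: Optional[str] = None        # may be absent, e.g. an undescribed taxon
    reference: Optional[str] = None   # the usage / "sec." source, if any
    extra: dict = field(default_factory=dict)


@dataclass
class Identification:
    identification_id: str
    to_taxon: TaxonomicEntity


undescribed = TaxonomicEntity("entity:42", extra={"note": "unpublished species"})
named = TaxonomicEntity("entity:7", name="Aus bus", reference="Jones 1980")

dets = [Identification("det:1", named), Identification("det:2", undescribed),
        Identification("det:3", named)]

# Many determinations can point at one node, whatever kind of entity it is
by_entity = {}
for d in dets:
    by_entity.setdefault(d.to_taxon.entity_id, []).append(d.identification_id)
print(by_entity)  # {'entity:7': ['det:1', 'det:3'], 'entity:42': ['det:2']}
```

Properties specific to concepts or to TNUs could later be added to `extra` (or to subclasses) without changing the linking role.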

It's possible that this node could also connect names to things like sets of specimens or organism occurrences rather than to a reference if that is an acceptable alternative way to define the taxon.

deepreef commented 6 years ago

Many thanks, @baskaufs ! Your post reminded me of our very animated discussions of "dwc:Organism", which in the end was, in my opinion, an extremely useful exercise. Evidently it was also successful, in that unlike this never-ending discussion about taxon (which parallels the never-ending debates about "What is a species?"), the "Organism" discussion seemed to come to a stable close (or perhaps no one cares enough about it to debate it anymore?)

In any case, I really like (and agree with) your point that "we ended up defining the class in a way so that it "did" what we wanted it to do, rather than defining it to "be" what we thought it should be." To be honest, I think that applies to the definitions of all of our terms (not just dwc:Organism). We like to think we're modelling nature as it is; but that's not what we're doing. We're modelling how to track information about nature in a way that makes it easier for us to answer the diverse set of questions we want to ask about it.

In that context, and having participated in the "taxon definition" discussions since the 1990s (the discussions began earlier than that), I actually feel that this discussion is making some novel progress, which I think is a good sign that we may be able to achieve some consensus in moving forward. Your post above made me realize why I think we're getting somewhere: in the past, the debate always got bogged down in "what IS a taxon?" (~= "What IS a species?") However, I think you captured a key point that I hadn't been able to put my finger on before, which is that we shouldn't spin our wheels endlessly trying to define what IS a taxon, and instead focus on how we want to define a taxon entity such that it fulfills our desires to answer the diverse set of questions we want to ask about nature.

We seem to have mostly stabilized on what a TNU is (and how it's used). The outstanding question is whether there ought to be a separate entity (with a separate pool of identifiers) to represent a "Taxon Concept". The role such an entity/identifier would play is as an aggregator of TNUs that all represent the same circumscribed set of organisms. Similarly to "Organism", the "Concept" entity would not have many (any?) properties of its own, but rather would serve the function of linking clusters of TNUs together for the convenience of using one identifier to represent a collection of many TNUs.

In principle, I understand the value & simplicity of having such a defined entity (and corresponding identifier). In practice, though, I fear that it will end up as a hodgepodge of fuzzily-defined (to varying degrees) instances whereby different people will aggregate different sets of TNUs differently into concepts. The only way I can see it working effectively is via an additional "join" entity similar in many ways to Identifications for assertions about which TNUs map to which concepts (and that will start to get messy). The problem is that I'm not sure how effective that will be in helping us to answer the diverse set of questions we want to ask about nature.
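The worry about different people aggregating TNUs differently can be illustrated with a hypothetical "join" record, analogous to an Identification, that states who asserted that a TNU belongs to a concept. This is a sketch under that assumption only; the record layout and identifiers are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class ConceptAssertion(NamedTuple):
    """One join record: who asserted that a given TNU belongs to a given concept."""
    asserted_by: str
    tnu_id: str
    concept_id: str


def concepts_according_to(assertions: List[ConceptAssertion], who: str) -> Dict[str, Set[str]]:
    """Group TNUs into concepts using only the assertions made by one party."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for a in assertions:
        if a.asserted_by == who:
            grouped[a.concept_id].add(a.tnu_id)
    return dict(grouped)
```

With the same two TNUs, one party may lump them into one concept while another splits them into two, so the "concept" you get depends on whose assertions you follow.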

Instead, I'd like to see us pin down the definition of TNU (and its various flavors, including Protonyms, Treatments, etc.), then flesh out a few million instances of them with their core properties (especially heterotypic synonym mapping), then allow the need for a "Concept" entity to emerge (or not) from that.

Again, sorry for the long diatribe...

baskaufs commented 6 years ago

Cool! After spending time looking at how other standards organizations work, I'm increasingly convinced that the effective way to work is to define the use cases first, then develop the standards while testing the proposed features of the standard against those use cases. That's basically what you've proposed - define what we would like for TNUs and taxon concepts to do, then try to build the system to make them work. Keep the features that work, discard the ones that don't. THEN write the standard describing how the features were successfully implemented.

deepreef commented 6 years ago

OK, then maybe one way to establish use cases is to enumerate some questions we would like to ask about organisms in nature, specifically related to taxa and their names (starting with the pedantic ones and moving on to more general ones):

Nomenclature
In what publication was a scientific name first established?
Is a scientific name available/validly published in the sense of the Code?
Is a scientific name a homonym (either within a Code or across Codes)?
What spelling variants have been used for a scientific name?
What objective (Code-governed) synonyms exist for a scientific name?
Where is the type specimen for a scientific name?

Taxonomy
What other names has a scientific name been regarded as a subjective synonym of?
What other names have been regarded as a subjective synonym of a given scientific name?
Is a scientific name considered valid according to a specified Meta-Authority?
What other names are considered as subjective synonyms of a scientific name/what other name is a scientific name considered a subjective synonym of, according to a specified Meta-Authority?
How stable has the subjective synonymy or validity of a scientific name been over time?
How do the circumscriptions of the same scientific name by two different authorities compare to each other?
How many type specimens (and of what names) are included within a particular circumscription?
What other circumscriptions are congruent with/include/are included in/overlap with a particular circumscription?

Classification
What different genera has a species epithet been combined with?
What parent taxon has a child taxon been included within?
What child taxa have ever been included within a parent taxon?
What child taxa are included within a parent taxon according to a specified Meta-Authority?
How stable has the classification for a given taxon been over time?

Biodiversity
What taxon name is an Organism (specimen/occurrence) currently identified as?
What taxon names has an Organism ever been identified as?
What is the currently accepted scientific name of a particular Organism, according to a specified Meta-Authority?
What Organisms (with their respective occurrence metadata, such as locality, etc.) are currently identified to a scientific name or regarded as falling within a taxon circumscription, according to a specified Meta-Authority?
[....]

OK, I got tired of writing these questions, but there are obviously many of these kinds of questions we would like to be able to answer.

In my mind, use cases involve sets of these questions to allow us to traverse from a given set of inputs to a given set of outputs.

For example, a use case might be: "Give me a list of all species and associated occurrences recorded for a given geographic region, including both the accepted name according to the most recent Catalog of Life, as well as the names that the occurrences are currently identified as."
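As a rough sketch of how that first use case combines two of the questions above ("what is an Organism currently identified as?" and "what is the accepted name according to a Meta-Authority?"), assume occurrences are simple records and that the Meta-Authority's opinion is available as a plain name-to-accepted-name mapping. Both the record layout and the names ("Aus bus", "Aus cus") are hypothetical; this is not the Catalogue of Life API.

```python
from typing import Dict, List, Tuple


def species_for_region(occurrences: List[dict],
                       accepted_by: Dict[str, str],
                       region: str) -> List[Tuple[str, str, str]]:
    """Return (occurrence_id, identified_as, accepted_name) for one region.

    accepted_by maps a name string to the accepted name according to a single
    Meta-Authority; names it does not know are returned unchanged.
    """
    rows = []
    for occ in occurrences:
        if occ["region"] != region:
            continue
        identified = occ["identified_as"]
        rows.append((occ["id"], identified, accepted_by.get(identified, identified)))
    return rows
```

Note that the sketch quietly depends on a name-string mapping, which is exactly the part this thread is debating how to model.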

Another might be: "For a given scientific name, let me know what homonyms exist, and for each homonym give me the current status and classification of the name according to different Meta-Authorities, a complete list of all names that have ever been regarded as a synonym (either junior or senior) as well as all known spelling variations and combinations."

To fulfill these use cases, we'd need to be able to answer several of the questions above.

I don't know if this is the right strategy to identify how best to proceed on this discussion and its desired outcome, but it seems to me that enumerating questions of this sort both builds the foundations for addressing Use Cases (or, perhaps, enumerating the Use Cases allow us to figure out what questions we need to answer to fulfill them), and allows us to be more specific about what entities we need to define, and what properties for each entity we need to capture.

Hoping that was at least somewhat helpful....

nfranz commented 6 years ago

Hi all. I'd like to be part of this, at some level. I'd also like to suggest that doing taxonomic concepts well is in an important sense a shift in value system, or value assignment. Technical definitions may be somewhat secondary, and agreeing on them is not necessarily critical to my mind. The value shift is this though: a commitment to taxonomic concepts is a commitment to support the process of systematic research/products, with particular emphasis on making the provisional, evolving, and frequently locally and temporally conflicting aspects of systematic inference and product use explicit, and indeed prioritizing software design and functions to showcase the provisional, evolving, and conflicting aspects of systematic inference making and usage. To the extent that this group can make such a commitment, I'd be excited to contribute.

ianengelbrecht commented 6 years ago

@baskaufs, thank you for your summary of the TCS discussion thread - really fantastic. Very helpful to be able to see that history. I strongly agree that much valuable insight is often lost in the transience of internet forum and email discussions.

nfranz commented 6 years ago

I'd like to again point to this publication https://doi.org/10.1186/s13326-017-0174-5 which is at the top of the thread. Please consider reading it in full. This is an ontology (proposal, if you will) that is also pilot-implemented here: http://openbiodiv.net/. It was part of a Ph.D. thesis, sponsored also by a biodiversity data publishing house, whose aims are well aligned with those of the TNC. It has a lengthy section "Domain Description" in which the issue of representing taxonomic concepts is tackled. I am not saying that there are no other important efforts, but if I had to point to the single most direct descendant of the 2005 TCS, this is it. I believe that if we take this paper and approach as a pragmatic foundation and begin to understand what services it can provide and which it cannot, we have a strategy to advance effectively.

deepreef commented 6 years ago

Many thanks for re-linking this publication, @nfranz! I thought I had clicked on your original link, but evidently not as this is the first I'm seeing the full publication. Although I do have some minor philosophical quibbles (e.g., I still fail to understand how a taxon concept can justifiably be called a "hypothesis", rather than an asserted opinion -- I don't agree with the arguments put forth about falsifiability), once I got past those I found the article to be very useful in framing the problem we're up against with this discussion. It's definitely worth carefully reading by anyone interested in this sort of stuff.

I do have a couple of technical questions that are most likely due to my ignorance of OpenData (SPAR Ontologies, etc.), but I'm going to take a risk and ask them anyway. Perhaps you can help clarify these.

The article states that "Taxonomic Article is a subclass of FaBiO’s Journal Article". However, several other subclasses of FaBiO's Expression class (e.g., books, chapters, etc.) also contain taxonomic treatments. Is this a problem for implementation, or are we only interested in treatments that appear in articles, or...?

The article states "In OpenBiodiv-O, a taxonomic name usage is the mentioning of a taxonomic name in the text, optionally followed by a taxonomic status." If a name is mentioned several times within a single treatment, does that represent more than one TNU sensu OpenBiodiv-O? Or are they collectively contained within a single TNU (e.g., represented by the NomenclatureHeading)? The reason I ask is that there is a subtle but important distinction between a TNU (which encompasses the entire treatment in cases where the TNU is the NomenclatureHeading) and what James Ytow referred to as "Appearances" (individual mentions of name-strings, often with abbreviated genus), which may appear many times within the context of a single TNU. I ask because, in the paragraph that follows ('For example, “Heser stoevi Deltschev 2016, sp. n.” is a taxonomic name usage.'), it seems that the TNU is the raw text string, not the Treatment as a whole, in which case the definition of TNU as asserted in the context of OpenBiodiv-O is a significant departure from how it has been defined elsewhere.

An important aspect of TNUs is that there is generally a 1:1 correspondence between a Treatment and the TNU representing the NomenclatureHeading for the Treatment. However, as implied by Figure 1 of the article, a treatment often contains other TNUs (e.g. within the NomenclatureCitationList). Thus, while every Treatment has exactly one corresponding TNU, not all TNUs are treatments.
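The cardinality described above (each Treatment has exactly one heading TNU, but a Treatment also contains other TNUs, for example in its NomenclatureCitationList) can be sketched as a small data structure. The class names and identifiers are hypothetical and are not taken from OpenBiodiv-O; the heading string reuses the article's example.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class TNU:
    tnu_id: str
    name_string: str


@dataclass
class Treatment:
    heading: TNU  # exactly one TNU stands for the treatment as a whole
    citation_list: List[TNU] = field(default_factory=list)  # other TNUs cited within it

    def all_tnus(self) -> List[TNU]:
        return [self.heading, *self.citation_list]


def treatment_tnu_ids(treatments: List[Treatment]) -> Set[str]:
    """IDs of the TNUs that represent whole treatments (the headings only)."""
    return {t.heading.tnu_id for t in treatments}
```

Every Treatment contributes one ID to treatment_tnu_ids, while all_tnus can return more, which is the "not all TNUs are treatments" point.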

I very much like the way that "TaxonomicConceptLabel" (TCL) is defined. However, I'm not entirely sure I understand the need for establishing OperationalTaxonomicUnit as a superclass of TaxonomicConcept. In my mind, Taxonomic Concepts represent a circumscription of organisms, regardless of whether that circumscription happens to include a specimen (or more than one specimen, when heterotypic synonymy is involved) designated as a name-bearing type for a Linnean-style taxonomic name (i.e., regardless of whether the concept has a formal scientific name to label it with). Can you provide examples of instances of OperationalTaxonomicUnit that would not be regarded as instances of TaxonomicConcept? I.e., what other subclasses of OperationalTaxonomicUnit are there, and what function do they serve?

Regarding the two patterns, replacement name and related name, is the former a subset of the latter? Or are these mutually exclusive? It seems that replacement name implies congruence of concept/circumscription, whereas related name could apply to all five RCC-5 relations (or only the other four, excluding congruence), or...?
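For readers less familiar with RCC-5, here is a minimal sketch of the five relations, under the simplifying assumption that each circumscription can be represented extensionally as a finite, non-empty set of member IDs (for example, specimen IDs). Real concepts are rarely fully enumerable, so treat this as an illustration of the relations, not a method for computing them; the short symbols are illustrative labels.

```python
from enum import Enum
from typing import AbstractSet


class RCC5(Enum):
    CONGRUENT = "=="
    INCLUDES = ">"
    INCLUDED_IN = "<"
    OVERLAPS = "><"
    DISJOINT = "!"


def rcc5(a: AbstractSet[str], b: AbstractSet[str]) -> RCC5:
    """Relate two circumscriptions, each modelled as a non-empty set of member IDs."""
    if not a or not b:
        raise ValueError("circumscriptions must be non-empty")
    if a == b:
        return RCC5.CONGRUENT
    if a > b:  # proper superset
        return RCC5.INCLUDES
    if a < b:  # proper subset
        return RCC5.INCLUDED_IN
    if a & b:  # share some members, but neither contains the other
        return RCC5.OVERLAPS
    return RCC5.DISJOINT
```

Under this framing, a replacement name would correspond to CONGRUENT only, while a related name could in principle land on any of the five values.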

Sorry for the long post -- just trying to make sure I understand the contents of and assertions in the paper correctly.

rdmpage commented 6 years ago

I may live to regret this, but can I suggest another way of tackling this topic? I'm going to try and be disciplined and avoid a WTF rant, and instead sketch out a way I think we can create something simple, and which might lead to some tools that people might find useful. I'm a fan of keeping things simple, reusing things, and trying to take into account what is going on elsewhere. For example, the http://schema.org vocabulary is gaining momentum, and covers a lot of things we care about (publications, people, places, etc.). I make extensive use of it in my latest toy https://ozymandias-demo.herokuapp.com.

Interestingly, there is a community project to extend http://schema.org to include more life-science specific entities BioSchemas (a number of people on this list will be aware of this already). So it seems to me there's a case to be made for avoiding domain-specific vocabularies as much as possible, and trying to make our stuff as interoperable with the wider world as we can.

Taxa

I regard taxa as nodes in a tree. What a taxon "is" is defined by its place in that tree (although identifiers don't change if the composition changes, that way lies madness). A taxon in NCBI is ultimately all the organisms that yielded the sequences in the subtree rooted at that node. A taxon in GBIF is ultimately all the occurrences in the subtree rooted at that node.
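The "taxon = everything in the subtree" view can be sketched in a few lines. The adjacency-list layout, node names and occurrence IDs below are hypothetical; swap occurrences for sequences and the same traversal describes the NCBI case.

```python
from typing import Dict, List, Set


def subtree_occurrences(children: Dict[str, List[str]],
                        occurrences_at: Dict[str, Set[str]],
                        node: str) -> Set[str]:
    """All occurrence IDs attached to `node` or to any descendant of `node`."""
    found: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        found |= occurrences_at.get(current, set())
        stack.extend(children.get(current, []))
    return found
```

Note that the node identifier is just a key here; whether it should stay fixed when the set it returns changes is a separate decision.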

There's a proposal by @frmichel for taxa in BioSchemas (https://github.com/BioSchemas/specifications/tree/master/Taxon). This seems pretty straightforward and uses terms that will be familiar. If we use this for taxa (i.e., nodes in a classification) then we have a simple vocabulary that anyone can use, from people working in genomics with the NCBI taxonomy, to people building little taxon-specific web sites and who want to increase their visibility to Google by including structured markup (the primary driver behind schema.org).

Lots of people care about taxa, let's give them a simple way to talk about them.
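A minimal sketch of the kind of structured markup meant here, written as JSON-LD generated from Python. The property names (taxonRank, parentTaxon) follow the schema.org Taxon type as I understand it; they are assumptions to be checked against the current BioSchemas Taxon profile, and the "NCBI:9606" identifier style simply mirrors the shorthand used in this thread.

```python
import json
from typing import Optional


def taxon_jsonld(name: str, rank: str, identifier: str,
                 parent_name: Optional[str] = None) -> dict:
    """Build schema.org-style Taxon markup for one node in a classification."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Taxon",
        "name": name,
        "taxonRank": rank,
        "identifier": identifier,
    }
    if parent_name is not None:
        # Link to the parent node by name only, to keep the sketch minimal.
        doc["parentTaxon"] = {"@type": "Taxon", "name": parent_name}
    return doc


print(json.dumps(taxon_jsonld("Homo sapiens", "species", "NCBI:9606", "Homo"), indent=2))
```

The printed JSON is what a taxon-specific web page would embed in a `<script type="application/ld+json">` element.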

Names, usages, etc.

It seems to me that the core idea here is the pair ('a name string', 'a bibliographic locator'). The bibliographic locator can be at the level of a "work" (e.g., an article or book), in which case an identifier like a DOI is the obvious candidate. If we want metadata, schema.org has terms to cover pretty much any aspect of an article or other publication.

If we want more granularity, then the W3C Web Annotation Data Model covers pretty much everything, see https://www.w3.org/TR/2017/REC-annotation-model-20170223/#selectors. So we can refer to whole work, individual pages, XPath fragments in an XML document from, say, Pensoft, regions on a scanned page, etc. A further advantage of this is that tools such as hypothes.is use these selectors to locate annotations, and many academic publishers are adopting hypothes.is as their annotation tool.

So, nomenclators are essentially lists of annotations (think of IPNI where each record is basically a name and a page location). Treating "usages" as annotations makes it easy to integrate projects such as BHL: index all the pages for names, record their locations as annotations, and flag those annotations that have some special significance (e.g., the first publication of a name). Imagine developing a tool that overlays BHL (or any literature database) and says "here are the names on this page, and by the way this is where this species name was published".
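A sketch of one name usage written as a W3C Web Annotation: the body carries the name string and the target points at a page, narrowed by a TextQuoteSelector. The annotation ID, page URL and the wording of the "first publication" tag are hypothetical; only the general shape (Annotation, TextualBody, the identifying/tagging motivations, TextQuoteSelector) comes from the Web Annotation Data Model.

```python
import json


ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def name_usage_annotation(annotation_id: str, name: str, page_url: str,
                          exact_text: str, first_publication: bool = False) -> dict:
    """A name usage expressed as a W3C Web Annotation on a page."""
    bodies = [{"type": "TextualBody", "value": name, "purpose": "identifying"}]
    if first_publication:
        # Flag special significance with a plain tag; the tag text is our own choice.
        bodies.append({"type": "TextualBody",
                       "value": "first publication of name",
                       "purpose": "tagging"})
    return {
        "@context": ANNO_CONTEXT,
        "id": annotation_id,
        "type": "Annotation",
        "motivation": "identifying",
        "body": bodies,
        "target": {
            "source": page_url,
            "selector": {"type": "TextQuoteSelector", "exact": exact_text},
        },
    }
```

A nomenclator record would then be one such annotation, and an automatic BHL indexer and a human curator could produce the same shape.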

Some people care about names, many more people care about searching for information anchored to a name, use one to drive the other, and use a model that can handle both automatic text indexing as well as manual annotation. Name usages are basically annotations. The LSIDs in databases such as IPNI, Index Fungorum, ZooBank, and ION are identifiers for annotations (not "names" as such). It seems to me that name usages in the National Species Lists (NSL) are essentially annotations (with rather a lot of administrative cruft attached).

Taxonomic concepts

This seems to be the third-rail of this discussion. I'd argue that few people care about this topic, despite the acres of space devoted to it. The reason for that is that most people use whatever taxonomic classification is available to navigate the data they care about (e.g., the NCBI taxonomy if you work with sequences), and a taxonomic classification is essentially also a taxonomic concept (arguably they are the only concepts that are actually defined in any operational way). So, as a user, most people don't care. The proof of this is that science gets done without taxonomic concepts (we can argue about whether that's a good thing or not).

The one version of taxonomic concept that seems tractable is the "accordingTo" idea, in other words if I'm writing a paper I can say "when I use this name I mean this". This could be something as simple as saying "subgenus Stegomyia NCBI:53541" for NCBI's view of mosquito taxonomy. If I want to refer to a different concept of what Stegomyia is (and this is a very touchy subject in mosquito taxonomy) I could cite another work, in other words (Stegomyia, DOI:xxxxx). So, a taxonomic concept is a set of one or more (name, bibliographic locator) pairs. Hence, we just need a way to represent a set (or ordered list if we think of it as a list of synonyms), and schema.org has ways to represent those.
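The "set of (name, bibliographic locator) pairs" idea maps directly onto an immutable set, which gives equality comparison for free. In this sketch the locators are the ones mentioned above ("DOI:xxxxx" is the thread's own placeholder); the helper name is hypothetical.

```python
from typing import FrozenSet, Tuple

NameRef = Tuple[str, str]  # (name string, bibliographic locator)


def concept(*pairs: NameRef) -> FrozenSet[NameRef]:
    """A taxonomic concept as an unordered set of (name, locator) pairs."""
    return frozenset(pairs)


# NCBI's concept of Stegomyia, in its simplest form: the node itself.
ncbi_stegomyia = concept(("Stegomyia", "NCBI:53541"))
# A different concept of the same name, anchored to another work.
other_stegomyia = concept(("Stegomyia", "DOI:xxxxx"))
```

If synonym order matters, a tuple instead of a frozenset would give the "ordered list" variant.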

So, in its simplest form, the NCBI taxonomic concept of Stegomyia is (Stegomyia, NCBI:53541) (i.e., itself). I think this is the model also used by the Australian Faunal Directory, where the authority for each taxon in the AFD classification is, of course, the AFD. We could expand the concept by listing all the synonyms, to make it more useful. If I understand the NSL model correctly, they link each node in their classification to a (name, reference) pair that corresponds to the concept in the tree.

People who care about taxonomic concepts (e.g., doing taxonomy, building classifications and trying to make sense of the literature) can describe these concepts as sets of (name,reference) pairs, which seems to me to be pretty much what taxonomists actually do.

Summary

I don't claim much originality here, and may well have completely misunderstood the discussion. But it seems to me there's a chance to adopt a simple, workable approach that builds on existing projects that have traction (e.g., schema.org, the W3C annotation model, bioschemas?) and hence get to the point where we, you know, build stuff that people want and need.

deepreef commented 6 years ago

Thanks, Rod – this is very good stuff. I’m on a ship with extremely limited internet access, so a more detailed reply will need to come later (if at all – lots of stuff keeping me busy when I get home).

Very briefly:

Taxa as nodes on a tree: I think this is fine, and is one of a number of ways the word “taxa”/“taxon” has been used, and it’s certainly a “thing” many people care about. I have no problem fixing the word “taxa”/“taxon” to nodes on a tree, rather than something else. But I’m not sure that works for how this word is/has been used in the sense of Darwin Core.
Yes, usages are pairings of a name and a reference. Identifying the reference with DOIs is great, as long as someone assigns them to all the historical references that do not already have DOIs and to new publications that don’t already have them. But the “reference” part of the pairing has always been easy. The “name” part is the hard part. The simple approach is just to use the literal string of characters to represent the “name” part. That’s the approach that most people have taken for most of the history of trying to track this stuff. That’s the approach that created the current mess. To quote you, “that way lies madness”. So the hard part is capturing a name “entity” (or, as I have always called it, a “name object”). Also, usages don’t really map well to individual pages. They map to Treatments, which typically span several pages. But that’s not really the problem. That said, I think you captured it perfectly: “Some people care about names, many more people care about searching for information anchored to a name, use one to drive the other, and use a model that can handle both automatic text indexing as well as manual annotation.” We just need to figure out what we mean by “name”.
I think “usages as annotations” is a legitimate way to frame it (ultimately everything can be thought of as an annotation, depending on what you’re most interested in). ZooBank identifiers are explicitly NOT identifiers for names – they are identifiers for nomenclatural acts (which are a subset of usages). I can’t speak for IPNI, IF, ION, etc., but I think life would be a lot simpler if we did NOT treat these as identifiers for “names”. And in that context, treating them as identifiers for annotations makes sense.
If I understand you correctly on the Taxon Concepts stuff, then we are in complete agreement. And once you adopt the position that the most practical way to handle concepts is the “accordingTo” approach, you (should) realize that a taxon concept is best represented by a usage instance (or set of usage instances).

OK, it turns out we’re the last boat to launch this morning, so I had some extra time to write the above. Therefore, this is the more detailed reply.

Aloha,

Rich

jgerbracht commented 6 years ago

Conceptually, I agree with most of what Richard and Rod describe: taxa are nodes on a tree. But what happens when the tree branches are completely rearranged and/or there are multiple trees made up of the same branches in different arrangements (as is currently the case with birds)? These are the scenarios that I think the Taxonomic Concept or Name accordingTo really helps to organize accurately, especially for any data aggregator, be it GBIF, EOL, Wikipedia or a researcher bringing together data on the same Taxonomic Concept from different domains.

A clarification on "What a taxon "is" is defined by its place in that tree (although identifiers don't change if the composition changes, that way lies madness)." A taxon, or Taxonomic Concept, isn't defined so much by its place on the tree as by what branches and leaves are under that node. And I want to make sure I understand the "although identifiers don't change if the composition changes, that way lies madness" statement. If you are saying that a node has ID 123 and if branches under that node get added or removed, the node ID should still remain as ID 123, then I would agree: Madness!!

The reason I think an ID is needed to identify each Taxonomic Concept as opposed to a Name accordingTo, is that with the ID, users of these data don't need to go through the mapping exercise of their Name with Names from other providers. All instances of Name accordingTo would have the same Taxonomic Concept ID, so that the ID can be used to aggregate data. If there is one thing I've learned, the harder it is to aggregate data, the less likely it is to be aggregated by the users. I'm really thinking of this from the end user perspective, if we don't make that part simple, it won't be used.
Jeff
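A sketch of the aggregation benefit described above: if every provider record already carries a shared concept identifier, combining data is a simple group-by, with no name-to-name mapping step. The provider names come from the comment above; the record layout, names and the "tc:1" identifier are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, str]  # (provider, name as used by that provider, concept ID)


def aggregate_by_concept(records: Iterable[Record]) -> Dict[str, List[Tuple[str, str]]]:
    """Group provider records by shared concept ID; no name matching is needed."""
    out: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for provider, name, concept_id in records:
        out[concept_id].append((provider, name))
    return dict(out)
```

The cost moves upstream: someone still has to assign the concept IDs correctly, which is the "who aggregates TNUs into concepts" question raised earlier in the thread.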

rdmpage commented 6 years ago

@deepreef I guess I was thinking of names as name strings to keep things simple, not sure what advantage treating them as objects gives us (or put another way, can we not use attributes of and/or relationships between annotation objects to express anything more complex than a name as a string of characters?)

@jgerbracht “If you are saying that a node has ID 123 and if branches under that node get added or removed, the node ID should still remain as ID 123, than I would agree, Madness!!”. That is exactly what I’m saying. NCBI node ids stay the same even if their contents change. If it were otherwise much of the bioinformatics community would collapse in a heap of unlinked content. It seems to me that changing identifiers in this context is equivalent to changing the URL for a web page every time the content changes. You could argue that it is not the same web page, but that’s going to be small comfort to those of us hit with 404 errors when we try and visit the URL we’ve bookmarked. Regardless of whether Homo sapiens is just modern humans, or includes Neanderthals as a subspecies, its NCBI taxon id is 9606.

Anyway, the issue of whether node identifiers change if their contents change is independent of the idea of treating taxa as simply nodes in a tree.

To clarify the “nameAccordingTo”, what I had in mind was that this property would be the identifier for a (name, bibliographic locator) pair that could be used to state what the taxonomic concept was (in other words, “node x in my classification treats Stegomyia as a subgenus of Aedes, following this paper”). If you wanted to compare two classifications, you could compare identifiers for the (name, reference) pair: if they match, we’re talking about the same thing; if not, then we may or may not be.
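The comparison rule above is deliberately asymmetric: matching identifiers prove the same concept, but differing identifiers prove nothing. A sketch, assuming each classification is reduced to a hypothetical mapping from name to its nameAccordingTo identifier:

```python
from typing import Dict


def compare_classifications(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, str]:
    """For names present in both classifications, report 'same' or 'undetermined'.

    Equal nameAccordingTo identifiers mean the same concept; different ones do
    not mean different concepts, only that this check cannot tell.
    """
    result = {}
    for name in sorted(a.keys() & b.keys()):
        result[name] = "same" if a[name] == b[name] else "undetermined"
    return result
```

Resolving the "undetermined" cases is where something like RCC-5 articulations, or a shared concept identifier, would have to come in.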

ghwhitbread commented 6 years ago

Now we are getting somewhere. Thank you @rdmpage for the nice description of the model behind NSL and the opportunity this pattern might provide for standardization of the primitives that can be used to deconstruct many of the models in our domain. Especially if we can do this engaged with some mainstream web standards such as schema.org, the Web Annotation Model and Linked Open Data. These are the very reasons we have tried to resurrect the TNC Interest Group and why we can imagine that it might just work. With a basic workable vocabulary, and maybe some classes and some attributes.

Though (in agreement): Our model of taxa is one of nodes, leaves and branches, rather than just nodes. Our taxon needs to be a reusable object that can be shared among many different classifications and their versions. And the identifier for this taxon needs to change when its contents (children or TNU) change so we can know that branches on different trees are identical if their identifiers are the same, wherever they are reused. Within an NSL classification, names will uniquely identify a node, and they are persistent and stable across all objects and their revisions. So, even though your taxon model differs from ours, the “annotation” pattern ( I would not have called it that before today:) ) makes it possible for the NSL to expose an API that also satisfies your taxon.identifier needs. Remembering, of course, that the NSL development is still a work in progress.
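One way to get identifiers that change whenever a node's contents change, and that match wherever an identical branch is reused, is to derive them from content, Merkle-tree style: hash the node's TNU together with its children's (recursively derived) identifiers. This is a hypothetical sketch of that general technique, not a description of how the NSL actually mints its identifiers.

```python
import hashlib
from typing import Dict, List


def content_id(node: str, tnu: Dict[str, str], children: Dict[str, List[str]]) -> str:
    """Identifier derived from a node's TNU and, recursively, its children's identifiers."""
    # Sorting makes the ID independent of the order in which children are listed.
    child_ids = sorted(content_id(c, tnu, children) for c in children.get(node, []))
    payload = tnu[node] + "|" + ",".join(child_ids)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
```

Adding a species under a genus changes the genus's identifier (and, by the same recursion, every ancestor's), while the identifiers of untouched branches stay the same.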

Aside: Within TNC, nameAccordingTo has always been a (name, bibliographic locator) pair. ... oh, and “administrative cruft” is probably what APNI does, annotating annotations.

rdmpage commented 6 years ago

@ghwhitbread Regarding changing identifiers, I guess what we’re talking about is versioning, which seems to be a separate question. Some people may choose to change identifiers for nodes in a tree if the tree changes (see below), others may keep them the same. Let’s keep those two things separate.

On versioning I am a sceptic for several reasons:

to a first approximation nobody uses versions (GenBank sequences are versioned, but has anybody seen anyone cite a particular version of a sequence?). So I’d argue versions are something providers care about and users by and large don’t (or at best they pay lip service to). Does anyone think that https://bie.ala.org.au/species/urn:lsid:biodiversity.org.au:afd.taxon:1799afcc-e95c-4037-af25-12db1fc7d8a2#names is useful to users? To me versioning is a form of bikeshedding and the quickest way to derail a discussion on identifiers.
how do you define a version of a tree? The extent to which two trees change depends very much on how you interpret that tree. If a species moves from one family to another, then do you treat that as an editing move, in which case just the immediately affected nodes change, or do you treat all nodes on the path between the old and the new placement as having changed (which they have if you define nodes by composition of their subtrees)? And who decides whether that change is meaningful? If that change has no effect on the data that I’m storing, but all the identifiers I used for those taxa are now changed, I’m going to get annoyed, because somebody else has broken my database (this has caused me much grief dealing with ALA data).
if you’re going to support versions then it seems to me that the user-friendly way to do this is to have a stable identifier that always points to the latest version so that people’s links won’t break (as do software distributions where there is a “latest” link), or have an identifier that points to the collection of versions (this is how Zenodo does it, for example https://blog.zenodo.org/2017/05/30/doi-versioning-launched/ ) then give them identifiers for individual versions if they want the ability to link to those.
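The "identifier for the collection of versions" pattern can be sketched as a small registry. The identifier strings and the `ConceptRecord` class are hypothetical, loosely in the spirit of the Zenodo approach described above rather than a copy of its API: the stable identifier resolves to the latest version by default, and individual versions stay addressable.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptRecord:
    """Hypothetical record: one stable id plus an ordered list of version ids."""
    concept_id: str
    versions: list = field(default_factory=list)

    def add_version(self, version_id):
        self.versions.append(version_id)

    def resolve(self, requested=None):
        """Resolve to the latest version, or to a specific version if requested."""
        if requested is None:
            return self.versions[-1]
        if requested not in self.versions:
            raise KeyError(requested)
        return requested

record = ConceptRecord("taxon:123")
record.add_version("taxon:123.v1")
record.add_version("taxon:123.v2")
print(record.resolve())                # taxon:123.v2
print(record.resolve("taxon:123.v1"))  # taxon:123.v1
```

Links built on `taxon:123` never break when a new version is added; only users who explicitly want a fixed snapshot need the versioned identifiers.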

Obviously people will have different opinions on versioning depending on their experiences and goals, so maybe the trick is to clearly separate the issue of representing information on taxa from the issue of how to have identifiers for (potentially) changing objects, then list strategies for handling that.

nielsklazenga commented 6 years ago

@ghwhitbread is not talking about versioning of trees, but is simply saying that different nodes, through time, have different identifiers. In NSL the node identifier is used as the identifier for the taxon, which is the branch for which the node is the root node, so it is just saying that different taxa have different identifiers.

Creating a new identifier for the branch when its composition changes is the only way the branch can be reused in other trees or other versions of the tree with people still knowing that we are talking about the same thing: @ghwhitbread's reusable elements. Immutable node ids only work (I think) when there is only one tree that doesn't change (or when you ignore changes), in which case there seems to be little point talking about taxa and taxon concepts. If you want this kind of "stable" identifier, you can just use the identifier for the name object (that's why we treat them as objects rather than strings).

rdmpage commented 6 years ago

@nielsklazenga Just to be clear, so you're arguing that the model that databases like NCBI and GBIF currently follow, namely nodes in a tree have stable identifiers that don't change, even if we add and remove taxa, is wrong?

Perhaps part of the issue here is that I'm coming at this from the perspective of someone who wants to take a classification and the names and link stuff together. Hence I value stability of identifiers. If the target audience is developers of databases to track taxonomic change, then the goals may be rather different. I'd be happier if there was an explicit notion of what "change" is and what change is sufficient to merit a new identifier. As an end user obviously I'd want the minimum possible disruption.

But I hope we can separate out the issue of whether identifiers change from the vocabulary used to represent taxa and names, and be open to alternative strategies for handling node identifiers in the face of change.

rdmpage commented 6 years ago

Just to remind myself about iNaturalist's approach to splitting taxa and minting new identifiers: https://www.inaturalist.org/taxon_splits/18099

deepreef commented 6 years ago

Using name strings as the "thing" for names is why we've made so little progress these past decades. We've already gotten as far as we can working with "names as strings". We can only move forward with all the things that taxonomists/biologists/people want if we define names as abstract entities, with persistent identifiers, and some key properties (of which strings are only one). This is not because most taxonomists/biologists/people care about the subtleties of names as abstract entities with key properties (they don't). It's because you need those defined abstract things in order to make the machine behind the scenes work. 99.99% of people don't care what the IP address of a website is, but without that IP address hidden in the background, the internet wouldn't work as well.
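A minimal sketch of "names as abstract entities with persistent identifiers and key properties". The identifiers, the genus "Examplea" and the authorships are hypothetical; the point is that two distinct names sharing a spelling (homonyms) collapse when keyed by string but stay distinct when keyed by identifier.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Name:
    """A name as an abstract entity; the string is just one of its properties."""
    name_id: str     # hypothetical persistent identifier
    string: str      # the spelling
    authorship: str
    code: str        # nomenclatural code governing the name

names = [
    Name("n:1", "Examplea alba", "Smith, 1850", "ICZN"),
    Name("n:2", "Examplea alba", "Jones, 1902", "ICN"),
]

# Keyed by string, two different names fall into one bucket...
by_string = {}
for n in names:
    by_string.setdefault(n.string, []).append(n.name_id)
print(by_string)  # {'Examplea alba': ['n:1', 'n:2']}

# ...keyed by identifier, they remain distinct.
by_id = {n.name_id: n for n in names}
print(len(by_id))  # 2
```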

I agree with the sentiments of @rdmpage with respect to versioning. In my mind, versioning is something useful for audit trails where the "essence" of a thing doesn't change, but we want to examine the histories of how its properties have been edited over time. People tend to care about this at the same rate they care about IP addresses.

However, I'm not sure that changing the contents of what is implied by a node in a tree falls into the category of versioning in this sense. It probably matters to most people whether the node for "Reptilia" includes, or doesn't include, the things we call birds. Using the same identifier for this node regardless of the implied included taxa seems like it would be an identifier with limited utility.

nfranz commented 6 years ago

Three brief points.

Above, I mentioned the issue of agreeing on a social commitment (look for "value shift"). We are at that stage I think.
Jonathan Rees on https://github.com/OpenTreeOfLife/reference-taxonomy/wiki/Node-equivalence-under-annotation:--examples
Example of a taxonomic revision following a taxonomic revision, the latter partially reusing the former's taxonomic concept labels (see Figs. 32-34). Due out in PeerJ in less than a month from now. https://doi.org/10.1101/383091

rdmpage commented 6 years ago

@nfranz

Thanks for the link to @jar398's notes on node equivalence, this looks useful.

Re "value shift"

The value shift is this though: a commitment to taxonomic concepts is a commitment to support the process of systematic research/products, with particular emphasis on making the provisional, evolving, and frequently locally and temporally conflicting aspects of systematic inference and product use explicit, and indeed prioritizing software design and functions to showcase the provisional, evolving, and conflicting aspects of systematic inference making and usage.

I guess I see this as somewhat inward focussing, with the emphasis on the people generating the knowledge rather than the people who will be using it. We can showcase change by simply doing a visual diff of trees over time; indeed we could have an approach that mimicked what GitHub does with diffs linked to commits. I'd argue that the primary focus should be those who are going to use biodiversity information, not those who create it. But maybe I've misunderstood who the users of taxonomic concepts are intended to be?

mdoering commented 6 years ago

This is a brilliant discussion I need to digest more in depth. Still, some quick remarks, as the notion of a taxon being a node in a tree is appealing but falls short in my eyes.

Most taxonomies I have seen actually have name identifiers, not identifiers for taxa. A node in the GBIF backbone does not change its id when it changes its place in the tree or when a new child is added. It is purely the name that makes up the stable property. This is what most taxonomies do and this is useful for simple and stable linking across sources as in most cases you just have plain names. But for many links this is inaccurate.

The overwhelming majority of use cases I have seen want to make sure we talk about the same set of organisms if we refer to the same taxon. It usually does not matter much if you add new, formerly unknown organisms to an existing group as long as you do not alter the boundaries with other sibling taxa. But splits or merges, like those iNaturalist tracks, do matter a great deal. With this set-of-included-organisms view it also does not matter where in the tree you have placed a taxon, even though biologically it should, as the classification likely says something about the inherited biological properties. I very much like Nico's orchid example to illustrate that: https://goo.gl/hCDYbb
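The "set of included organisms" view lends itself to comparing two circumscriptions as sets. This is a minimal sketch in the style of the five RCC-5 set relations used in concept-alignment work; the population ids are hypothetical. Note that adding a newly discovered population yields "includes" rather than "congruent", so whether that still counts as the same taxon is a policy decision the set comparison alone does not settle, whereas a split shows up clearly.

```python
def rcc5(a, b):
    """Classify the set relationship of circumscription a relative to b."""
    a, b = set(a), set(b)
    if a == b:
        return "congruent"
    if a > b:
        return "includes"
    if a < b:
        return "is included in"
    if a & b:
        return "overlaps"
    return "disjoint"

old = {"pop1", "pop2", "pop3"}
added = old | {"pop4"}       # a formerly unknown population added
split_part = {"pop1", "pop2"}  # one part of a split

print(rcc5(added, old))               # includes
print(rcc5(split_part, old))          # is included in
print(rcc5({"pop3", "pop5"}, old))    # overlaps
print(rcc5({"pop9"}, old))            # disjoint
```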

Reading Rod's initial post again, I simply might fail to understand what a taxon is if it is distinct from a taxonomic concept. Is it a concept placed in the tree? So far, I have to admit, I regarded taxon and taxonomic concept as rather synonymous. Also, if you read existing definitions for a taxon, it is usually the set of included organisms and not their classification: https://en.wikipedia.org/wiki/Taxon

Markus

nielsklazenga commented 6 years ago

Just responding to @rdmpage 's question:

No, I am not saying that NCBI and GBIF are wrong, just that their approach wouldn't work for NSL, which has an entirely different purpose. And also probably that the identifiers in NCBI and GBIF are not identifiers for nodes in a tree, but for names (@mdoering made a similar point just above).

Each node in an NSL tree comes with three different identifiers (which, incidentally, is the same number of identifiers you can deliver in Darwin Core): the identifier for the Node, the identifier for the Name Instance (≈ Taxon Name Usage) and the identifier for the Name. All these identifiers uniquely identify the node in the context of a classification (tree). So, consumers of the NSL classifications, like the ALA, who use the APC and AusMoss classifications from NSL, can choose the identifier that has the properties they are after, if they are not minting identifiers of their own. For the ALA, I would argue, that is the Name identifier.

ghwhitbread commented 6 years ago

The NSL doesn’t version the data objects we use to model taxa, or taxonomic names, or name instances, or usage. All identifiers are persistent and actionable, always resolving to the same (with editorial licence), equally persistent resources. NSL classifications (<-plural) do change, but only in their entirety, as published revisions, to keep pace with both editorial progress and changing taxonomies. No identifier changes: names, instances, and references are simply facts, and nodes and branches remain even when they no longer participate in a current taxonomy. These are genuine requirements of the NSL as research infrastructure and Linked Open Data.

If your thing is trees and aggregation and you need persistent access to a future view, then node/branch identifiers are probably not for you, nor are concept identifiers, which are by their nature subject to succession in any taxonomy. The name, however, will persist into the future, where, either accepted or in synonymy, it will be treated. What you will need is the node’s name identifier and a service that will deliver you, via inference, to the appropriate node in a current tree, or elsewhere if that’s your preference.
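The kind of service described here can be sketched as a lookup from a persistent name identifier to a node in the current tree, following synonymy when the name is not accepted. The identifiers and the two lookup tables are hypothetical stand-ins for whatever a real service would query; this is not the NSL API.

```python
def resolve_name(name_id, accepted_nodes, synonym_of):
    """Map a name id to a node in the current tree, following synonymy if needed.

    accepted_nodes: accepted name id -> node id in the current tree
    synonym_of: synonym name id -> accepted name id
    """
    if name_id in accepted_nodes:
        return accepted_nodes[name_id], "accepted"
    accepted = synonym_of.get(name_id)
    if accepted is not None and accepted in accepted_nodes:
        return accepted_nodes[accepted], "synonym"
    return None, "not treated"

current_nodes = {"name:10": "node:A", "name:20": "node:B"}
current_synonyms = {"name:11": "name:10"}

print(resolve_name("name:11", current_nodes, current_synonyms))  # ('node:A', 'synonym')
print(resolve_name("name:99", current_nodes, current_synonyms))  # (None, 'not treated')
```

The consumer stores only the name identifier; when the tree is revised, only the two tables change and the same stored identifier resolves to the node of the day.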

So what is this name object that requires an identifier, because names obviously are a “thing”. They are many things to many people, often even synonymous with taxa. As strings, they are a mess, as taxonomic names more ordered though still ambiguous, as taxonomic name usages context provides a means for disambiguation, as annotated usage and within a taxonomy they tend to be unique, and according to our Codes, they must be so. Names sec. taxon concepts are essential to the practice of taxonomy and perfectly natural nodes for the knowledge graph supporting development and use of our nomenclatural and taxonomic infrastructure. But outside, in the real world, they are almost impossible to apply appropriately, less so correctly, and our insistence on overstating their importance has brought them down into the mess that was once just names. The thing is, that if you think you need a taxon concept, you are probably making one, or at least documenting one that already exists, and, though platitudes abound, for real-world usage, a disambiguated name has to be the more practical alternative.

And where is our standard for names ... Bisby (1989), pre-TCS discussion (2008), GNA, Darwin Core? I hope this is why we are here. I think we agree (@rdmpage and @deepreef above) that we should put all of this aside and get down to basics, with a vocabulary that might help us link at least some of it together.

deepreef commented 6 years ago

Excellent, @ghwhitbread! I think you captured my own sentiments far better than I have been able to do. However, I need to ask you for some clarification. After describing the virtues of taxonomic name usages above, you follow with "But outside, in the real world, they are almost impossible to apply appropriately, less so correctly, and our insistence on overstating their importance has brought them down into the mess that was once just names." Is the "they"/"them" in this sentence TNUs? If so, the only thing about this "impossible in the real world" statement that resonates is trying to do second-tier reasoning where if Author X mentions Taxon name T (invariably without mentioning a "sec"), then it's hard/impossible to explicitly pin down that level of cross-reference. But that's a minor issue in my view, because that sort of reasoning is something that simply cannot be extracted from historical literature with any sort of reliability, so I see that as largely a lost cause. More useful, I think, is to focus on the information that can be reliably and consistently extracted from TNUs (nomenclatural acts, synonymies, etc.), which can be done very easily, and from which some pretty powerful reasoning can be achieved (far better than what can be done using name-strings alone).
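One simple example of reasoning over synonymies extracted from TNUs is grouping name identifiers into clusters with union-find. This is a sketch under a loud simplifying assumption: it treats every synonymy assertion as an equivalence, which ignores pro parte synonymy and conflicting opinions between authors. The name identifiers are hypothetical.

```python
def synonymy_clusters(assertions):
    """Group name ids into clusters from pairwise synonymy assertions (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in assertions:
        union(a, b)

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return sorted(sorted(c) for c in clusters.values())

# Each tuple: (name treated as a synonym, name it is placed under) in some TNU.
tnu_assertions = [("name:2", "name:1"), ("name:3", "name:2"), ("name:5", "name:4")]
print(synonymy_clusters(tnu_assertions))
# [['name:1', 'name:2', 'name:3'], ['name:4', 'name:5']]
```

Even this crude closure links name:3 to name:1 although no single TNU states that pair, which is the kind of inference name-strings alone cannot support.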

So my question to you is, do you think TNUs are as useless as name-strings in getting to the answers we want (and have a reasonable hope of actually achieving)? Or do you mean something else?

And now for the rant portion of this post: Part of the reason behind my initial response to @rdmpage (which was very helpful in cracking a shell in this conversation to allow good new characterizations to flow) is that I have grown tired of trying to communicate the informatics side of nomenclature and taxonomy using words like "taxa", "taxon concept" and "names". I remember a conversation at TDWG Christchurch in 2004 in which it was clear that these words had already passed their expiration date in terms of usefulness. The problem, of course, is that there are about a dozen different meanings for each of these words, and whenever people casually mention them in a discussion like this, it's usually very difficult to parse out the precise meanings. Hence, communication bogs down.

I think we should leave terms like "taxa", "taxon concepts", and "names" out of these conversations entirely, and instead focus on a vocabulary that is free from that kind of baggage.

ghwhitbread commented 6 years ago

@deepreef (with apologies to aggregators): Yes, talking TNUs. But I really mean “outside”, across the way, where assemblages of occurrences, traits and images are labeled with TNUs, using “scientificName” strings mapped, in all manner of imaginative ways, onto our carefully crafted graphs, then re-bundled independently into static, aggregated, backbone trees and scattered in their billions across the Internet.

@rdmpage earlier mentions bikeshedding. I also see these “versioning” efforts, very like your wonderful islands, as a chain of ideas passing slowly over a funding hotspot and leaving beautiful mountains of static data that over time become objects of research and wonder in their own right.

If we had just kept quiet, and encouraged, or insisted on, linked data and development of dynamic mappings onto taxonomic names, leaving “verbatim” content intact, we might now be in that different place where science, emerging taxonomies, even collections, can participate in how aggregators, and their citizen-scientists s.l., see the Planet.

nielsklazenga commented 6 years ago

Do I sense some frustration coming through in the last few comments? Let me give my perspective as someone who hasn't been involved in the discussions for so long.

As @deepreef mentioned earlier, core concepts like Taxon and Taxon Concept have never been properly defined and are therefore poorly understood, if not among the people in this discussion then in the wider community, and I think it is important that they are. So, if I were to throw tech jargon around, it would be 'technical debt' rather than 'bikeshedding'. Also, I don't think the problem (if there is a problem) is people attaching too much importance to Taxa and Taxon Concepts, but rather people attaching too much importance to Names.

We cannot not talk about taxa and taxon concepts as those are the very things out there that we are expected to model. One of the critiques at the 'Names for Biodiversity' workshop was that Names, by themselves, do not represent anything that exists in nature (Nico and Jeff expressed it much better than I can now remember).

I really don't understand why this seems so hard and intractable. Let me try to define the concepts (once more) going from the things we want to model rather than from how we model them, but with all biological meaning stripped off:

Taxon: A delimited set of Identifications (that biologists attach a whole heap of properties to that we don't have to concern ourselves with)
Taxon Concept: An assertion or opinion about the delimitation of a Taxon; the OpenBiodiv Ontology has Taxon Concept as an owl:equivalentClass to dwc:Taxon; I am happy with that.
Taxon Name: A label for a Taxon; a Taxon Name, through its type, approximately locates the Taxon, but, because a Taxon Name is applied to any set that contains its type (unless the set also contains the type of another, prior name), it does not give any information about the delimitation of a Taxon and therefore might not provide sufficient information to assign an Identification to a Taxon in the case of conflicting or changed taxonomic opinion.
Taxonomic Name Usage: Use of a Taxon Name in a Reference; while perhaps not every Taxonomic Name Usage has a Taxon Concept in it, I think we can consider Taxonomic Name Usage and Taxon Concept equivalent; (probably poorly) paraphrasing @nfranz in an earlier email in the thread that @baskaufs summarised: the extent to which a TNU is useful as a Taxon Concept depends on it having enough context to enable us to assert set relationships between this and other Taxon Concepts; this information doesn't always jump from the page, but experts in the group will often be able to interpret it.
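The Taxon Name definition above can be illustrated with a deliberately simplified applicability rule: a set takes the earliest name whose type it contains. This sketch ignores rank, availability, and every other Code provision, and the names and type ids are hypothetical. It shows why a name carries no delimitation information: "Alpha beta" applies equally to the lumped set and to the smaller set left after a split.

```python
def applicable_name(members, names):
    """Return the earliest-dated name whose type is among the members.

    members: set of element ids (e.g. specimens) in a circumscription
    names: list of (name_string, year, type_id) tuples
    """
    candidates = [n for n in names if n[2] in members]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n[1])[0]

names = [("Alpha beta", 1820, "T1"), ("Alpha gamma", 1850, "T2")]

lumped = {"T1", "T2", "x"}
split_1 = {"T1", "x"}
split_2 = {"T2", "y"}

print(applicable_name(lumped, names))   # Alpha beta
print(applicable_name(split_1, names))  # Alpha beta
print(applicable_name(split_2, names))  # Alpha gamma
```

An Identification labelled only "Alpha beta" therefore cannot tell us whether specimen "T2"-like material was included, which is the gap a Taxon Concept is meant to fill.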

I think (and I think most of us do) the part of the TDWG Ontology that was based on TCS is really good, so I think we should use that as a starting point and see what needs to be added or culled. One useful addition, I think, would be the Taxon Concept Label that @nfranz proposed right at the beginning of this discussion (and in the earlier thread I think) and for which there is precedent in the literature that predates TCS; if only because it enables us to keep identifiers mostly under the hood and we don't have to argue about when identifiers should change and when not. It's also something we could encourage people to use on their determinations (and anywhere else where context is missing).

I think names are Names, taxa are Taxa and nomenclatural acts are Events and that they are all different things, not special cases of each other.

mdoering commented 6 years ago

I couldn't agree more, Niels!

I think TCS was thoroughly done. If we release it from its XML chain and potentially update some details we could have something very usable. Let's stick with Name, Taxon and their relations. It is what people are familiar with, and if the terminology is confusing then it is only because we lack precise definitions.

rdmpage commented 6 years ago

Thinking aloud (not always a good thing), does this approach make sense?

Taxa are nodes in a tree (or a network if things get complicated). Nodes in the tree have identifiers and also human-readable labels ("names"). Given a tree what those taxa "mean" is completely defined by that tree. In cases such as GBIF and NCBI the things of interest (occurrences, sequences) are linked to those taxa, so there is no ambiguity as to what a taxon "is" (within the bounds of each classification).

Labels ("names") appear in various contexts, such as publications. We can have lists of names and their locations: ("string of text", "location in document"), where a document can be broadly defined to be a paper, a web site, a database entry, a specimen label, etc.

If we had one, unchanging classification (tree) then names would be sufficient to identify taxa, and life would be simple. But we can have multiple trees at any one time (GBIF and NCBI have different classifications) and trees can also change over time (as @nfranz has documented in detail). Hence referring to nodes in a tree by a label may not be sufficient to uniquely identify which node the name refers to. Isn't this the crux of the "taxonomic concept" issue? When an author uses a string of text as a proxy for a node in a tree, there is ambiguity. So, what we'd really like is: ("node identifier in a given classification", "location in document"). What we almost always have is ("string of text", "location in document").
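The ambiguity of ("string of text", "location in document") pairs can be shown with two toy classifications. Everything here (classification names, node ids, members, the document locator) is hypothetical: the same label "Examplea" matches a node in each tree, and those nodes have different contents, whereas a (node, location) pair names exactly one of them.

```python
# Each classification maps node id -> (label, set of member ids).
classification_a = {"A:1": ("Examplea", {"s1", "s2", "s3"})}
classification_b = {"B:7": ("Examplea", {"s1", "s2"}), "B:8": ("Otherus", {"s3"})}
classifications = {"A": classification_a, "B": classification_b}

def nodes_for_label(label, classifications):
    """Return every (classification, node id) whose label equals the string."""
    hits = []
    for cname, nodes in classifications.items():
        for node_id, (node_label, _members) in nodes.items():
            if node_label == label:
                hits.append((cname, node_id))
    return hits

string_usage = ("Examplea", "doc:42#p3")      # (string, location): ambiguous
node_usage = (("B", "B:7"), "doc:42#p3")      # (node, location): unambiguous

hits = nodes_for_label(string_usage[0], classifications)
print(hits)  # [('A', 'A:1'), ('B', 'B:7')]
print(classification_a["A:1"][1] == classification_b["B:7"][1])  # False
```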

So we need to model nodes in a tree, i.e. "taxa" (candidates include Darwin Core Taxon, TDWG TaxonConcept, BioSchemas taxon, etc.).

We need to model "names", TDWG TaxonName seems an obvious choice as it is already in use by several nomenclators (IF, IPNI, ION).

We also could model "usages", which earlier I suggested are essentially annotations and could be modelled as such. It seems to me that the key point is whether that annotation links to a "name" (could be as simple as highlighting some text) or if it links to a node in a classification. If the latter then there's little scope for ambiguity as to what it "means", if the former then there's scope for ambiguity (and the vast majority of usages will be of this sort). So the whole discussion about taxonomic concepts boils down to whether the pointer (the link) is to a string or to a node.
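The "pointer to a string or to a node" distinction can be expressed as a usage whose target is one of two types. This is a minimal sketch with hypothetical class names; it is not the Web Annotation model or any existing TDWG vocabulary, just the shape of the argument: ambiguity depends only on which kind of target the usage carries.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class NameTarget:
    text: str               # e.g. highlighted text in a document

@dataclass(frozen=True)
class NodeTarget:
    classification: str     # which tree
    node_id: str            # which node in that tree

@dataclass(frozen=True)
class Usage:
    target: Union[NameTarget, NodeTarget]
    location: str           # document location, e.g. page or text selector

def is_ambiguous(usage: Usage) -> bool:
    """A usage pointing at a string may be ambiguous; one pointing at a node is treated as not."""
    return isinstance(usage.target, NameTarget)

u1 = Usage(NameTarget("Examplea"), "doc:42#p3")
u2 = Usage(NodeTarget("B", "B:7"), "doc:42#p3")
print(is_ambiguous(u1), is_ambiguous(u2))  # True False
```

A simple name-finding service would only ever produce `NameTarget` usages; a richer one could upgrade a usage to a `NodeTarget` once someone decides which node was meant.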

In summary, is "all" we need to do the following?

Agree on a vocabulary to describe a node in a tree ("taxon")
Agree on a vocabulary for a scientific name ("name")
Agree on a vocabulary for describing (name, document location) and (node, document location) pairs

It seems that this captures what we kind of do already: we build classifications, either explicitly or implicitly, we mint new names, and we index names in the literature. It allows for simplicity (e.g., we could have a vocabulary that handles simple name finding in text) as well as more sophisticated approaches that specify which node in what classification (which I'm assuming is what @nfranz is after).

mdoering commented 6 years ago

When a taxon is completely defined by its tree I assume the tree also includes synonyms?

rdmpage commented 6 years ago

When a taxon is completely defined by its tree I assume the tree also includes synonyms?

@mdoering My gut instinct is to say "no", although it depends what you mean by "synonym".

If a synonym is simply another name that had been applied to that node, then that means we just have multiple possible labels, and it makes sense to include those for discoverability if nothing else (NCBI does this). But of course the relationship between those things may be complex: the synonyms of Ochlerotatus in GBIF https://www.gbif.org/species/1650073 include a bunch of names that do not include all the taxa in Ochlerotatus; many have been variously treated as full genera, subgenera, etc. So having all these synonyms listed in GBIF may help discoverability, but gives users little clue as to the nature of the relationship between those names.

If a synonym is a taxon, then I guess I'd argue that what we really have is, say, two classifications, and a mapping between them (or a list of edits to transform one tree to the other), and the set of things that don't match (or are involved in edits to the tree) are the "synonyms". If we compute the relationship between them (e.g., mappings or edits) then we have an explicit way of describing those relationships, and the TDWG LSID vocabulary has terms for those relationships (see also Nico's stuff). So synonymy in this sense is really a relationship between trees, not a property of trees themselves.
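A minimal sketch of "a list of edits to transform one tree to the other", with trees given as child-to-parent maps. It assumes, for illustration only, that both trees share taxon labels, which is exactly what cannot be taken for granted in general; the labels are hypothetical. Each edit records a taxon whose parent differs, with `None` marking a taxon absent from one tree.

```python
def placement_edits(old_parent, new_parent):
    """List (taxon, old parent, new parent) for taxa whose parent differs."""
    edits = []
    for taxon in sorted(set(old_parent) | set(new_parent)):
        before = old_parent.get(taxon)
        after = new_parent.get(taxon)
        if before != after:
            edits.append((taxon, before, after))
    return edits

tree_1 = {"sp1": "GenusA", "sp2": "GenusA", "GenusA": "Family", "GenusB": "Family"}
tree_2 = {"sp1": "GenusA", "sp2": "GenusB", "GenusA": "Family", "GenusB": "Family",
          "sp3": "GenusB"}

print(placement_edits(tree_1, tree_2))
# [('sp2', 'GenusA', 'GenusB'), ('sp3', None, 'GenusB')]
```

The edit list is a property of the pair of trees, not of either tree alone, which matches the view of synonymy as a relationship between trees.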

mdoering commented 6 years ago

I am asking because to me a (classification) tree only has taxa as nodes, not synonyms.

But if the position of the node in the tree alone defines the taxon, you miss its synonymy which is the strongest definition for a taxonomic concept. I would even argue that the upper tree of a taxon, its classification, does not alter the concept at all. We had some discussion in the CoL about that which might be worth reading: https://github.com/Sp2000/colplus/issues/6

rdmpage commented 6 years ago

What if for each node in a tree you have the “edit history” of how that taxon has moved (or not) over time in different trees? Would that list (basically what you get in a taxonomic revision) provide the information you’re after?
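The per-node "edit history" could be compiled from a sequence of tree versions. This sketch assumes each version is a (label, child-to-parent map) pair and that the taxon keeps the same label across versions; the years and names are hypothetical. Only versions where the placement changes are recorded, much like the placement history a revision would summarise.

```python
def placement_history(versions, taxon):
    """Return (version label, parent) for each version where the taxon's parent changed."""
    history = []
    last = object()  # sentinel so the first version is always recorded
    for label, parent_map in versions:
        parent = parent_map.get(taxon)
        if parent != last:
            history.append((label, parent))
            last = parent
    return history

versions = [
    ("2001", {"sp2": "GenusA"}),
    ("2005", {"sp2": "GenusA"}),
    ("2010", {"sp2": "GenusB"}),
    ("2018", {"sp2": "GenusA"}),
]
print(placement_history(versions, "sp2"))
# [('2001', 'GenusA'), ('2010', 'GenusB'), ('2018', 'GenusA')]
```

Placement alone says nothing about synonymy or circumscription, so this would cover only part of what a taxonomic concept is meant to capture.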


On Monday, September 17, 2018 13:44, Markus Döring wrote:

I am asking because to me a (classification) tree only has taxa as nodes, not synonyms.

But if the position of the node in the tree alone defines the taxon, you miss its synonymy, which is the strongest definition of a taxonomic concept. I would even argue that the upper tree of a taxon, its classification, does not alter the concept at all. We had some discussion in the CoL about this that might be worth reading: https://github.com/Sp2000/colplus/issues/6


frmichel commented 6 years ago

Dear all,

Thanks @rdmpage for mentioning me earlier in this thread; it gave me the opportunity to discover this discussion.

Just a few remarks about Bioschemas.org and its relationship to the questions discussed above. Bioschemas' goal is to produce a vocabulary that life-sciences web sites can use to mark up their pages. The philosophy is to reuse existing terms from schema.org as much as possible, and to reuse terms from well-adopted vocabularies when no relevant term exists in schema.org.

The biodiversity work group has just started describing the term Taxon. This is not a new term to be included in schema.org, but a "profile" that explains how to use existing properties and types to mark up a web page. The point is to keep things simple enough (we are not building a complex ontology), but not too simple. The current specification that I have drafted (see an example) uses the dwc:Taxon term as the main type, along with schema.org properties such as name and alternateName, and the TaxonConcept ontology's tc:rank.
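A minimal sketch of what such markup might look like, built as JSON-LD from Python. This is not the Bioschemas Taxon profile itself: the properties simply mirror the description above (dwc:Taxon as the type, schema.org name and alternateName, tc:rank), while the namespace IRIs, the rank IRI and the example values are assumptions.

```python
import json

# Namespace prefixes assumed for this sketch.
context = {
    "schema": "http://schema.org/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "tc": "http://rs.tdwg.org/ontology/voc/TaxonConcept#",
}

# Hypothetical markup for a taxon page; the rank IRI is an assumed placeholder.
markup = {
    "@context": context,
    "@type": "dwc:Taxon",
    "schema:name": "Branta canadensis",
    "schema:alternateName": ["Canada Goose"],
    "tc:rank": {"@id": "http://rs.tdwg.org/ontology/voc/TaxonRank#Species"},
}

print(json.dumps(markup, indent=2))
```

On a web page, the printed JSON would sit inside a `<script type="application/ld+json">` element.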

This work has really only just started, and many of the questions that need to be addressed are quite close to the concerns discussed in this thread:

Should we use terms from DwC, the TDWG ontologies, or other vocabularies?
Should a taxon be identified by an ID or by its accepted scientific name? Should this somehow include its circumscription?
Should we model only taxa, with names attached as simple strings for the sake of simplicity? Or should we also model names as first-class citizens?
What about links to third-party taxonomies and projects: should we include properties for the NCBI ID, EOL ID, GBIF ID, etc., as Wikidata does?

I've already contacted some of the people on this list to invite them to participate in the project, and I'd like to take this opportunity to extend the invitation to everyone. If you wish to get involved, bring ideas and discuss use cases, you are all most welcome! Join here: http://bioschemas.org/howtojoin/

Franck.

jgerbracht commented 6 years ago

I think Niels and Markus articulated my basic thoughts on this topic very well, as have Richard and others over the weekend. There are differences between Taxon Name, Taxonomic Name Usage and Taxonomic Concept, some subtle, some not so subtle, which is why I agree with Niels's list of 'objects' we're dealing with. And Markus has it exactly right with his statement

"Most taxonomies I have seen actually have name identifiers, not identifiers for taxa."

This is the biggest issue I'm hoping we can address with this group, but to do that, we need to make sure we mean the same things when we say Taxon Name, Taxon Name Usage or Taxon Concept.

In essence, I think we need to have several distinct objects, at least for our conversations. Whether each has its own set of identifiers or not is yet to be hashed out, but do we agree on these basic objects?

I'm not suggesting we adopt the following; I put them here to show how I think of these objects:

Taxon Name (TN): A label for a Taxon, i.e. the scientific name plus the year and the person who FIRST coined that scientific name.

Taxonomic Name Usage (TNU): Use of a Taxon Name by either the original 'coiner' or SUBSEQUENT authorities, which may or may not include enough information to define a Taxon Concept (think circumscription).

Taxon Concept (TC): An assertion about the delimitation of a Taxon; or, as defined at the beginning of the conversation, a circumscribed set of organisms, inclusive of individuals living, recently dead, and yet to be born, asserted to represent a natural cohesive biological unit.

A single Taxon Concept will inevitably have multiple Taxon Names and Taxonomic Name Usages applied to it.

All Taxon Concepts should have a single original or defining TNU, but not all TNUs will map to a defined Taxon Concept. A good example of this is the Canada Goose. The original TN and TNU of Branta canadensis Linnaeus 1758 is based on a painting, with no description of the 'circumscribed set of organisms'. We have no idea whether Linnaeus meant to include all eleven subspecies, just one, or somewhere in between, simply because he wasn't aware of the entirety of the species complex. And of course we'll never know; we can only guess. In this case, I would argue there was a TNU without an associated TC; the Taxon Concept was circumscribed later. One question for the taxonomists in the group: is this accurate, or does a TNU ALWAYS map to a TC?
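The three objects and the constraints between them can be written down as a small data model. A minimal sketch, assuming Python dataclasses; the class and field names are hypothetical and not taken from TCS or any other TDWG standard. The design choice is that a TNU's link to a concept is optional (the Canada Goose case), while every concept points to exactly one defining TNU.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class TaxonName:
    """TN: a label, i.e. the scientific name plus who first coined it and when."""
    scientific_name: str
    author: str
    year: int


@dataclass(frozen=True)
class TaxonNameUsage:
    """TNU: one use of a TN by some authority.

    concept_id is None when the usage does not carry enough information
    to circumscribe a concept.
    """
    name: TaxonName
    according_to: str
    concept_id: Optional[str] = None


@dataclass(frozen=True)
class TaxonConcept:
    """TC: an asserted circumscription with exactly one defining TNU."""
    concept_id: str
    defining_usage: TaxonNameUsage
    circumscription: FrozenSet[str]  # e.g. member subtaxa or populations

    def __post_init__(self) -> None:
        # The defining usage must point back at this concept.
        if self.defining_usage.concept_id != self.concept_id:
            raise ValueError("defining usage does not refer to this concept")


def names_for(concept: TaxonConcept, usages: List[TaxonNameUsage]) -> Set[str]:
    """All scientific names applied to one concept across the given usages."""
    return {u.name.scientific_name for u in usages if u.concept_id == concept.concept_id}


# Hypothetical example: the original usage defines no concept; a later
# (invented) treatment circumscribes one.
bc = TaxonName("Branta canadensis", "Linnaeus", 1758)
original = TaxonNameUsage(bc, "Linnaeus 1758")
later = TaxonNameUsage(bc, "Hypothetical Author 2000", concept_id="TC-1")
concept = TaxonConcept("TC-1", later, frozenset({"population-A", "population-B"}))

print(original.concept_id)                    # None
print(names_for(concept, [original, later]))  # {'Branta canadensis'}
```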

In practice, I'm sure the current state of taxonomies is rather messy in this regard, i.e. there are certainly TNUs without a clearly 'circumscribed set of organisms', simply because our knowledge isn't yet complete enough to circumscribe the TC clearly. In those cases, the current circumscription (which may be a single specimen) will have to be 'good enough' to describe the TC. However, in cases such as the Canada Goose, where there aren't any 'undiscovered' populations, we can and should use a later TNU as the defining circumscription of the various Taxon Concepts; the clearer, the better.

I can certainly see the advantages of treating TNUs and TCs as the same, but for this to help me in my work, both as a generator and as a consumer of Taxonomic Concepts, there needs to be a single identifier for a Taxonomic Concept that does not change, i.e. an identifier that ALWAYS maps to the same circumscribed set of organisms, regardless of the TN and TNU applied.

Re TCS, there are two items that concern me. First, in several places the TCS seems to state that it is more about transferring and mapping multiple concepts (I recognize this may be a misperception on my part) and less a schema for defining Taxon Concepts.
Second, to be completely honest, I've not used the TCS, nor thought of it as a viable solution to the Taxonomic Concept issues I see, simply because its implementations most often seem to deal with name identifiers, not identifiers for taxon concepts, i.e. broad-scale conflation of names and concepts.

Great discussion, and I do feel like we're making progress.

Jeff

jgerbracht commented 6 years ago

Some more use cases:

As a data consolidator: Do these two pieces of information, from different data providers, refer to the same Taxonomic Concept? What is the name for this Taxon Concept given an authority/publication date? I want to 'discover' and access datasets that unambiguously refer to a specific Taxonomic Concept, e.g. H. sapiens including neanderthalensis vs. excluding neanderthalensis.

As a data provider: I want to publish taxon-related data, such as observations, that cannot easily be misunderstood or misapplied, i.e. data that is difficult to consolidate inappropriately with other datasets based on differing taxon concepts.

As a taxonomy publisher: I need to 'mint' new Taxon Concept identifiers. I need reliable methods to determine whether a specific Taxon Concept already exists, so I don't mint duplicates. I need to publish the 'Clements' checklist with Taxon Concept identifiers and provide a persistent landing page for TC definitions. As Taxon Concepts become obsolete, I still need to provide the basic information about each concept, i.e. its TNU and circumscription.
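A minimal sketch of the "don't mint duplicates" requirement, which also answers the consolidator's "same concept?" question. It assumes a circumscription can be reduced to a set of member identifiers (subtaxa, populations or specimens) and that two concepts are the same exactly when those sets are equal; that equality rule is an assumption made for illustration, since real circumscriptions are rarely this crisp. The `ConceptRegistry` class, the ID format and the author labels are hypothetical.

```python
import itertools
from typing import Dict, FrozenSet, Iterable, Optional, Set


class ConceptRegistry:
    """Hypothetical registry that mints one stable ID per circumscription."""

    def __init__(self, prefix: str = "TC") -> None:
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._by_members: Dict[FrozenSet[str], str] = {}
        self._labels: Dict[str, Set[str]] = {}

    def lookup(self, members: Iterable[str]) -> Optional[str]:
        """Return the existing ID for this circumscription, if any."""
        return self._by_members.get(frozenset(members))

    def mint(self, members: Iterable[str], label: str) -> str:
        """Reuse the existing ID or mint a new one; record the label either way."""
        key = frozenset(members)
        concept_id = self._by_members.get(key)
        if concept_id is None:
            concept_id = f"{self._prefix}-{next(self._counter)}"
            self._by_members[key] = concept_id
            self._labels[concept_id] = set()
        self._labels[concept_id].add(label)
        return concept_id

    def labels(self, concept_id: str) -> Set[str]:
        """All name-plus-source labels recorded for one concept."""
        return set(self._labels.get(concept_id, set()))


registry = ConceptRegistry()
broad = registry.mint({"sapiens", "neanderthalensis"}, "Homo sapiens sec. Author A")
narrow = registry.mint({"sapiens"}, "Homo sapiens sec. Author B")
again = registry.mint({"neanderthalensis", "sapiens"}, "Homo sapiens sec. Author C")
print(broad, narrow, again)  # TC-1 TC-2 TC-1
```

Because the ID is keyed on the circumscription rather than the name, the same label "Homo sapiens" ends up attached to two different IDs, and a given ID never changes what it refers to.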


Taxon, Taxon Concept and Taxon Name Usage: definitions and relationships #1

Taxa

Names, usages, etc.

Taxonomic concepts

Summary