tdwg / tcs2

The TCS 2 Task Group will turn TCS into a form in which it can be maintained. The new version of TCS will be a vocabulary standard like Darwin Core and Audiovisual Core and will complement these other existing TDWG standards.

class: TaxonConcept #1

Closed nielsklazenga closed 2 weeks ago

nielsklazenga commented 3 years ago

TaxonConcept (class)

Label: Taxon Concept
Definition: The underlying meaning, or referential extension, of a scientific name as stated by a particular author in a particular publication. It represents the author’s full-blown view of how the name reaches out to observed or unobserved objects in nature (beyond statements about type specimens). It is a direct reflection of what has been written, illustrated, and deposited by a taxonomist, regardless of his or her theoretical orientation (Franz & Peet 2009).
Comments:

Mapping

TCS 1: DataSet/TaxonConcepts/TaxonConcept
TDWG Ontology: http://rs.tdwg.org/ontology/voc/TaxonConcept#TaxonConcept
Darwin Core:

nielsklazenga commented 3 years ago

I have adopted the text from Franz & Peet (2009) for now, but I am not sure it was intended as a definition, as it is more a description of the relationship between a taxon concept and its name. For me, defining the taxon concept by reference to a taxon name is like putting the world upside-down, so I would prefer something like this:

An hypothesis, assertion or opinion about the delimitation of a taxon.

I had 'a taxonomic group of organisms' first, but I think we should just refer to the definition in Darwin Core to indicate that our usage of the term 'taxon' also includes groups that are not normally considered taxa, such as hybrids and cultivars.

Franz and Peet's (2009) text could still very well serve as a comment.

nfranz commented 3 years ago

It was intended as a definition (just to clarify that). To further clarify, it makes sense to center this definition around the relationship between a label/string and its referential extension as viewed by a certain author/source and at a given time, if the intent is not to use the term "Taxon", which is not mentioned once in Franz & Peet (2009). I understand the contextual requirement for compatibility with DwC, however. In some sense, given that the 2009 article was not really meant to be all that compatible with DwC, there might be a case here for just omitting it; or say: for an alternative conception, see...

jar398 commented 3 years ago

It is not obvious to me that the author's full blown view of a name is going to be an extension, or a single extension. (Similarly there will be extensions, such as those hypothesized algorithmically or those for "partly blown views", that are not any author's full blown view.) For example, the full blown view might be vague or ambiguous or incomplete or inconsistent, perhaps intentionally so. A view and an extension are very different kinds of things - they have different properties and identity criteria.

I could try to invent a definition that I like better but I'd insist on starting with use cases - those should be able to dictate any missing details of the particular sense that we would like to assign to this class name. Can someone offer high-quality examples where some particular thing (hypothesis, extension, text, etc.) would be a member of class:taxonConcept? Maybe there are some in TCS ...? It is always better to build a class up from examples than to think about it in a vacuum. The latter generally just leads to a lot of unproductive philosophizing and arguing.

I'm not disagreeing with @nfranz , I'm saying that there are easier ways to hone definitions than just talking about them abstractly.

nielsklazenga commented 3 years ago

@nfranz, what is the rationale behind trying to avoid the term "Taxon"? As that is where I would start and that has got nothing to do with Darwin Core.

nfranz commented 3 years ago

Thanks, @jar398. I may, or may not, be able to rescue "full blown" by reading it narrowly to just mean: all that one (someone else) can justifiably (according to mainstream systematic criteria for acceptable practice at the time) infer from that source about the concept's extension. Agree though, ok to move forward from that.

Examples: https://doi.org/10.1080/14772000.2013.806371

nfranz commented 3 years ago

@nielsklazenga It is one term that has two I think very important flavors or functional domains in biology, a more realist one (referring [however imperfectly] to natural, causally sustained phenomena) and a more constructivist one (modeling human data evolution); and, as often defined and applied in DwC, it can support both or kind of either in context, but as just one term it is not well suited to keep the two flavors apart consistently and explicitly, when and where that is needed. (And this is my vote not to continue this subthread further here; I am merely answering a question I was asked.)

nielsklazenga commented 3 years ago

@jar398 , not sure what sort of examples you are after, but the use cases @jgerbracht and @camwebb presented at the IG meeting in September might be a good start. I have plenty of examples too, but they are mosses and I think it is better to illustrate with examples of better-known organisms.

At Biodiversity_Next, Olaf Banki had a nice example of the African Elephant (I think that came originally from David Remsen). @jliljeblad spoke at that symposium as well, so may have some insect examples.

And @nfranz just posted an example.

jar398 commented 3 years ago

I'm looking not just for pointers to documents but designation of particular entities either in or described in documents. The point is to be able to nail down identity criteria, use vs. mention questions, and other ontological fundamentals (so as to enable interoperability). Are we talking (in an example) about the text in an article, or the meaning of the text (that is different), or the extension of the meaning of some text? Or something else? If there are distinct taxon concepts with the same extension, what is an example of that? Having more examples is not necessarily better since different examples may indicate different answers to the general questions; that is why I think a small number of "best" or "canonical" examples (one might say: "type" examples) is better. They should be relevant to some existing project like eBird, not aspirational. OBO tries to include one or more examples with every class definition, and I think that is a good idea, but again, one has to be a bit careful here or else the text describing the example will be too ambiguous. (In particular, taxonomic names are ambiguous in multiple ways.) A use case may have to not just give data, but talk a bit about how it will be used, because without that there are likely to be ambiguities. - I agree that this shouldn't be difficult, I'm just not sure I'm the right person to be putting such things forward.

You say @nfranz posted an example but even in that careful article I can't quite tell whether the given 'taxon concepts' are meant to be extensions, or entities (perhaps bibliographic or conceptual) with associated extensions. Is it possible to have two distinct taxon concepts with the same extension? That's not answered. They have 'membership compositions' - that is useful information - but do they have anything else? Talking about the inferences that happen inside real applications is going to hone these questions more than looking at prose like Nico's article. So when I say "example" I really mean, ideally, examples of digital data (like a row of a csv file) in its natural habitat (such as Euler/X or iNaturalist). Stuff of the sort that this ontology is targeting. Prose is not so helpful as an example source (unless the entity in question is textual), and if there is no inference (such as deduplication) it is hard to know what is intended. Again, this is not a matter of speculation. We should be able to look at what our applications do and figure it out from there.

jar398 commented 3 years ago

In TCS 1 taxon concepts are definitely textual in nature - at least, I don't see how to read it otherwise, although it doesn't come out and say so. Evidence: a TaxonConcept can have only one taxon name. (by comparison, an extension might have many taxon names, if it has many descriptions/publications.) We could say that TaxonConcept is carried over as compatibly as possible from TCS 1 to TCS 2, but I think examples would still be in order, to clarify questions like this.

nfranz commented 3 years ago

Thanks, @jar398. I get that. I am not sure we have that on tap and ready to be deployed. A while ago I started more of a HowTo guide; but that is semi abandoned. Two sensible outs I see. (1) Decide that that is more implementation than TNC document specification and take a pass, either pragmatically or perhaps even more profoundly (see below). (2) Yes, do that work for an existing, sufficiently structured, relevant source. eBird would be great. Avibase might be easier. Both options (two phrases back) may have downsides.

Here is how I have preferred to implement this, to address your questions. I have looked to maximize intensional congruence across concepts in separate treatments, wherever I could imagine sitting in front of an audience of skeptics and holding my ground well enough. My base challenge to myself has been: at what point can I no longer claim with a straight face that intensional congruence can somehow be rescued for any subset of the concepts being aligned? Inversely: I have looked for rather unassailable evidence, textual or contextual or otherwise, of non-congruence; and in the absence of that, given it the benefit of the doubt. So I have asserted RCC-5 articulations for meanings, more so than texts. Extension has been mostly a synonym of meaning. Because there is now a kind of pay-off in saying that two separately published concepts are congruent ("same extension"), which is the supposedly helpful integration product being offered, yes, of course two concepts can be distinct, minimally in the sense of two non-identical name sec. source labels, while having congruent extensions.
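To make that last point concrete, here is a minimal sketch, not taken from the Perelleschus data. It assumes an extension can be approximated by a plain, non-empty set of member identifiers; the names, sources, member identifiers and relation labels are all invented for illustration. Two concept records differ in their "name sec. source" labels while their extensions are congruent under RCC-5:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str            # taxon name used as the label (hypothetical)
    according_to: str    # the "sec." source (hypothetical)
    members: frozenset   # assumption: the extension as a set of member ids

    @property
    def label(self) -> str:
        return f"{self.name} sec. {self.according_to}"


def rcc5(a: frozenset, b: frozenset) -> str:
    """Return the RCC-5 relation of set a to set b (both assumed non-empty)."""
    if a == b:
        return "congruent"
    if a < b:                 # a is a proper subset of b
        return "is included in"
    if a > b:                 # a is a proper superset of b
        return "includes"
    if a & b:                 # some, but not all, members are shared
        return "overlaps"
    return "disjoint"


# Invented example records: different labels, identical membership.
c1 = Concept("Aus bus", "Smith 1950", frozenset({"m1", "m2", "m3"}))
c2 = Concept("Aus cus", "Jones 1990", frozenset({"m1", "m2", "m3"}))

print(c1.label, "vs.", c2.label)
print("distinct records:", c1 != c2)
print("articulation:", rcc5(c1.members, c2.members))
```

Whether the two records count as one thing or two then depends on whether identity is taken from the label or from the member set, which is the question under discussion here.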

In the Perelleschus paper referenced above, 54 concepts are recognized. There are numerous instances of congruent extensions among these, hence there are far fewer instances of reciprocally non-congruent sets or clusters of concepts (if that makes sense). That is further explored here: https://doi.org/10.1371/journal.pone.0118247 (which has input data files for reasoning). Look for "Alignment 1 — Voss (1954) and Günther (1936)" to dig deeper.

Hard though for anyone, I suppose, not just me, to divorce this effort then from some related political aspirations. I'd rather be vague and allow different more specific implementations of a purposefully under-specified TNC document fight for however local and temporary functional adoption, than overly constrain future application through examples that might limit someone's freedoms to actually use RCC-5 productively. The experienced truth, I think, and way out of this maybe false choice, is to ask ourselves: ok, research communities in biodiversity are often quite shy with this RCC-5 business. How, minimally and through well chosen examples, can the TNC document serve to reduce that reluctance?

nielsklazenga commented 3 years ago

Yes, I think it is definitely the intention to carry over the TCS 1 TaxonConcept as compatibly as possible and I agree that we need a lot more than TCS 1 has, including examples of taxon concepts (and also examples of what are not taxon concepts).

The definition in the TCS user guide, by the way, is:

A Taxon Concept is a name plus a description of a taxon.

There is some good stuff in sect. 14.1.

jar398 commented 3 years ago

@nfranz, I think there is enough current practice that (a) we don't have to be political and (b) there is little need for underspecification. I think that if you are dealing with an existing specification or platform that is underspecified, then that underspecification needs to be preserved. DwC seems to be like this. But that is not the case here. TCS 1 is pretty well specified (if I remember correctly - would need to review it), and the TCS 2 features that are not in TCS 1 are new so we can be totally prescriptive (subject to a desire for utility). - to repeat what I said before, underspecification is a recipe for non-interoperability, chaos, and errors. The political gains of underspecification are short term and rely on debts that always have to be paid off later. There are limits to how sharply anything can or should be specified but that's not what I'm talking about here.

Looks to me like 'Taxon concept' sensu TCS 1 should be preserved, possibly under a different name, and possibly a new, separate class 'extension' added - and conceivably a third one, for 'intension', as you might be suggesting - each with different identity criteria. I'm not sure where our discussion most recently ended on that. I think I had suggested using 'extension' informally in the documentation but not turning it into an ontology term, but I'm not going to voice much of a position here. But, again, the deciding factor here should be which things we need for data exchange, and that depends mostly on what kind of reasoning our various platforms and applications are doing (also on what we consider erroneous inputs, misuse, etc. of the platforms), and that's an empirical question. If having two entities (rows, etc.) with the same extension is the wrong way to use a given application, then those entities are probably intended to represent extensions, not taxon concepts. If it's right then they're taxon concepts or taxon intensions (which can be distinguished by the same method), etc. We can tell the difference by looking at how the application works and what input constraints have to be observed to get good results from it.

You'll probably have a chance to be politically aspirational in the TCS 2 documentation. Or maybe your aspirations are captured well by Euler/X and the ways it's used, and targeting Euler/X as a use case could ensure that they're represented.

I understand you've voiced a nuanced view above and maybe I'm not treating it with sufficient care, let me know if any of this helps

nfranz commented 3 years ago

Thanks, @jar398! I don't have much to add. Yes, for the paradigm case higher-level taxonomic concept alignment, say "the oak genus of the Chinese Flora" vs. "the oak genus of the Mexican Flora", my taxonomic instincts have pointed to: intensionally congruent; extensionally overlapping (some children being widespread). But not all data worthy of alignment offer that duality, that clearly. Sometimes, a single "congruent" is the most sensible, and will do good services that way. That said, I feel your comment is pointing in the right direction.

Archilegt commented 2 years ago

I am wondering why "Taxon Concept" (the label) and not "Taxonomic Concept". The latter makes more sense to me, but I am not a native English speaker. In Spanish it is also better "Concepto Taxonómico" than "Concepto de Taxon".

Archilegt commented 2 years ago

About the "Definition" of Taxon[omic] Concept: "The underlying meaning, or referential extension, of a scientific name as stated by a particular author in a particular publication." That part is overly complex. Simpler/simplest: "A description or a definition of a taxon denoted by a scientific name, as stated by a particular author in a particular publication." Additionally: Within the "knowledge of the object" sensu Linnaeus, the narrower taxonomic concepts are formed by characters and character states, but exclude attributes and relacters (sensu Dubois, 2017).

nielsklazenga commented 2 years ago

> I am wondering why "Taxon Concept" (the label) and not "Taxonomic Concept". The latter makes more sense to me, but I am not a native English speaker. In Spanish it is also better "Concepto Taxonómico" than "Concepto de Taxon".

The terms are used interchangeably, but it was TaxonConcept in TCS 1 (so that is the short answer). My two-cents' worth is that TaxonConcept is correct and that 'taxon concept' and 'taxonomic concept' are two different things. It is not just a matter of using a noun or an adjective. The noun for which 'taxonomic' is the adjective is 'taxonomy', not 'taxon'. 'Taxonomy' is much broader than 'taxon'. 'Species', 'genus' and 'scientific name' are all taxonomic concepts, and those are just a few examples from biology; unlike 'taxon', 'taxonomy' is also widely used outside biology.

nielsklazenga commented 2 years ago

> About the "Definition" of Taxon[omic] Concept: "The underlying meaning, or referential extension, of a scientific name as stated by a particular author in a particular publication." That part is overly complex.

We could lose the ', or referential extension' bit, if that helps.

> Simpler/simplest: "A description or a definition of a taxon denoted by a scientific name, as stated by a particular author in a particular publication."

That is sort of what it was in TCS 1. The problem is that this definition excludes a lot of things that we consider taxon concepts, such as checklist entries etc. A Taxon Concept needs neither a description nor a scientific name (so for me the problem with the (current) definition is the word 'scientific'). They need a label and sufficient context to be able to compare them with other Taxon Concepts (a lot of things that we have to deal with as Taxon Concepts do not even have that).

I would have gone from the Taxon rather than the Taxon Name, so something like:

An opinion about the delimitation of a Taxon (sensu Darwin Core) or taxonomic group as stated by a particular author in a particular publication.

...but most people do not like that, and it still does not cover all the things that we want to treat as Taxon Concepts.

jgerbracht commented 2 years ago

How about 'The delimitation of a taxon as stated by a particular author in a particular publication' ?


nielsklazenga commented 2 years ago

> How about 'The delimitation of a taxon as stated by a particular author in a particular publication' ?

I would be very happy with that. Maybe make it '...stated or implied by...'? I am sure we can pick holes in this definition, as we can in all others, but I think perfection is unattainable here.

camwebb commented 2 years ago

My contribution during the meeting, take it or leave it 🙂: "The delimitation (or boundaries) of a taxon, usually by humans, often through the work of taxonomic circumscription, usually communicated in a publication."

I'd revise it to: "The delimitation of a taxon, often established through the work of taxonomic circumscription, usually communicated in a publication." or even... "The delimitation of a taxon, usually communicated in a publication."

nielsklazenga commented 2 years ago

Sorry to keep bringing this up, after we spent almost all of last meeting on it, but I think this is the whole ball of wax and we will not get anywhere unless we settle it properly; otherwise we, or other people, will keep revisiting it. I do not really think that anybody in the meeting thought this would be the end of the discussion, but, after having thought about it for a while (almost three weeks now), I think we cannot just put it to bed and move on.

Needless to say, after this, that I am not happy with the definition as it stands now (https://github.com/tdwg/tcs2/issues/1#issuecomment-1027338309). It entirely lacks the "concept" and is less a definition of a term than a description of the data we want to put in there. I think that, if we define terms based on what the data looks like, we will always be going in circles.

I have been thinking about language a lot in the last two weeks in order to understand my own thinking and to try to understand why I have so much trouble explaining things that I think I have so clearly in my mind to other people (or why other people do not get it really). I think that people's brains are wired (slightly) differently because of the language they grew up with. Also, if English is not your first language, if you want to really understand something, you always fall back on your first language. I have been living in Australia for 22 years and, since I graduated from university, have never written anything significant in any language other than English, and I still do that.

So, I hope the following is helpful.

We have the word 'concept' in Dutch, but we are much more likely to use one of its synonyms (https://www.interglot.com/dictionary/en/nl/translate/concept), 'begrip', which literally translates in English to 'understanding', or 'opvatting', which translates (also literally) to 'opinion'. So, in my mind, all the definition of 'Taxon Concept' needs to (and should) be is:

Understanding of a taxon

This is what I have always understood taxon concepts to be and what my colleagues, who know nothing about biodiversity informatics and have never heard of TCS, understand them to be. Taxon concepts were not invented by TCS or Franz and Peet (2009), they have always been there; maybe not exactly as the combination of words 'taxon concept' (that would be one word in Dutch if we had one), but we certainly always have been talking about someone's concept of a taxon.

Note that the definition above is the same in meaning, if not verbatim, as the definition from Franz & Peet (2009) that we started with. Franz and Peet's definition was for a different audience and they tried to avoid the term 'taxon', because of difficulties with the term for that audience. For our audience, the term is unproblematic, as there is a perfectly adequate definition of Taxon in Darwin Core. On the other hand, I think it is important to avoid 'scientific name' to make clear to an audience of largely non-systematists that names are not the things we are interested in, but are the labels of the things we are interested in.

I also removed the 'as stated by a particular author in a particular publication', as I do not think being published (or having an 'according to') makes something a taxon concept. A notion (which funnily enough also translates to 'begrip' in Dutch) about a taxon in someone's head is just as much a taxonomic concept as a published opinion. Of course we cannot do anything in TCS with taxon concepts that are not communicated in some way, shape or form, and in TCS taxon concepts need to have an 'according to' (and a label), but that has got nothing to do with definition.

I think we have focused way too much on names and publications and possibly have lost track a bit of what we really want to describe, convey, or exchange. That is what happens when you look too closely at the data – or get at it from the data. You stop seeing the forest through the trees (or the domain through the data).

I think less is more here and that removing every reference to names and publications actually makes the definition clearer and makes it easier for people to understand what things are taxon concepts and what things are not. I think it is clear, for example, that there is no difference between the taxon concepts in individual publications ("taxon name usages") and the so-called "deep" taxon concepts in e.g. AviBase (this is absolutely not to take away from AviBase, which I think is great). It is also clear that synonyms, no matter how broadly you take the term, are not taxon concepts. At the data level, I myself, like most of us, have always treated synonyms as taxon concepts (or the same as taxon concepts), and not just because this is the only way you can deal with synonyms in TCS 1 and Darwin Core, but I have never thought they are taxon concepts (they are names) and I do not think this should be accommodated by the standard (and certainly not by the definition). That would just stop people from looking for better ways...and there is a better way.

@deepreef is not the only one who can write long comments.

nfranz commented 2 years ago

A way to provide a functional definition may be this? An identifiable taxonomic position that can be aligned to other such positions via [TCS-compatible] relationships.

I like this because it shifts the work of productive definitional precision (and productive ambiguity) to those agents that are providing the relationships. And it's the production of relationships that we really should try to incentivize (I assume that is a shared view).

If and when these agents (humans, human-specified algorithms) feel justified in producing alignments, well, I suppose then we others are justified in harvesting the information integration benefits, thereby granting in turn that what has been aligned somehow met the functional thresholds of being taxonomic concepts.

deepreef commented 2 years ago

I think @nfranz is on the right track here. I'm not sure the word "position" is right (maybe "assertion"? But that's not much better, and may be worse), so there needs to be some wordsmithing, but my gut tells me this is the right direction to go.

Now, for some elaboration:

This general issue has been extensively discussed/debated for decades, and remains unresolved. Ironically, it parallels the "species concept" debate (i.e., no end in sight), even though "concept" is used in a different sense.

I would STRONGLY prefer to avoid the word "concept" -- in part because of the "species concept" confusion, but mostly because of the excessive amount of "baggage" that word carries. By "baggage" I mean that almost everyone in our space (Biodiversity/Taxonomy/Informatics/etc.) has a clear (in their own mind) understanding of the meaning of that word, yet there are dozens (hundreds?) of subtly (and not-so-subtly) different interpretations of its meanings. The problem is that when people see that word, they immediately interpret it in their own sense by default, even if provided with a specific definition. Keeping the word "concept" as part of the term will perpetuate that barrier to effective communication indefinitely.

While certainly not perfect, I think the word "circumscription" suffers far less baggage and associated heterogeneity in meaning within our assorted relevant communities. It immediately invokes the notion of a set of things, and filters out any meaning associated with classification/hierarchy (which some definitions of "concept" include).

Aside from the term, we also wrestle with the "thing" that forms the basis of one of these instances (concept/circumscription). I think we all agree that the "thing" is not a scientific name. The "thing" involves actual physical biological organisms, and the scientific name is just a crude and inconsistently applied text-string label that has historically been used to (roughly) represent the "thing". So I hope we can all agree that the name is not the "thing".

But we still have several candidates for the "thing". I think the two most commonly discussed options are:
1) The thing is the circumscription of organisms implied or defined within a TNU.
2) The thing is a well-defined abstract object that represents a stacked set of circumscriptions of organisms implied or defined within multiple TNUs that are deemed to represent congruent circumscriptions.

Option 1 implies that identifiers are minted for TNUs, and we have secondary data structures that track sets of TNU-circumscriptions asserted to be congruent or asserted to have other RCC-5 relationships with other TNUs.

The advantages of this approach are:
1) The definition of concept/circumscription and the definition of TNU are the same, and we don't need to define another class or mint identifiers.
2) We can more or less define a TNU objectively, and the ratio of substance to fuzziness is pretty good.
3) TNUs are the foundation of all nomenclatural and taxonomic assertions, as well as the anchor-point for all biological information, so they play a central role in all of biodiversity informatics (i.e., we're going to need to robustly deal with them anyway, so might as well make them the core object of "taxon concepts" as well).

The disadvantages of this approach are:
1) We don't (yet) have records for the vast majority of TNUs in existence. Certainly not all are necessary, but even the key ones (Protonyms, major revisions, etc.) still do not exist in structured form with persistent identifiers.
2) We don't (yet) have a robust set of RCC-5 relationships among the TNUs that we do have, and we don't (really) have a standard way of minting them such that they can be easily shared.
3) Even if we could solve these two things, the network of RCC-5 relationships necessary to do any sort of reasoning or derive any useful utility about taxon concept mapping is almost intractably large, and would probably necessitate huge amounts of computing power to run even simple queries.
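A rough sketch of what Option 1 could look like as data, under assumptions of my own: the `tnu:` identifiers, the relation labels and the tuple-keyed dictionary are hypothetical, not anything TCS prescribes. The secondary structure holds one asserted RCC-5 relationship per pair of TNUs, and a fully pairwise alignment of n TNUs needs n(n-1)/2 of them, which is the worst case behind the scale concern in point 3:

```python
from itertools import combinations


def pair_count(n_tnus: int) -> int:
    """Number of unordered TNU pairs, i.e. n * (n - 1) / 2."""
    return n_tnus * (n_tnus - 1) // 2


def missing_pairs(tnu_ids, articulations):
    """Pairs of TNUs with no asserted relationship in either direction."""
    return [
        (a, b)
        for a, b in combinations(sorted(tnu_ids), 2)
        if (a, b) not in articulations and (b, a) not in articulations
    ]


# Hypothetical TNU identifiers.
tnu_ids = ["tnu:1", "tnu:2", "tnu:3", "tnu:4"]

# Secondary structure: (subject, object) -> asserted relation of subject to object.
articulations = {
    ("tnu:1", "tnu:2"): "congruent",
    ("tnu:3", "tnu:2"): "includes",
}

print(pair_count(len(tnu_ids)))                 # 6
print(missing_pairs(tnu_ids, articulations))    # the pairs still to be aligned
print(pair_count(1000))                         # 499500
```

In practice not every pair has to be asserted by hand, since some relationships can be inferred from others, so the quadratic count is an upper bound rather than a prediction.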

Option 2 implies that we have some mechanism for recognizing/defining a particular abstract circumscription of organisms, and we assign a single identifier to each unique circumscription. One or more TNUs would be linked to each identified/defined circumscription, but would remain as separate "things" (perhaps they could be framed as "instances" of a particular identified/defined abstract circumscription).

The advantages of this approach are:
1) RCC-5 relationships are applied directly to these abstract defined/identified circumscriptions, so we avoid the problem of an intractable number of RCC-5 relationships among individual TNUs.
2) This approach is probably more intuitive for most biologists and most informaticians.
3) There would be no need to define "congruent" relationships among these things, because by definition two circumscriptions are the same if they are congruent, so there is only one identifier needed.

The disadvantages of this approach are: 1) We would need a pretty solid definition for these things, such that using that definition it would be unambiguous whether two defined circumscriptions are the same, or different. No one has (yet) proposed such a definition that is practical. 2) Just because there would logically be no need to define "congruent" relationships between two of these things, doesn't mean they won't get minted by accident. Thus, there needs to be a mechanism for establishing two different identifiers as being duplicates, whenever two minted instances of this thing are deemed to represent congruent circumscriptions. 3) There would almost certainly need to be a central authority to mint/define these things.
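To put rough numbers on the scaling argument made for both options: if every pair of objects could need its own RCC-5 articulation, the upper bound grows quadratically with the number of objects. The sketch below only does that arithmetic. The figures (10,000 TNUs collapsing into 50 circumscriptions) are made-up assumptions for illustration, not counts from any real dataset.

```python
from math import comb


def pairwise_relations(n: int) -> int:
    """Upper bound on articulations if every unordered pair of n objects gets one."""
    return comb(n, 2)


# Hypothetical figures for one well-studied group (illustrative only).
tnu_count = 10_000           # Option 1: relationships asserted among TNUs
circumscription_count = 50   # Option 2: relationships asserted among circumscriptions

print(pairwise_relations(tnu_count))              # 49995000
print(pairwise_relations(circumscription_count))  # 1225
```

In practice not every pair would need an explicit assertion, so these are ceilings rather than estimates.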

For most of the past couple of decades, I've been a firm supporter of Option 1, on the grounds that it's relatively easy to define a TNU in a way that most people would implement them in the same way, but it's almost impossible to define a "taxon circumscription" independently of any particular TNU in a way that would be used consistently and semi-objectively.

However, over the past year or so, Dave Remsen, Nicolas Bailly and I have been meeting every Thursday to brainstorm this stuff, and we think we're on to an approach for Option 2 that could work pretty well. I originally suggested it at a workshop hosted by Bob Peet to establish FGDC metadata standards for taxonomy back in the late 1990s (I don't remember exactly when, but Stan Blum, Walter Berendsohn and others in this space were there). Basically, I pointed out that taxon concepts/circumscriptions could be defined at different levels of granularity: taxonomic, geographic, population, and individual organism. The last (individual organism) is the most granular, but also the most useless (in that the vast, vast, vast majority of organisms on Earth are never seen or documented or recorded by humans, nor ever will be). Defining concept circumscriptions based on geography or specific populations is fraught with peril at many levels.

That leaves defining taxon concepts based on taxonomy -- which is the least granular, but definitely the most practical. Using the word "taxonomy" in this sense is misleading, because specifically what this approach does is define taxonomic circumscriptions by included vs. excluded name-bearing type specimens. What has changed in recent months through discussions with Dave and Nicolas is the realization that we can devise specific mechanisms for tracking these kinds of circumscriptions based on, for lack of a better term, "Protonym Count".

This post is already WAY too long, and the amount of text and diagrams necessary to adequately communicate our ideas about this would be enormous. But we're chipping away on documentation to explain and illustrate these ideas, and we'll certainly share them with this group as soon as they're ready. But the point is, I see enough promise in this approach that I've shifted my decades-long stance supporting "TNUs as proxies for taxonomic circumscriptions" (Option 1 above) to "sets of implied Protonyms as explicit definitions of taxonomic circumscriptions" (Option 2 above).

I strongly doubt that this post has added any clarity to the discussion, but at the very least I can reclaim my throne as provider of overly long comments...

dremsen commented 2 years ago

Many years ago Rich introduced me to the basic ideas that led to his post above and led me to formulate a workable model for creating a taxonomic name server that would meet some core requirements of mine. Coming from a library that serves biologists, this meant accommodating different taxonomic views and enabling their use by the user community. It did not mean having an opinion on which view was preferred. The service and the underlying model worked and laid the foundation for further work. Rich, Nicolas and I have combined ideas to try to articulate this into a better system that is more defensible and properly seated in the standards and Codes.

I have struggled to articulate a path for implementing some form of this within the Catalogue of Life but I know it must be done. Catalogue of Life must be able to mint identifiers for taxa but proper taxon identifiers only work if they change if and when the taxon itself changes. If a taxon changes and an identifier doesn't change it's not a good taxon identifier. If the identifier changes and the taxon doesn't change then it's not a good taxon identifier either. So when does the taxon change? Our approach is not the only one but I find it operationally useful for improving precision and recall in biodiversity information - my primary need. It's a compromise in the continuum that starts with a name, moves through an annotated taxon with some undifferentiated synonymy to one that differentiates its facts from its opinion to highly detailed models that get way too confusing for me. So our approach sits somewhere in the middle and asserts that

  1. Taxa, that are presumed to have been properly created under the relevant Code, are immutable. That is, they can be refuted biologically but they cannot be un-made.
  2. In this regard they are a bit like the old atomic theory axiom that atoms cannot be destroyed.
  3. The nucleus of my atom is the original taxon. I visualize it as an Ogden and Nash semiotic triangle composed of an imperfectly transposed concept as evoked in the mind of a biologist by an actual specimen (the type whether extant or long gone) and the name it was given.
  4. When a biologist subsequently captures another specimen and identifies it as that taxon, the biologist is asserting that the second specimen is conspecific to the type of the taxon that was originally conceived by the biologist. This new specimen does not change that taxon. It is an instance of the taxon sensu the second biologist.
  5. When a biologist asserts that a type specimen of one taxon is conspecific to the type specimen of another taxon the result is a compound or heterotypic taxon sensu the biologist. The taxon now includes another, previously distinct (properly published) taxon. The taxon has changed. When you refer to the taxon by name, I respond, Do you mean the one that includes Taxon B or the one that does not? I provide you with identifiers for both thereby providing more precise identifiers than the name.
  6. Taxon concepts, in this model, are composed and distinguished by distinct sets of previously distinct taxa or protonyms, using Rich's shorthand.
  7. The "protonym set" is the distinct set of these taxa. Only distinct sets of taxa merit distinct taxon identifiers. All other TNUs are asserted by authors to be congruent to them.
  8. The "protonym count" noted by Rich is a simpler shorthand for computing differences. The set, of course, is the most precise.
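One way to read points 6 to 8 is that a taxon identifier should be a pure function of the protonym set: the identifier changes exactly when the set changes. The sketch below assumes that reading. The hashing scheme, the function names and the `protonym:` identifiers are all hypothetical and are not part of the model described above.

```python
import hashlib


def concept_key(protonym_ids) -> str:
    """Derive a deterministic key from a set of protonym identifiers.

    Assumes the identifiers never contain the '|' separator.
    """
    canonical = "|".join(sorted(set(protonym_ids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def protonym_count(protonym_ids) -> int:
    """The coarser shorthand: how many protonyms the set contains."""
    return len(set(protonym_ids))


# Hypothetical protonym identifiers.
without_b = {"protonym:A"}
with_b = {"protonym:A", "protonym:B"}
with_c = {"protonym:A", "protonym:C"}

# Adding Taxon B changes the set, so the key changes.
print(concept_key(without_b) != concept_key(with_b))   # True
# Order of listing does not matter, only membership.
print(concept_key(["protonym:B", "protonym:A"]) == concept_key(with_b))  # True
# Same count, different sets: the count alone cannot tell these apart.
print(protonym_count(with_b) == protonym_count(with_c))  # True
print(concept_key(with_b) != concept_key(with_c))        # True
```

The last two lines illustrate why the set is more precise than the count.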

This cannot be novel since this is exactly how most published taxonomic treatments can be restructured once facts and opinions/syntax & semantics/nomenclature & taxonomy have been separated. The upside to it is that I get a tractable number of distinct and computable senses of a taxon that carry their circumscription details with them. It can generate articulations from these that go right into Nico's Euler. It can enable CoL to mint stable taxon identifiers or a broader infrastructure to be truly inclusive of a wider range of taxon views.
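If each sense of a taxon carries its protonym set with it, an RCC-5 articulation between two senses can be read off by plain set comparison. The sketch below assumes that protonym-set membership is an acceptable proxy for circumscription. The function name and the relation labels are illustrative plain English, not the input syntax of any particular reasoner.

```python
def rcc5(a: frozenset, b: frozenset) -> str:
    """Return the RCC-5 relation of protonym set a to protonym set b."""
    if not a or not b:
        raise ValueError("a taxon sense needs at least one protonym")
    if a == b:
        return "congruent"
    if a < b:          # proper subset
        return "is included in"
    if a > b:          # proper superset
        return "includes"
    if a & b:          # shares some, but not all, protonyms
        return "overlaps"
    return "disjoint"


# Hypothetical protonym sets for four senses of a name.
narrow = frozenset({"protonym:A"})
broad = frozenset({"protonym:A", "protonym:B"})
other = frozenset({"protonym:B", "protonym:C"})
unrelated = frozenset({"protonym:D"})

print(rcc5(narrow, broad))     # is included in
print(rcc5(broad, narrow))     # includes
print(rcc5(broad, other))      # overlaps
print(rcc5(narrow, unrelated)) # disjoint
print(rcc5(broad, frozenset({"protonym:B", "protonym:A"})))  # congruent
```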

nielsklazenga commented 2 years ago

Thanks @nfranz:

A way to provide a functional definition may be this? An identifiable taxonomic position that can be aligned to other such positions via [TCS-compatible] relationships.

That is what taxon concepts are to me too and I think it is really important that we put it in there somewhere, but I do not think it should be the definition. I also think 'underlying meaning of a scientific name' is still a good definition and I think we should keep it, but in the explanation, not the definition.

@deepreef , I have already said why I think it is a bad idea to define terms based on what the data looks like, so I will leave it at that. That is not to say this cannot lead to perfectly serviceable epistemological definitions, but you will not get the more ontological definition that we are (or I am) after and you will not get everybody to agree on them. Also, these definitions will not be much help when something novel or something that looks slightly different, like the AviBase concepts, comes along, as these definitions will always be neither necessary nor sufficient (I probably did not get all that terminology right).

Whether or not you or I like the word 'concept', or whether or not we think it is used correctly, is entirely irrelevant. What is important is that it is what people use for the thing that we are defining. We are not making up our own stuff and we do not do our own philosophising. The "baggage" is exactly why we use it. 'Circumscription' might have less "baggage" than 'concept' (I'd contest that), but it also happens to mean something completely different. What you call a 'circumscription' – at least in "TNUs as proxies for taxonomic [sic] circumscriptions" – I would call a 'taxon', as does Darwin Core, at least going by the definition of Taxon, if not by the things we can (and do) put in there. A 'circumscription', according to me – and TCS – is a set of characters or specimens that can be used to define (or describe) said taxon. Also, in that phrase, "TNUs as proxies for taxonomic circumscriptions", the TNUs, not the circumscriptions, are the taxon concepts.

deepreef commented 2 years ago

Whether or not you or I like the word 'concept', or whether or not we think it is used correctly, is entirely irrelevant. What is important is that it is what people use for the thing that we are defining.

I disagree. Unqualified words like "name" and "concept" serve as fundamental impediments to communication, in my experience. You spend more time explaining to people exactly what you mean by them, than you do making actual progress. We should leave "Concept" in this sense to the BSC, PSC, etc., and focus on "Circumscription" when talking about the set of organisms implied when using a scientific name or other mechanism for referring to a unit of nature that involves many individual organisms.

I also (strongly) disagree with this:

but it [circumscription] also happens to mean something completely different

Here is the definition of dwc:Taxon: "A group of organisms (sensu http://purl.obolibrary.org/obo/OBI_0100026) considered by taxonomists to form a homogeneous unit." [emphasis added]

A "group of organisms" is exactly what a circumscription is, in the sense of taxonomists considering them to form a homogenous unit.

Again, I think that "TaxonCircumscription" is a much better (=less potential for confusion) term to use than "TaxonConcept".

TNUs, not the circumscriptions, are the taxon concepts.

No, TNUs are not taxon concepts. TNUs are usages of taxonomic names within a static documentation resource (e.g., publication). Many/most TNUs imply a circumscribed set of organisms associated with the name (i.e., what you are defining as a taxon concept), but are not themselves taxon concepts (sensu your usage of that term). TNUs are units of documented information, whereas taxon concepts are defined or implied circumscribed sets of biological organisms.

nielsklazenga commented 2 years ago

@deepreef, I am not saying that words are not important. What I am saying is that it is not pertinent to what we are doing here. We are providing terms that already exist, not dreaming up new ones. Taxon concept (or concept of a taxon) is a term that is used in the domain. If that is not entirely correct, that is frankly no concern of ours. It is not up to us to decide that it is not and do something else. You can argue that in a publication, but not in a standard that you want people to use. As a systematist myself, by the way, I stand by 'taxon concept'.

I think it is parsing words and sloppy grammar that are the fundamental impediments to communication. The 'concept' in 'taxon concept' is not any less qualified than 'circumscription' in 'taxon circumscription'. Also, 'taxon concepts' and 'taxonomic concepts' are different things, just like 'sea lions' and 'marine lions' are different things. A taxon concept is a concept of a taxon, not (necessarily) a taxonomic concept (and there are other taxonomic concepts than taxon concepts). As long as we have got the context right, the meaning of the words does not matter so much. We define terms in their entirety, not word by word. Nobody cares that sea lions are not quite lions.

Regarding the word 'circumscription', you ostensibly disagree with me and then repeat what I said as a refutation of what I said. I said your use of 'circumscription' is the same as 'taxon' in Darwin Core. You confirm that. As defined, or at least implemented, in TCS, 'circumscription' is a set of characters or specimens (it is also an attribute – or nested element rather – of TaxonConcept). I myself think 'circumscription', like 'delimitation' (which means the same, I think), is the act of setting limits to a taxon (i.e. what is in and what is out), not the limits of the taxon, or the taxon, itself. However, directly translated from the Latin, I think it is more like description (in the TDWG ontology, SpecimenCircumscription was translated to circumscribedBy and CharacterCircumscription to describedBy). Most importantly, if Taxon Circumscription is the same thing as Taxon, it cannot also be the same thing as (or a better name for) Taxon Concept.

I never said TNUs are Taxon Concepts, just that the ones in that phrase were (if there were Taxon Concepts in that phrase at all). Taxonomic Name Usages are relevant in a different domain (see Senderov et al. 2019). Only the ones that coincide with (or are, if you like) Taxon Concepts are of interest here.

deepreef commented 2 years ago

We are providing terms that already exist, not dreaming up new ones

That is/was not my interpretation of this task -- but maybe that's where the fundamental problem lies in our disconnect here.

I think most of the rest of our points of dispute appear to be miscommunication.

Let me ask this: Do we agree that tcs:TaxonConcept and dwc:Taxon do (or should) have the same definition? If so, then perhaps we should simply deprecate tcs:TaxonConcept and think about ways to improve the existing definition of dwc:Taxon.

In any case, it appears I misunderstood what you were asking, so it would help me better understand if you could summarize in a few sentences what you are asking this group to comment on.

nielsklazenga commented 2 years ago

That is/was not my interpretation of this task -- but maybe that's where the fundamental problem lies in our disconnect here.

Read the charter of the Task Group. Also, this issue is specifically about the Taxon Concept class. I meant it more generally though in the sense that if you want to change a domain, it is better to write a paper about it than try to achieve that through the standards process. I do think we can have these discussions in the TCS Maintenance Group, but not right now and not right here. It is unlikely we are ever going to agree though, as I am a total TCS fanboy.

Do we agree that tcs:TaxonConcept and dwc:Taxon do (or should) have the same definition?

No, we do not. dwc:Taxon (by the definition) is the 'taxon' bit in the definition 'understanding of a taxon'. dwc:Taxon also has a dwc:taxonConceptID property (foreign key) that links to a tcs:TaxonConcept (not to mention that it has both taxonID and taxonConceptID), so Darwin Core thinks they are different. Of course, this is not how dwc:Taxon is used and that is completely fine (by me). We will always need the Taxon class for (results of) determinations. In the Darwin Core RDF Guide, all properties organised in the Taxon class that are useable in RDF are treated as convenience terms (I think that is the term that is used) of the dwc:Identification. So you could say that dwc:Taxon class is a convenience class. I just like it that there is a place where 'taxon' is defined.

Darwin Core also has a dwciri:toTaxon property that links a dwc:Identification to a taxon or taxon concept. So, to me, the phrase "TNUs as proxies for taxonomic circumscriptions" just says "Taxon concepts as proxies for taxa". There is absolutely nothing novel about that, just different words.

jliljeblad commented 2 years ago

"TNUs as proxies for taxonomic circumscriptions" just says "Taxon concepts as proxies for taxa".

And some (me?) would say "TNUs as proxies for taxon concepts" or maybe "TNUs as proxies for taxa".

As has been identified before, a fundamental problem lies in using "taxon concept" in these different ways - not with differences in ways to define or circumscribe taxon concepts (though this can also be a problem).

Otherwise I think I agree with Niels' post where he talks Dutch. The words are very similar to Swedish, so maybe that is why I can follow nicely. Begrip = begrepp and opvatting = uppfattning. We don't need taxon names or an accordingTo in order to define what we mean by the term "taxon concept". But we do need to be clear, as shown by the quote. What would the TCS be if you couldn't go here to check up on the definition of "taxon concept"? With a few good examples thrown in, how it differs from dwc:Taxon and mention of the confusion.

deepreef commented 2 years ago

Thanks, @nielsklazenga and @jliljeblad -- this helps me understand a bit better.

Going back to the long post from @nielsklazenga:

A notion ... about a taxon in someone's head is just as much a taxonomic concept as a published opinion.

OK, something we can agree on! To me, a Taxon Concept (any Concept, really -- but Taxon Concepts here) does not exist outside of human minds. The best illustration of this, I think, is Figure 1 of Ytow et al. (2001) in describing Nomencurator.

The actual "Taxon"/"Concept"/"Implied Circumscription"/Whatever-you-want-to-call-it is the thought bubble floating above the bald bearded chap's head (left side of Fig 1). TNUs, by contrast, represent the way in which people (taxonomists) attempt to articulate the outlines or boundaries of the imagined taxon. In the figure, TNUs are represented by the box in the top center of the diagram. Neither the imagined taxon, nor the informatic content of the TNU are themselves composed of biological entities (organisms). The lower-center box (Specimens in a Collection) represents the actual biological organisms. The diagram explicitly refers to specimens, but I think is better expressed as all organisms (whether or not they are killed and put in a Museum).

In any case, as illustrated in the diagram, the process of taxonomy involves using TNUs and references to specimens (and other organisms) to document the boundaries of the imagined taxon in the left thought-bubble, so that it can be communicated to the imagined taxon in the right thought-bubble ("reader").

I think we have focused way too much on names and publications and possibly have lost track a bit of what we really want to describe, convey, or exchange. [...] removing every reference to names and publications actually makes the definition clearer and makes it easier for people to understand what things are taxon concepts and what things are not.

I actually agree to an extent. The word "Publications" is far too restrictive. I use the word "Reference", which I define as any form of static documented resource. We both seem to agree that the "Taxon Concept" is in the mind of humans (thought bubble in the figure), but until we have technology that can record human brain waves, then it's a waste of time to come up with an information standard that accommodates undocumented imagined taxon concepts... as you noted:

Of course we cannot do anything in TCS with taxon concepts that are not communicated in some way, shape or form, and in TCS taxon concepts need to have an 'according to' (and a label),

I agree with this (above), but am worried about this:

but that has got nothing to do with definition.

I disagree. If we're going to define a class for data exchange, we should define it in the way that the information is documented, not in the way that the information is imagined inside someone's head. This is why TNUs are so fundamental to what we're trying to achieve with TCS. I don't think the definition of tcs:TaxonConcept should represent the thought bubbles in Ytow's Fig. 1, they should represent something that exists in tangible, documented form. This is why I dislike the term "TaxonConcept" -- there is no point (in my view) for defining a class of object used for information exchange that represents an imaginary thing. The information we are exchanging is about how people have documented their imaginary taxa.

Having said that, I agree that the definition as written above is not good. I guess if I were to capture it in my thinking, I would go with something like: "A documented set of implicitly circumscribed organisms that are considered to be taxonomically homogeneous."

This comes dangerously close to the definition of dwc:Organism, and I'm not at all happy with that exact wording, but that essentially captures what I think a "TaxonConcept" should be in the sense of TCS.

nielsklazenga commented 2 years ago

Thanks @deepreef, @jliljeblad, @nfranz and @dremsen, this has been really good. I am running out of time today (and I need to think about something else for a bit), so there will be a longer response tomorrow, but I thought I would not stay silent all day, just to show I am not tired of the discussion, as I think it is leading somewhere, so I would like to see if we can take it to the end (or to a conclusion, or to the next level).

nielsklazenga commented 2 years ago

Okay, so I took my time.

I disagree. If we're going to define a class for data exchange, we should define it in the way that the information is documented, not in the way that the information is imagined inside someone's head.

I think we are probably not disagreeing so much as talking about different things. I probably put it a bit too strongly, but I was only talking about the definition of the term (or dare I say 'concept') 'taxon concept'. I think we should keep that separate from the "definition" of a class. The best way to make sure that the information is documented is to make the taxonName and accordingTo properties required. We'll discuss whether we can do that or not when we get to these properties (#2 and #4).

If we would define the things we are talking about by the way they are documented, we would be talking about Treatments, not Taxon Concepts. I think taxon concepts are not just in people's heads, but that there is a Taxon Concept in or behind every Treatment. I think it goes without saying that you cannot deal with Taxon Concepts in TCS unless they are in a Treatment (but I would like to see someone try). But yes, the definition could be somewhat less vague.

And some (me?) would say "TNUs as proxies for taxon concepts" or maybe "TNUs as proxies for taxa".

Okay, I probably interpreted that incorrectly. Let me try that again (this is going to be long):

First of all, I think we should forget about the Darwin Core Taxon class. I see taxa as the actual groups of organisms (as defined in Darwin Core), not the hypotheses about said groups. They are elusive and not useable as data objects. The TCS Taxon Concept can be seen as an operationalization of a taxon (as can Taxon Circumscription).

A Taxon Concept is one hypothesis and is identifiable (thanks @nfranz), because it is linked to a single source (through the accordingTo). Taxon Concepts also resolve the many-to-one relationship between "names" and "taxa". This is the problem that the same name can apply to different definitions and the same definition can go under different names.

A Taxon Circumscription, if you take it to be the boundary of a taxon rather than the act of defining this boundary or the set of characters or specimens used to do that, is, I think, something different. The way I see it, the same Taxon Circumscription can be shared by multiple congruent Taxon Concepts. Therefore, Taxon Circumscriptions are not identifiable, unless you've got things like AviBase IDs, but these are Taxon Concepts. [The authors of AviBase themselves talk about them as "deep concepts" (Lepage et al. 2014).]

flowchart LR
  class01[TaxonName]
  class02[TaxonConcept]
  class03["TaxonCircumscription ≊ Taxon"]

  class01 ---|"one—many"| class02 
  class02 ---|"many—one"| class03
  class01 ---|"many—many"| class03
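The cardinalities in the diagram can be checked against a toy dataset. This is a minimal sketch: the class layout, the placeholder names (`Aus bus`, `Xus bus`), the sources and the `circumscription` keys are all hypothetical. In particular, the `circumscription` field assumes some external way of keying a shared boundary, which is exactly what the paragraph above says is usually not available.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    taxon_name: str       # the label
    according_to: str     # the single source that makes the concept identifiable
    circumscription: str  # hypothetical key for the boundary shared by congruent concepts


concepts = [
    TaxonConcept("Aus bus", "Smith 1990", "circ-1"),
    TaxonConcept("Aus bus", "Jones 2005", "circ-2"),  # same name, different boundary
    TaxonConcept("Xus bus", "Lee 2010", "circ-2"),    # different name, congruent with Jones 2005
]

concepts_per_name = defaultdict(list)
concepts_per_circumscription = defaultdict(list)
names_per_circumscription = defaultdict(set)
circumscriptions_per_name = defaultdict(set)

for concept in concepts:
    concepts_per_name[concept.taxon_name].append(concept)
    concepts_per_circumscription[concept.circumscription].append(concept)
    names_per_circumscription[concept.circumscription].add(concept.taxon_name)
    circumscriptions_per_name[concept.taxon_name].add(concept.circumscription)

# TaxonName one-to-many TaxonConcept
print(len(concepts_per_name["Aus bus"]))              # 2
# TaxonConcept many-to-one TaxonCircumscription
print(len(concepts_per_circumscription["circ-2"]))    # 2
# TaxonName many-to-many TaxonCircumscription
print(sorted(names_per_circumscription["circ-2"]))    # ['Aus bus', 'Xus bus']
print(sorted(circumscriptions_per_name["Aus bus"]))   # ['circ-1', 'circ-2']
```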

I think Taxon Circumscription is probably the closest approximation to a "taxon", but I do not think this is what we are necessarily after – from a data management perspective; as biologists we are of course, but I think we should not confound the two. We just need to be able to tell if two sources are saying the same thing (i.e. their taxon concepts are congruent) and, if not, what the nature of the conflict is. I think that is also what AviBase does. The AviBase IDs are not suitable for use in Catalogue of Life, for example, as one cannot tell which ones can be used together (are mutually exclusive) and which ones cannot. Any of the individual global taxonomies (or versions of taxonomies) would better serve the purpose. However, through AviBase, you can relate a chosen taxonomy to all the other taxonomies.

I am liking the functional definition of Taxon Concept that @nfranz provided better and better, so, contrary to what I said earlier, I think we should adopt this definition as our new working definition.

@nfranz's definition (https://github.com/tdwg/tcs2/issues/1#issuecomment-1047198984) was:

An identifiable taxonomic position that can be aligned to other such positions via [TCS-compatible] relationships.

I think identifiability is key, as is the potential for alignment through topological relationships (alignability?). We can try to come up with an alternative word for 'position', but I think this is just one of those words you come up with when you have rejected everything else.

Now, Taxonomic Name Usages. I am pleased to hear @deepreef say that Taxonomic Name Usages are not Taxon Concepts. I am uncomfortable with terms that can mean anything people want them to mean, so I will turn to the only place where I have seen Taxonomic Name Usages defined, the OpenBiodiv Ontology (Senderov et al. 2018).

In the OpenBiodiv Ontology, a Taxonomic Concept is realised by a Treatment (realization). A Treatment contains a Nomenclature Section, which contains a Nomenclature Heading that contains one Taxonomic Name Usage (which is the accepted name) and can contain a Nomenclature Citation List that contains one or more Taxonomic Name Usages (synonyms). Each Taxonomic Name Usage mentions one Scientific Name.

graph TB
  class01[TaxonomicConcept]
  class02[Treatment]
  class03[NomenclatureSection]
  class04[NomenclatureHeading]
  class05[TaxonomicNameUsage]
  class06[NomenclatureCitationList]
  class07[TaxonomicNameUsage]
  class08[TaxonomicNameUsage]
  class09[ScientificName]
  class10[ScientificName]
  class11[ScientificName]

  class01 -- realization --> class02
  class02 -- realizationOf --> class01

  class02 -- contains --> class03

  class03 -- contains --> class04
  class04 -- contains --> class05
  class03 -- contains --> class06
  class06 -- contains --> class07
  class06 -- contains --> class08

  class05 -- mentions --> class09
  class07 -- mentions --> class10
  class08 -- mentions --> class11

So, a Taxonomic Name Usage can either be the same as the TCS Taxon Name – or a wrapper around Taxon Name – or it can be a relationship between the Taxon Name and the Taxonomic Concept (or the Treatment really), in which case the word 'Usage' makes slightly more sense.

I think it follows that there is a one-to-one relationship between a Treatment and a Taxon Concept. There is also a one-to-one relationship between the accepted Taxonomic Name Usage and the Treatment. Note that there are no connections between Taxonomic Name Usages.

flowchart LR
  class01["(accepted)<br/>TaxonomicNameUsage"]
  class02[Treatment]
  class03[TaxonomicConcept]

  class01 -.-|"one&mdash;one"| class02
  class02 ---|"one&mdash;one"| class03

I think we can also say that the Taxonomic Article that contains the Treatment is the accordingTo of a Taxonomic Concept. So, I think we can overlay TCS on this part of the OpenBiodiv Ontology like this:

graph TD
  class01["TaxonomicConcept &equiv; TaxonConcept"]:::tcsClass
  class02[Treatment]
  class09["TaxonomicArticle &equiv; Reference"]:::tcsClass
  class03[NomenclatureSection]
  class04[NomenclatureHeading]
  class05[TaxonomicNameUsage]
  class06[NomenclatureCitationList]
  class08[TaxonomicNameUsage]
  class10["ScientificName &equiv; TaxonName"]:::tcsClass
  class12["ScientificName &equiv; TaxonName"]:::tcsClass

  classDef tcsClass stroke:red,stroke-width:4px

  class01 -..->|taxonName| class10 %% 0
  class01 -- realization --> class02 %% 1
  class02 -- realizationOf --> class01 %% 2
  class01 -.->|accordingTo| class09 %% 3
  class09 -- contains --> class02 %% 4
  class02 -- contains --> class03 %% 5
  class03 -- contains --> class04 %% 6
  class04 -- contains --> class05 %% 7
  class03 -- contains --> class06 %% 8
  class06 -- contains --> class08 %% 9
  class05 -- mentions --> class10 %% 10
  class08 -- mentions --> class12 %% 11
  class01 -..->|synonym| class12 %% 12

  linkStyle 0 stroke:red,stroke-dasharray:2 2,stroke-width:2px
  linkStyle 3 stroke:red,stroke-dasharray:2 2,stroke-width:2px
  linkStyle 12 stroke:red,stroke-dasharray:2 2,stroke-width:2px
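Read as a data transformation, the overlay says that a Taxon Concept can be lifted out of a Treatment: the name in the Nomenclature Heading becomes the taxonName, the names in the Nomenclature Citation List become synonyms, and the containing Taxonomic Article becomes the accordingTo. The sketch below assumes a nested dictionary as input. That layout, the key names and the sample data are hypothetical and do not reflect the OpenBiodiv or TCS serialisations.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonConcept:
    taxon_name: str    # name mentioned by the TNU in the Nomenclature Heading
    according_to: str  # the Taxonomic Article that contains the Treatment
    synonyms: list = field(default_factory=list)  # names from the Nomenclature Citation List


def concepts_from_article(article: dict) -> list:
    """Build one TaxonConcept per Treatment (the one-to-one relationship above)."""
    concepts = []
    for treatment in article["treatments"]:
        nomenclature = treatment["nomenclature_section"]
        concepts.append(
            TaxonConcept(
                taxon_name=nomenclature["heading"]["scientific_name"],
                according_to=article["reference"],
                synonyms=[
                    tnu["scientific_name"]
                    for tnu in nomenclature.get("citation_list", [])
                ],
            )
        )
    return concepts


# Hypothetical article with a single treatment; names are placeholders.
article = {
    "reference": "Hypothetical Author 2020",
    "treatments": [
        {
            "nomenclature_section": {
                "heading": {"scientific_name": "Aus bus"},
                "citation_list": [
                    {"scientific_name": "Aus cus"},
                    {"scientific_name": "Xus bus"},
                ],
            }
        }
    ],
}

concepts = concepts_from_article(article)
print(concepts[0].taxon_name)    # Aus bus
print(concepts[0].according_to)  # Hypothetical Author 2020
print(concepts[0].synonyms)      # ['Aus cus', 'Xus bus']
```

Note that the Taxonomic Name Usages themselves do not survive as objects in the output; only the names they mention do.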

For me, everything starts and ends with Taxon Concepts. Taxonomic Name Usages never come into it. I can see a (limited) role for stand-alone Taxonomic Name Usages – as in mentions of names in literature – as the things you get when, for example, you let loose a name crawler on the BHL or some other corpus, where you do not necessarily – or immediately – know the Article in which the name is mentioned and what type of mention you are dealing with. However, when we have the context of Taxonomic Articles and Treatments, we should be talking about Taxon Concepts, not mentions of names. We should not turn everything into TNU soup first and then have standards and business rules to try to get out what should have gone in in the first place. The relation of a name to a taxon is only as a label. It does not go deeper than that.

So, TLDR, my confusion over the phrase "TNUs as proxies for Taxonomic Circumscriptions" is because I see the path from Taxonomic Name Usage to Taxonomic Circumscription as having two steps, one from Taxonomic Name Usage to Taxon Concept and another from Taxon Concept to Taxonomic Circumscription. The first step is one of data cleaning and the second step one of alignment of taxon concepts. Only the second step is covered in TCS (for now).
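The second step (alignment) can be sketched as grouping Taxon Concepts that have been asserted to be congruent, with each resulting group standing in for one Taxonomic Circumscription. Because congruence is transitive, a union-find structure is enough for this. The function name, the concept labels and the congruence assertions below are hypothetical, and a real alignment would also carry the other RCC-5 relationships, which this sketch ignores.

```python
def group_congruent(concept_ids, congruent_pairs):
    """Group concept identifiers into sets connected by congruence assertions."""
    parent = {c: c for c in concept_ids}

    def find(x):
        # Follow parent links to the group's representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in congruent_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for c in concept_ids:
        groups.setdefault(find(c), set()).add(c)
    # Sorted lists make the output deterministic.
    return sorted(sorted(group) for group in groups.values())


# Hypothetical concepts, labelled "name sec. source".
concept_ids = [
    "Aus bus sec. Smith 1990",
    "Aus bus sec. Jones 2005",
    "Xus bus sec. Lee 2010",
]
congruent_pairs = [("Aus bus sec. Jones 2005", "Xus bus sec. Lee 2010")]

groups = group_congruent(concept_ids, congruent_pairs)
for group in groups:
    print(group)
# ['Aus bus sec. Jones 2005', 'Xus bus sec. Lee 2010']
# ['Aus bus sec. Smith 1990']
```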

jgerbracht commented 2 years ago

First off, apologies for jumping in late (I've been in the field). Going back to Richard's Option 1 and Option 2 post, his description of Option 2 aligns with the thinking here at Cornell and with Avibase and he has articulated this far better than I have in the past.

"we have some mechanism for recognizing/defining a particular abstract circumscription of organisms, and we assign a single identifier to each unique circumscription. One or more TNUs would be linked to each identified/defined circumscription, but would reman as separate "things" (perhaps they could be framed as "instances" of a particular identified/defined abstract circumscription)."

This is exactly what an Avibase ID is and while it may be an identifier for an abstract set of related organisms, in practical terms, it is VERY useful in managing biodiversity datasets which often are based on differing taxonomies.

Using RCC-5 relationships on these 'deep' concepts does "avoid the problem of an intractable number of RCC-5 relationships among individual TNUs" and turns managing disparate datasets into a more reasonable task.

"more intuitive for most biologists and most informaticians"

I agree with this as well; it more clearly differentiates between a taxonomic name and the circumscribed set of organisms that name represents when it's used by an authority.

"There would be no need to define "congruent" relationships among these things, because by definition two circumscriptions are the same if they are congruent, so there is only one identifier needed."

Re the stated disadvantages

"We would need a pretty solid definition for these things, such that using that definition it would be unambiguous whether two defined circumscriptions are the same, or different. No one has (yet) proposed such a definition that is practical."

It seems to me that this is truly what we've struggled with over the last 2 years. Denis et al. use 'deep' Taxon Concept as a way to distinguish between Taxon Concepts from the TNU perspective and what an Avibase ID represents. I don't think he was/is set on using the term Taxon Concept but simply used the 'deep' modifier in lieu of a better, more precise, name.

"there needs to be a mechanism for establishing two different identifiers as being duplicates, whenever two minted instances of this thing are deemed to represent congruent circumscriptions." I think this would be handled by an implementation; one example is how DOIs manage 'duplicates'.

"There would almost certainly need to be a central authority to mint/define these things." I agree with this, and this is also an implementation issue. Luckily, we have this in birds, though as far as I know it doesn't exist for any other taxa.

I'm coming around to the idea that we need a standard to support both Option 1 and Option 2 as both have advantages and disadvantages, as well as real world cases where they would be implemented and used. If we had a central authority for minting 'deep' taxon concepts, then I'd be strongly in the Option 2 camp, but that isn't reality, at least not in today's world.

As for whether we call a 'deep' taxon concept a Taxon Concept, Taxonomic Concept, Taxon Circumscription, Taxon or something else, I would also vote for using something other than the term 'Concept'. I have spent more time discussing taxa with everyone in the conversation having fundamentally different ideas of what a 'Taxon Concept' means than I care to admit.

nfranz commented 2 years ago

Thanks, @nielsklazenga & @jgerbracht.

@jgerbracht you relate to the Avibase implementation as "practical" and "useful" and "reasonable" -- "avoid[ing] the problem of an intractable number of RCC-5 relationships".

I suggest looking at this and other working implementations also as an expression of social values.

We have a considerable culture of biodiversity data services publishing taxonomic structures and transferring them under various general or specific constraints and changes from one platform to another. (Think e.g. taxonomy aggregation and versioning.)

What social values are thereby being expressed towards the scientists whose past and present endeavors (and often but by no means always professional careers) are most closely tied to curating the taxonomic semantic content associated with the taxonomic name usages? Are these systematist communities adequately acknowledged in the digital taxonomic perspective aggregation, (re)publication, and versioning schemes? Do they need to be? Why? Saying that these questions matter, and attempting to answer them, are social value expressions.

As a corollary, we have a culture of aggregators at various levels discussing topics related to identifier boundaries and granularity, identifier reuse, identifier-to-identifier relationships (nomenclatural, RDF, RCC-5, all the way to synapomorphies or I suppose molecular tree branch support values). And discussing, often at the same time but not always explicitly, who should get social credit for what work related to coining, reusing, and authoring relationships between identifiers. "We do not really see ourselves as an authority; we are just an aggregator" -- is a social values position. Often it is a defensive position because whether one is viewed as an authority or not is largely not an internal choice in practice. To the extent that users refer to you, you have become the authority.

Avibase might not land with the social value assignment 100% on target -- if indeed the underlying motivation is to assign intellectual credit where it might be most due (all matter of grades because language and meanings are always somehow shared) -- but they are trying. 'Some instances of name-to-meaning associations are somehow privileged. Others are largely an approving copy and paste. Our implementation is justified in modeling that distinction to some feasible degree'. And users appreciate that intent to present a socially sensitized design.

Using concept taxonomy "sensibly" -- i.e. while simultaneously expressing certain social values -- I think always requires a higher sensitivity than we are used to, towards everyone's role as a speaker and listener and passer on.

In my view, if we try to divorce the technical issues from the shared social values, we are not taking the quick route towards a workable solution. When we get bogged down, it often seems to me that we need to remind ourselves just how much of a value system the standard can be expected to provide (as a core requirement), versus to what extent those values must be provided outside of the standard, by "practical" implementations.

deepreef commented 2 years ago

@nielsklazenga :

I think we are probably not disagreeing so much as talking about different things.

Yes! I think you're right. And I suspect that explains the vast majority of (perceived) disagreements in all of our discussion. Part of it is that we simply have different interpreted meanings to the words we're trying to use -- illustrations help a lot!

If we would define the things we are talking about by the way they are documented, we would be talking about Treatments, not Taxon Concepts.

Yes. And for what it's worth, all "Treatments" in the sense of PLAZI (who are all about Treatments) correspond 1:1 with TNUs. So I'm hoping we can adopt that notion of "Treatment" in the context of these discussions as well. Note that not all TNUs are Treatments. Rather, Treatments are the subset of all TNUs that are treated as the "accepted" name for a taxon circumscription. If a synonymy is included within a Treatment, then each heterotypic synonym is a separate TNU within the same Reference as the Treatment, but these synonym TNUs do not themselves represent additional Treatments within that same Reference. Put another way, all Treatments correspond 1:1 with TNUs, but not all TNUs are Treatments.
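A minimal sketch of the "Treatments are a subset of TNUs" idea, in Python. Everything here is hypothetical and only for illustration: the class name `TNU`, its fields, the `role` values, the identifiers, the years, and the name "Aus cus" are assumptions, not anything defined in TCS or by PLAZI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TNU:
    """One usage of a name within one Reference (hypothetical structure)."""
    tnu_id: str
    reference: str   # the Reference in which the usage appears
    name: str        # the name string as used
    role: str        # assumed values: "accepted", "synonym" or "mention"

    @property
    def is_treatment(self) -> bool:
        # Only the TNU of the accepted name counts as a Treatment.
        return self.role == "accepted"


tnus = [
    TNU("tnu:1", "Smith 2001", "Aus bus", "accepted"),
    # A heterotypic synonym listed inside Smith's Treatment: a separate TNU
    # in the same Reference, but not an additional Treatment.
    TNU("tnu:2", "Smith 2001", "Aus cus", "synonym"),
    TNU("tnu:3", "Jones 2010", "Xus bus", "accepted"),
]

treatments = [t for t in tnus if t.is_treatment]
treatment_ids = [t.tnu_id for t in treatments]
```

Every Treatment is a TNU (1:1), but `tnu:2` shows a TNU that is not a Treatment.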

And putting it yet another (much more concise) way (thanks, @nielsklazenga):

There is also a one-to-one relationship between the accepted Taxonomic Name Usage and the Treatment.

Anyway, putting aside the actual term, I'm hoping we're all on the same page that the notion of a "Taxon Concept" (=Taxon) is an implied set of organisms (=circumscription). Changing the hierarchical classification (e.g., location within the tree of life, or classification of higher-rank taxa) does not change the "Taxon Concept". For example, if Smith recognizes a taxon concept she labels as "Aus bus", and Jones has a congruent notion of the same implied circumscribed set of organisms, but labels it "Xus bus" (i.e., same species but different genus), we can still think of it as the "same" Taxon Concept. The Taxon Concepts would not be considered different simply because the two authors classified them in a different taxonomic hierarchy (i.e., different genus).

Sorry for the longwinded rambling, but I want to be careful that we are harmonious in our thinking of some of the basic issues in play here.

I think taxon concepts are not just in people's heads, but that there is a Taxon Concept in or behind every Treatment

YES! Strong agreement here! If you think of a Treatment (=TNU of a name treated as the accepted name of a concept), then we can think of the Taxon Concept as being the set of organisms implied by the set of heterotypic synonyms (explicitly the type specimens of those heterotypic synonyms), plus other material examined, plus other organisms that share the same set of diagnostic characters, plus other implied organisms inferred from statements about geographic distribution and other information contained within the Treatment. A Treatment essentially represents an attempt to communicate the scope of the circumscribed set of organisms represented by the abstract Taxon Concept.

Therefore, Taxon Circumscriptions are not identifiable

Yes, they are. And what @dremsen, Nicolas Bailly and I have been fleshing out is a process by which circumscriptions are defined by the set of name-bearing types that are included (proxied by the set of heterotypic synonyms). It will take a LOT of text and diagrams and presentations to adequately capture this and explain it, but the point is we think this is the "sweet spot" for a computable definition for a taxon circumscription.

I think Taxon Circumscription is probably the closest approximation to a "taxon"

This is where we depart a little in our thinking. In your first diagram, I see no useful difference between the box labelled as "TaxonConcept" and "TaxonCircumscription" (=Taxon). Sure, we could assign these terms to subtly different things, but I don't see any real value in doing so in the context of a data exchange standard. And even if not part of the exchange standard, I'm not sure introducing these as subtly different things assists us in coming to a mutual understanding of what we're talking about.

I am uncomfortable with terms that can mean anything people want them to mean,

Ha! You and me both!

I will turn to the only place where I have seen Taxonomic Name Usages defined, the OpenBiodiv Ontology (Senderov et al. 2018).

Note: In their section on TNUs, Senderov et al. 2018 cite this publication, which is where I defined the idea using the term "Assertion". We (TDWG name nerds) switched to the TNU term because TCS adopted "RelationshipAssertion" for representing third-party RCC-5 relationships, and it got confusing. Thus, the GlobalNames group decided at one of the NOMINA meetings to adopt the term "Taxon Name Usage" (which we now call Taxonomic Name Usage).

In any case, yes -- the big diagram in your post gets it right, in my view. The alternate way of looking at it is my earlier "Option 1", which is that EVERY TNU can be considered a potentially distinct Taxon Concept. Why? Because we have no way to precisely define the boundaries of circumscriptions implied for a particular Taxon Concept -- so every Treatment potentially represents a different Taxon Concept.

I think it follows that there is a one-to-one relationship between a Treatment and a Taxon Concept.

If you really mean this literally, then this represents my "Option 1". EVERY Treatment (=TNU for an accepted name) is a (potentially) distinct Taxon Concept, so each identifier assigned to a Treatment-TNU can be used as the identifier for the corresponding Taxon Concept. In this paradigm, we need to make explicit statements of congruency between multiple TNUs that all are asserted to represent the "same" circumscription.

In the past I favored this approach as well, but the discussions with @dremsen and Nicolas Bailly have convinced me that there is a practical/useful/computable way to mint identifiers for "Protonym Arrays", which themselves function as my Option-2 "Taxon Concepts", with their own identifiers. This gives us the opportunity to establish a many-to-one relationship between Treatments and Taxon Concepts. That is, a single Taxon Concept can have many Treatments (TNUs) as its "instances". Put another way: a Taxon Concept includes the set of all Treatments that are mutually congruent with each other (in an RCC-5 sense). If I understand @jgerbracht correctly, this is the approach taken by Avibase as well.

Here is my question for @jgerbracht: How does Avibase mint these? Is there any algorithmic method for deciding when a new Avibase ID is needed? Or does that always require some sort of human decision?

For me, everything starts and ends with Taxon Concepts. Taxonomic Name Usages never come into it.

Alas, we part ways again in our thinking. I would say everything is founded on TNUs. All nomenclatural acts and all statements about taxa (concepts and otherwise) exist within TNUs. Without TNUs, we have nothing. However, with TNUs, we can do some pretty useful things in terms of deriving implied Taxon Concepts, which I think serve as a valuable abstract thing to which we can assign explicit identifiers, and use those identifiers to aggregate sets of TNUs that represent congruent Taxon Concepts. In other words, everything starts with TNUs, and Taxon Concepts are one of the (many) useful things you can derive from TNUs.

Let me state my opinion even more strongly: Unless they are anchored in TNUs, Taxon Concepts as units of information exchange are completely worthless, and have no discernable meaning.

I DO agree that the "name" part of TNUs is simply an interesting property helpful for constructing a text string label. The "name" part of TNU is only in there because for 250+ years science has relied on scientific names as proxies for Taxon Concepts. We do not want to confuse the name stuff with the concept stuff. But TNUs are informatic building blocks upon which any useful system of taxonomic information tracking should be based. Everything else is derived from TNUs.

Sorry for the long post! But it feels like we are making good progress here -- even if we don't fully agree on everything.

deepreef commented 2 years ago

Just a follow-up on this from @jgerbracht :

I'm coming around to the idea that we need a standard to support both Option 1 and Option 2 as both have advantages and disadvantages, as well as real world cases where they would be implemented and used. If we had a central authority for minting 'deep' taxon concepts, then I'd be strongly in the Option 2 camp, but that isn't reality, at least not in today's world.

I think the answer is that we create a really robust definition for TNUs, including well-defined subsets that represent Protonyms, and subsets that represent Treatments. Once we have that in place, then we need a robust definition for a "Taxon" (dropping the "Concept" part), which serves the same basic function as Avibase IDs, but anchored to something that is more objective and computable (reference what @dremsen and Nicolas and I are working on). Then we start building a network that maps TNUs to their corresponding Taxon. The nice thing about the approach @dremsen and Nicolas and I are taking is that all RCC-5 relationships are automatically self-evident from the definitive properties of the "Taxon" instances themselves (i.e., the Protonym Arrays). Many TNUs (i.e., those that include complete heterotypic synonymies) can be automatically associated with the corresponding Taxon instance. The remaining TNUs will need to be assigned to Taxon instances via accordingTo assertions (this also applies to tracking taxonomic "splits").
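A rough sketch of why RCC-5 relationships could be "self-evident" from Protonym Arrays. It assumes, purely for illustration, that each Taxon instance is fully defined by the set of protonym identifiers it includes; the actual definition being worked out may well differ. The function name `rcc5` and all identifiers are hypothetical.

```python
def rcc5(a: frozenset, b: frozenset) -> str:
    """Return the RCC-5 relation of circumscription a to circumscription b.

    Assumes both arguments are non-empty sets of protonym identifiers.
    """
    if a == b:
        return "congruent"        # EQ
    if a < b:
        return "is included in"   # PP: a is a proper part of b
    if a > b:
        return "includes"         # PPi: b is a proper part of a
    if a & b:
        return "overlaps"         # PO: some, but not all, protonyms shared
    return "disjoint"             # DR: no protonyms shared


# Hypothetical Protonym Arrays.
broad = frozenset({"protonym:bus", "protonym:cus"})
narrow = frozenset({"protonym:bus"})
other = frozenset({"protonym:dus"})
mixed = frozenset({"protonym:cus", "protonym:dus"})

relation_narrow_to_broad = rcc5(narrow, broad)
```

Under this assumption, two instances with the same set of protonyms are congruent by construction, so they would share a single identifier and no explicit "congruent" assertion is needed.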

If we can achieve these things, then I think we're 90% of the way there. I don't think we need a separate class for TaxonName (I think these can be derived from Text-string properties of TNUs).

nielsklazenga commented 2 years ago

@jgerbracht, no problem, jump in anytime.

I've been in the field

So jealous.

This is what AviBase does right? Going from this:

avibird-circus-example-Page-3

to this:

avibird-circus-example-Page-2

to this:

avibird-circus-example-Page-1

Doesn't that make the AviBase IDs (grey ellipses) representative for, but of the same nature as, the Taxon Concepts (white ellipses) that are congruent with it (and are therefore mutually congruent)? So, Taxon Concepts?

I think for the role that AviBase plays, "managing and organizing taxonomic concepts" (Lepage et al. 2014, straight from the title), AviBase IDs can only be Taxon Concepts.

In order to align things, the things at both sides of the alignment need to be the same type (or class) of thing. You also need to realise that not all groups have something like AviBase.

If I take @camwebb's use case from the Flora of Alaska from the same (2020) TNC IG meeting that I took yours from, I would align the Taxon Concepts from older treatments as follows:

webb-erigeron-Page-1

[Please correct me if I got that totally wrong @camwebb]

As you can see, there are a few taxon concepts in there that cannot be aligned with confidence (not by me at least) and this results in four clusters of one, so we do not need to create a new "thing" to represent the cluster.

An additional complication is that the alignments here only work in the context of Alaska, while all the taxa are more widely distributed and some of the works that are cited might have a broader geographic scope than Alaska. So, the global picture might look different and therefore I would be hesitant to mint IDs.

So, this is what I end up with:

webb-erigeron-Page-2

Here, I have aligned individual Taxon Concepts (white ellipses) with the representation of a cluster of congruent concepts (grey ellipse). Not everyone may want to do this, but it should be possible, right? A system that works for everyone will not be everyone's perfect system.

I see what AviBase does – after the alignment, which is the most important part – as deduplication, not creation of new things.

In an earlier meeting you've said that we need a name for the AviBase ID kind of thing. I totally agree and I do not think you'll find anyone who would argue with that. I have proposed to do that under taxonConceptCategory (#12).

Doing this means that people can manage this information any way they like, or what works best in the scope of their system. So you can have different tables in a database [so, definitely not suggesting that AviBase should throw the AviBase IDs on a heap with the concepts from the published taxonomies] and exchange them as different types or files (extensions), so de-facto subclassing them. We cannot do that officially in TCS, because, one, we do not do that in TDWG, but also, and more importantly, because a subclass needs to have at least one property that the superclass does not have, and I cannot think of any such property.

I think, by the way, that in order to fully accommodate the deduplication aspect of AviBase in TCS, we need a property – to link a Taxon Concept to an AviBase ID – not a class.

nielsklazenga commented 2 years ago

@deepreef, I only saw your last post just now and I am on my way home (or should be), so I only had a very quick read, but I think we are getting very close to (almost) full agreement.

I think our difference regarding the importance of TNUs is mainly because I think that TCS is totally not about nomenclature (or Names), it just accommodates nomenclatural rules and terms, and you obviously do not. This, I think, has a lot to do with the nomenclature codes we are familiar with. I have been talking about languages before and I think the different codes are even more different (the rules might be largely the same, but the way they get to them is worlds apart it seems). We should try to nut that out too, but not in this issue.

deepreef commented 2 years ago

I think our difference regarding the importance of TNUs is mainly because I think that TCS is totally not about nomenclature (or Names), it just accommodates nomenclatural rules and terms, and you obviously do not.

Whether or not TCS is "totally not about nomenclature" (if this is true, then there is no place in TCS for the TaxonName class), that is not the point. TNUs are the fundamental unit of all documented taxonomic information (Concepts, names, and everything else). Don't get bogged down on the "N" part of TNU. That's necessarily in there because scientific names have been the de-facto standard proxy for Taxon Concepts for 250+ years. You cannot have any information about Taxon Concepts without TNUs. That doesn't mean that Taxon Concepts depend on Taxon names (they don't). But Taxon Concepts only exist to the extent they are documented within Treatments (=TNUs). This has absolutely nothing to do with nomenclatural Codes. That is a completely different layer of information that also happens to be built on top of TNUs. We are in agreement that the nomenclatural Codes do not need to be part of the Taxon Concept discussion (but I think a good case can be made that they play some role in TCS).

nielsklazenga commented 2 years ago

@deepreef, could we continue this somewhere else (and some other time)? This is a bit of a tangent.

deepreef commented 2 years ago

could we continue this somewhere else (and some other time)? This is a bit of a tangent.

Fine by me. But I don't see how we can come up with a definition for Taxon Concept that does not rely on Treatments/TNUs (at least indirectly).

What is not a tangent, and what we still don't seem to have clarity on, is whether every Treatment should be considered as a unique Taxon Concept, with its own Taxon Concept Identifier (my Option 1), or whether a Taxon Concept is defined in some other way, where all Treatments that are deemed to share the same concept are linked to Taxon Concept identifiers in a many-to-one relationship (my Option 2).

So far both I and @jgerbracht have indicated a preference for Option 2, but I gather that @nielsklazenga prefers Option 1 (every Treatment is considered a distinct Taxon Concept, and these are clustered directly to each other via assertions of congruency). The current definition clearly represents Option 1 ("stated by a particular author in a particular publication").

What do others think?

nielsklazenga commented 2 years ago

Oh, not everything, just the last little diversion about TCS being about names or not (and yes, I started it). The rest is great.

So far both I and @jgerbracht have indicated a preference for Option 2, but I gather that @nielsklazenga prefers Option 1

I am just keeping my eyes on the ball, as well as rowing with the oars that we've got. We are looking for the best (written) definition for the TCS Taxon Concept, not for a new thing to define.

Also, in my view (and what I was on about in https://github.com/tdwg/tcs2/issues/1#issuecomment-1055143635), the TCS Taxon Concept accommodates both options 1 and 2. Going from option 1 to option 2 is a matter of deduplication – which is elimination of duplicate or redundant information. It creates a new Taxon Concept that represents a cluster of congruent (option 1) Taxon Concepts, not a Taxon Circumscription.
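A minimal sketch of this deduplication step, assuming that congruence between option 1 Taxon Concepts has already been asserted pairwise and is treated as transitive. The function name `cluster_congruent` and the concept identifiers are hypothetical; this is not how AviBase or any existing system is known to do it.

```python
def cluster_congruent(concepts, congruent_pairs):
    """Group option 1 concepts into clusters of mutually congruent concepts.

    Uses a simple union-find over the asserted congruence pairs.
    """
    parent = {c: c for c in concepts}

    def find(c):
        # Follow parent links to the cluster representative.
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in congruent_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for c in concepts:
        clusters.setdefault(find(c), []).append(c)
    return sorted(sorted(members) for members in clusters.values())


concepts = ["smith:aus-bus", "jones:xus-bus", "lee:aus-cus"]
congruent_pairs = [("smith:aus-bus", "jones:xus-bus")]

clusters = cluster_congruent(concepts, congruent_pairs)
```

Each resulting cluster could then be represented by one (option 2) Taxon Concept; a concept that cannot be aligned with any other simply remains a cluster of one.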

Just because option 2 Taxon Concepts have a one-to-one relationship with Taxon Circumscriptions does not make them Taxon Circumscriptions. Just like a one-to-one relationship between Treatment and Taxon Concept does not make Treatments Taxon Concepts. A Treatment – or the Taxonomic Article that contains it – is the accordingTo of a Taxon Concept.

So, yes, if we could have only one, I would choose option 1, if only because I cannot see how you can have option 2 type Taxon Concepts without having option 1 type Taxon Concepts. However, I do not think we have to choose.

deepreef commented 2 years ago

the TCS Taxon Concept accommodates both options 1 and 2. Going from option 1 to option 2 is a matter of deduplication – which is elimination of duplicate or redundant information.

I disagree -- they are fundamentally different "things". A single definition cannot capture both equally. Option 1 is the current definition at the top of this page. Except I would replace "publication" with "reference", get rid of the name part, and make it simpler. Maybe something like:

"A set of organisms, explicitly indicated and/or implied to exist, that are asserted by a particular static reference to be taxonomically homogeneous and collectively represent the entirety of a taxon."

A definition for Option 2 would remove the particular static Reference part:

"A set of organisms that are taxonomically homogeneous and collectively represent the entirety of a taxon."

A year ago I would have said Option 2 would have been a waste of time from an information standards perspective, because I could not conceive of a way to objectively define them. That has now changed.

I added the "and collectively represent the entirety of a taxon" part to clearly distinguish them from dwc:Organism.

I suspect that you won't like the "set of organisms" part, because it's too circumscription-y. But I don't think TCS should try to capture instances of imaginary taxa within the heads of humans; rather we need to capture units that can be represented informatically in a tangible way. I believe two different such units are clear in this space: 1) TNUs (which are sets of facts about taxonomic assertions that appear within documented references) 2) Taxa (=Taxon Concepts; which are abstract units independent of any particular TNU, that represent implied sets of organisms defined at a granularity that is both biologically meaningful and informatically practical, to which sets of taxonomic treatments [TNUs] can be semi-objectively assigned).

nielsklazenga commented 2 years ago

I suspect that you won't like the "set of organisms" part, because it's too circumscription-y

No, I do not think that is circumscription-y at all, it is pretty much the definition of 'taxon'. My whole point has always been that there is just as much circumscription in option 1 as in option 2 (or maybe I should say there is no more circumscription in option 2 than in option 1). Your definitions do not change that. They do not say anything about circumscription.

Your definition of option 1:

A set of organisms, explicitly indicated and/or implied to exist, that are asserted by a particular static reference to be taxonomically homogeneous and collectively represent the entirety of a taxon.

applies equally to option 2. I do not understand how lumping multiple Taxon Concepts together can turn them into something else, or can remove something. Isn't the "set of organisms" etc. in option 2 the same "set of organisms" etc. in the option 1 Taxon Concepts that are congruent with it? And isn't it asserted by all the references that assert those option 1 concepts? Without those references, and without the option 1 Taxon Concepts, option 2 whatevers are just identifiers and therefore meaningless.

The "asserted by a particular static reference" is not in the definition of Taxon Concept, by the way.

deepreef commented 2 years ago

it is pretty much the definition of 'taxon'

That's part of the point. I much prefer the term "Taxon" to "Taxon Concept", and yes -- in this context I regard them as synonymous. I don't think I'm the only one who sees it this way.

Your definition of option 1 ... applies equally to option 2

I don't see it that way, because Option 2 is not associated with any particular static reference. The fact that I see them as fundamentally different, and you see them as fundamentally the same, suggests we still haven't articulated our respective interpretations adequately.

Option 2 is not simply "lumping multiple Taxon Concepts together". The idea of Option 2 is that instances are not tied to Treatments, and could exist without any associated treatments. In practice, they will only get minted once at least one Treatment exists to represent them. While they can effectively (informatically) function as a "stack of congruent taxa as implied by treatments", that function is not part of the definition. No such stack needs to exist for an Option 2 Taxon [Concept]. It is not simply an exercise of mapping congruent Treatments together, because this idea/definition of a "Taxon" is as independent of individual treatments as it is from the scientific name used to label it. Those are non-definitive properties and linked objects.

Without those references, and without the option 1 Taxon Concepts, option 2 whatevers are just identifiers.

That's what I used to think, but I've come to realize they can exist as meaningful and computable entities on their own (not just identifiers).

The "asserted by a particular static reference" is not in the definition of Taxon Concept, by the way.

Yes, I know -- that was my rephrasing of "as stated by a particular author in a particular publication".

nielsklazenga commented 2 years ago

I much prefer the term "Taxon" to "Taxon Concept"

I think Taxon and Taxon Concepts are different things, but Taxon is already in Darwin Core too, so we cannot also use it in TCS. I think Taxon Circumscription is something else again and your option 2 is not it.

The idea of Option 2 is that instances are not tied to Treatments, and could exist without any associated treatments.

All Taxon Concepts can exist without associated Treatments. Treatments cannot exist without Taxon Concepts. You need to have a concept of a Taxon before you can write a Treatment. In fact, it is the option 2 things – which I think are also Taxon Concepts – that cannot exist without Treatments, as the option 1 type Taxon Concepts have to be there before the Treatment, but, until there is a Treatment, we cannot record them and we can certainly not align them.

jgerbracht commented 2 years ago

As Richard mentions, I too see a distinct difference between the TNU (Option 1) and the 'deep' Taxon Concept (Option 2). I also feel like Taxon is a better term for this, but Taxon has its own baggage and is often too wrapped up with a particular name usage and not the underlying set of organisms. Still, I think that would be a decent course even though it's not practical.

I see option 1 as being more closely tied with how a particular reference treats a 'set of organisms', including the name (common and/or scientific), often the taxonomic rank and certainly other bits. Option 2 ('deep' Taxon Concept) represents the set of organisms, not a particular reference pertaining to that set of organisms.

I, too, am curious about what others in the group think.

ghwhitbread commented 2 years ago

TCS2 #taxonConcept

I’m with @Niels. It does not matter how one arrives at the formulation of a taxon concept. When it comes time to publish one, that will be a taxonConcept.

Our National Species List (auNSL), or more specifically the Australian Plant Census (APC), uses a similar (option 2) synonymic model for recognising potential change and in the alignment of APC concepts ( @jar398 too ). But, given that the APC is not doing taxonomy, we link to a published theory wherever possible, adding instances as required only for the purpose of nomenclatural aggregation. NSL concepts provide for a consensus arrangement of Australian taxa (and their labels) with the means to map across versions, aggregate distributions and attach taxonomic annotations. Published, they become taxonConcepts. Re-used they can be a nomenclator, a taxonomy, a thesaurus, a checklist, a treatment, a Flora or Faunal Directory.

Tcs:taxonConcept is not a taxon concept. It is an instance of the class of things that are attempts to communicate a theory about the identity of a taxon concept or to annotate, or change the circumscription of one. In our system they are a special kind of entity within the Taxon Name Usage graph. Elsewhere they might be deep concepts. I have seen them too, and many ask, in lists of names. In TCS they are supposed to be a means for communicating data across our taxonomic systems and beyond.

The point is, that none of this matters. Here we have a thing called tcs:taxonConcept. We are chartered with the task of freeing it from the realm of models and agreeing on a vocabulary of properties that we can apply to those taxonomic things we care about and contribute to an open taxonomic knowledge graph. Starting from TCS in this group (from first principles, I hope in TNC), that should not be so hard.

TCS 101 UserGuidev_1.3.pdf “<TaxonConcept> elements are used to represent real world taxa as published. They are the basic unit of taxonomic data exchange. Generally, whenever a scientific name is used a TaxonConcept is implied. All taxonomic opinion can be expressed using elements and the relationships between them.”

nielsklazenga commented 2 years ago

And I am with @ghwhitbread. Greg just beat me to it, so I am going to repeat a lot of what he said.

I do not really see the difference between options 1 and 2, but I think option 1 concepts are supposed to be what TCS, the TDWG Ontology, Franz & Peet 2009, and the OpenBiodiv Ontology – and AviBase for that matter – call Taxon[omic] Concepts. This includes taxonomic treatments, entries in field guides, entries in checklists, entries in databases like Catalogue of Life, and clades in published cladograms. I see some solid differences between these things. [AviBase also calls the things it assigns AviBase IDs to – which I think are supposed to be option 2 – Taxon[omic] Concepts].

But, as Greg also said, "none of this matters". What matters is what we want to do with these things. If we want to classify them, align them and, most importantly, push them around with TCS, they have to be TCS Taxon Concepts. We cannot have two classes doing the same thing. So TCS Taxon Concept has to be a pretty big tent.

tcs:TaxonConcept is not a taxon concept.

Does anybody really think that dwc:Occurrence is really an occurrence?

As Greg already hinted at, 'Taxon Concept' is only a label. I think it is a good label, but, even if I did not, it has so much history and it has been used so often, that it would be crazy to change.

Here we have a thing called tcs:taxonConcept. We are chartered with the task of freeing it from the realm of models and agreeing on a vocabulary of properties that we can apply to those taxonomic things we care about and contribute to an open taxonomic knowledge graph.

Couldn't have said it better myself. That is exactly what we are supposed to be doing.