tdwg / tnc

Taxonomic Names and Concepts Interest Group

More appropriate name for TaxonRelationshipAssertion class #48

Closed nielsklazenga closed 4 years ago

nielsklazenga commented 4 years ago

We have adopted the TaxonRelationshipAssertion element from TCS, but have never really discussed the name of the class.

In the discussion around vernacular names (#47), the class was frequently mentioned and @jar398 remarked that it is not appropriate to call it "TaxonSomething", as they are relations between Taxonomic Name Usages.

It seems logical to call the class TaxonomicNameUsageRelationshipAssertions, but we restricted the type of relationships in this class to the horizontal relationships between taxon concepts (isCongruentTo, includes, isIncludedIn, overlaps, intersects and excludes) and there are several relationships between TNUs in the standard that are not in this class.

It would be nice to come up with a name for the class that reflects the types of relationships we allow in it. Any suggestions? I would prefer not to call it RCC-5 Relationships (if only because we have a term that is not in RCC-5) or Set Relationships.
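A minimal sketch of how the horizontal relationship types could be evaluated, assuming each taxon concept is modelled as a finite, non-empty set of circumscribed members (plain strings here). Mapping `overlaps` to RCC-5 partial overlap is an assumption, and `intersects` is left out on the assumption that it is the term said above not to be in RCC-5. The `Rcc5` enum and `classify` function are hypothetical names, not part of the standard.

```python
from enum import Enum


class Rcc5(Enum):
    """The five RCC-5 base relations, labelled with the term names used in the class."""
    IS_CONGRUENT_TO = "isCongruentTo"
    INCLUDES = "includes"
    IS_INCLUDED_IN = "isIncludedIn"
    OVERLAPS = "overlaps"  # assumed to mean RCC-5 partial overlap
    EXCLUDES = "excludes"


def classify(a: frozenset, b: frozenset) -> Rcc5:
    """Return the RCC-5 relation of circumscription a to circumscription b."""
    if not a or not b:
        raise ValueError("circumscriptions must be non-empty")
    if a == b:
        return Rcc5.IS_CONGRUENT_TO
    if a > b:  # a is a proper superset of b
        return Rcc5.INCLUDES
    if a < b:  # a is a proper subset of b
        return Rcc5.IS_INCLUDED_IN
    if a & b:  # they share members, but neither contains the other
        return Rcc5.OVERLAPS
    return Rcc5.EXCLUDES  # no shared members
```

Exactly one of the five relations holds for any pair of non-empty sets, which is what makes RCC-5 attractive for this class; a term like `intersects` would instead be a disjunction of several of them.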

nielsklazenga commented 4 years ago

On 3/15/20 10:07 PM, Richard L. Pyle wrote:

TaxonRelationshipAssertion

Aside:

These are relationships between TNUs, not taxa, so a "TaxonSomething" name is not appropriate.

There are already similar infelicities out there, such as taxonID, but propagating the confusion does not help.

Originally posted by @jar398 in https://github.com/tdwg/tnc/issues/47#issuecomment-599312338

nielsklazenga commented 4 years ago

Aside: These are relationships between TNUs, not taxa, so a "TaxonSomething" name is not appropriate. There are already similar infelicities out there, such as taxonID, but propagating the confusion does not help.

I guess that depends on how you define a TNU. I personally feel that a TNU instance is (by far) the best data object to use as a representation of a taxon concept -- especially in the context of an instance of a Taxon[omic]RelationshipAssertion. I know others wish to create a separate object for "TaxonConcept", but I have yet to see any actual implementation of such (other than using a TNU instance as a representation of a taxon concept) that doesn't create as many problems as it solves. But I'm certainly keeping an open mind on this! And I've almost been persuaded once or twice.

Now, keep in mind that only a subset of TNU instances can effectively serve as representations of Taxon Concepts, so in that sense I support "TaxonomicNameUsage" as a term. But I would argue that any TNU instance that participates in a Taxon[omic]RelationshipAssertion instance is, by definition, drawn from among the subset of TNUs that represent concepts. Therefore, I think that the term TaxonRelationshipAssertion is entirely appropriate.

Of course, reasonable minds may disagree... :-)

Originally posted by @deepreef in https://github.com/tdwg/tnc/issues/47#issuecomment-599319871

nielsklazenga commented 4 years ago

That was my thinking when I kept the name TaxonRelationshipAssertion, so I agree. It might still be good to see if we can find a name that avoids the 'Taxon' bit. Otherwise, we'll just have to explain it very well.

jar398 commented 4 years ago

I agree with about 95% of what Rich said so I probably communicated poorly. I think naming is important because so many people will come to these data sources who haven't seen the specialized frameworks or standards before, and will be too busy to read up on them; the symbols (words, tokens) we choose ideally steer them away from bad interpretations and toward better ones (although in the end some people will figure it out from the data itself, regardless of what tokens we use). When a typical software engineer sees that something, call it x, is a 'FooRelationship', they will reflexively assume x is a relationship between Foos. A 'TaxonRelationship' would have to be a relationship between taxa. So a better formation here might be 'TNURelationship' or 'TnuRelationship' since we are talking about relationships between TNUs (as I understand it). Maybe there is a more euphonious choice, I don't know. 'TaxonomicRelationship' might at least not be misleading, but I find the adjective 'Taxonomic' to be vague and unhelpful since there is no agreement on what taxonomy is.

In TDWG-land we repeatedly get tangled up in semiotic or 'intentional' questions that are very difficult for most people to reason about (taxon, taxon concept, taxon name usage, namestring, etc.). We never seem to be sure exactly at what point we step away from the domain of nature (e.g. specimens) into the domains of words (literature, data files) and from there to that of human brains ('concepts'). As a result each 'standard' that comes along is confusing and admits inconsistent application, so demands to be updated or replaced in a few years. The central problem unique to TDWG, besides the metatheoretic one, is the overwhelming plurality of interpretations and of scientific hypotheses, and this is a very difficult thing to manage. There are very good methods (model theory being the best) for getting untangled, but they can be quite subtle and almost no one who needs them is trained in them. There are a few ways forward:

  1. talk about these things A LOT, hours and hours, so that a common understanding can be established (at least among the people who are talking),
  2. refer to published semiotic theories (I don't know of any powerful enough for the TDWG situation), or
  3. use formal methods to constrain intended meaning (this is the OWL / BFO / model theory dogma - don't just list terms, but write out rich formal axioms that constrain their interpretations).

For this group none of these is viable, so we suffer.

I get pretty hot-headed about this and don't like to take people's attention away from possibly more important issues, especially when my chances of being helpful are low. I am thinking about how I might write a blog post or something to lay out my concerns.

mdoering commented 4 years ago

We have adopted the TaxonRelationshipAssertion element from TCS, but have never really discussed the name of the class.

What about TNUConceptRelationship? These are not generic relations, and isn't everything an assertion, so it's superfluous to call it that? I agree with @jar398, though, that this screams for some other class to be called TNUConcept. TNUCongruency?

deepreef commented 4 years ago

@jar398 : OK, good to hear -- I also agree with about 95% of what you say -- particularly in terms of using terms that are precise and accurate! And I think this is a very helpful topic of discussion worthy of some time, because it's helping me clarify my own ideas, and I think is helping all of us in understanding each other.

My agreement level with you on your second paragraph is a solid 100%, in that the single greatest frustration for me over all my years in TDWG-land is how sloppy we tend to be about conflating meanings of words like "taxon", "concept", "name", etc. between the realm of biological entities, and the realm of data entities. And even within a defined realm, with 10 biodiversity informaticians in a room discussing this stuff, there will be 12 different definitions/interpretations of each word.

While it is sometimes useful in such circumstances to hammer down very specific terms with very precise meanings, I don't think we're quite ready to do that within the TNC part of TDWG-land. We actually tried to do this during the original TCS development, and started a glossary of terms along the lines of "canonicalNameString", "completeScientificNameString", "completeScientificNameStringWithAuthorship", etc. (not exactly these terms, but ones like them), and of course it went nowhere and not even the people enthusiastic about developing the glossary used the terms consistently.

So now my thinking is that we go the other direction and become lumpers in our terminology -- at least until we're calibrated at the coarse granularity. In other words, rather than trying to establish subtly different meanings for slightly different terms, which we'll probably get out of sync on pretty quickly, I think we should operate with TNC terms at the coarse level. In that context, I would submit the following:

Taxon=Taxon Concept=Taxon Circumscription: A set of biological entities, alive, recently dead and yet to be born, asserted to comprise a collective unit in nature to which a scientificName is assigned.

TaxonNameUsage=TNU: A data entity representing an assertion regarding a scientificName, documented in some form of Reference.

These should obviously be refined somewhat, but my point is that for our discussions here, the words Taxon, Taxon Concept and Taxon Circumscription should all be treated as congruent synonyms (for now, at least), and be understood to apply to an asserted set of biological entities (biological specimens, organisms in nature, etc.). When we refer to TNUs, we're referring to the data entities (data objects with defined properties and identifiers and whatnot) that serve as proxies for such assertions about the taxa.

So...I guess my point is that, in the context of the above, I think that TaxonRelationshipAssertion describes precisely what this class of object is intended for: that is, an assertion about the nature of a relationship between two taxa. Therefore, I think the class name is exactly what it should be.

My thinking is this: A subset* of instances of TNUs represent proxies for taxa (=taxon concepts). They're only "proxies", because the actual taxon concepts consist of a circumscribed set of biological entities in nature, asserted by someone to represent a collective biological unit. TNUs represent the digital proxies for those assertions of taxon concepts/circumscriptions, so that we can refer to the set of biological entities that they represent in a digital/database form.

[*Note: I say a "subset" because not all TNUs serve as proxies to taxon/concept/circumscription assertions. Many TNUs are simply name-usage instances without asserted taxon concept circumscriptions.]

In my mind, at least, the entity we're trying to represent through the TaxonRelationshipAssertion class is a meta-assertion (an assertion about assertions). That is, we want to characterize the nature of the relationship between one asserted taxon concept/circumscription and another asserted taxon concept/circumscription, as asserted by someone else. We use TNU instances as the anchor points (proxies) for each of the two asserted taxon concepts/circumscriptions participating in the asserted relationship between them. However, we're only using TNUs in this context as proxies for the asserted taxon concepts. In other words, the "relationship" is not between two TNUs; it's between the taxon concepts/circumscriptions that those TNUs are intended to represent. That's why I don't think something like TNURelationshipAssertion would be accurate, because the actual RCC-5 Relationships refer to the biological entities, not the data entities that serve as proxies for the biological entities.

Whew... OK, that rant went on WAY longer than I intended. But here's the thing: I completely agree with @jar398 that this stuff is WAY too subtle to be fully understood by 99% of the people who will be using this standard. Unfortunately, there's simply no way around that. I also agree that we should use terms that minimize the opportunity for people to confuse things. So, here's what I propose:

Instead of changing the name of the class "TaxonRelationshipAssertion" (which, as argued above, I believe to be accurate if we go with the coarse definitions), we should focus on the terms used to reference the "to" and "from" entities. TCS has them as ToTaxonConcept and FromTaxonConcept. I'm not sure what the latest terms are for these properties within our discussion context here, but I would recommend something like "toTaxonId" and "fromTaxonId". These values could be populated either by TNUIds (if we accept that a subset of TNUs serve as proxies for taxon concepts), or by identifiers for some sort of Taxon Concept entity (here be dragons, but perhaps worth exploring, or at least leaving the door open for).

OK, I've rambled on enough -- my sincere apologies. But I think this stuff is important to get right.
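To make the "toTaxonId"/"fromTaxonId" suggestion concrete, here is a hypothetical record shape sketched in Python. The class mirrors the name under discussion, the fields are snake_case stand-ins for the suggested property names plus an `according_to` reference, and none of it reflects terms actually adopted by TCS or this group. The identifier values may be TNU ids or ids of a separate Taxon Concept entity; the record itself does not care which.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Relationship terms as listed for this class in the opening comment.
Relationship = Literal["isCongruentTo", "includes", "isIncludedIn",
                       "overlaps", "intersects", "excludes"]

# Only the inclusion terms change when the direction is flipped;
# the other four terms are symmetric.
_INVERSE = {"includes": "isIncludedIn", "isIncludedIn": "includes"}


@dataclass(frozen=True)
class TaxonRelationshipAssertion:
    """Hypothetical record: someone asserts a relationship between two taxa."""
    from_taxon_id: str        # "fromTaxonId": a TNU id or a Taxon Concept id
    relationship: Relationship
    to_taxon_id: str          # "toTaxonId": a TNU id or a Taxon Concept id
    according_to: str         # the Reference making the assertion

    def __post_init__(self):
        # Literal is only a type hint, so also check the term at runtime.
        if self.relationship not in get_args(Relationship):
            raise ValueError(f"unknown relationship: {self.relationship!r}")

    def inverse(self) -> "TaxonRelationshipAssertion":
        """The same assertion read in the opposite direction."""
        return TaxonRelationshipAssertion(
            from_taxon_id=self.to_taxon_id,
            relationship=_INVERSE.get(self.relationship, self.relationship),
            to_taxon_id=self.from_taxon_id,
            according_to=self.according_to,
        )
```

Keeping `according_to` as a required field reflects the point made in this thread that every such relationship is a claim by someone, not a fact.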

mdoering commented 4 years ago

Sounds solid to me, Rich! Fully agree with Taxon=TaxonConcept=Taxon Circumscription.

I just don't quite understand why we want the term "Assertion" in there. Isn't a taxon's classification an assertion, and many other properties too? Why not just TaxonRelationship?

deepreef commented 4 years ago

Thanks, @mdoering ! I think the word "Assertion" is helpful in this context because it actively reminds people that we're talking about things people are claiming (asserting), rather than some sort of objective fact. While it's technically true that pretty-much all of our data entities are "assertions" in some form, I think it's helpful to hammer this home in the context of taxonomy, because so many people (even in our world) ascribe too much "fact" to taxonomic stuff.

Indeed, when I first described what we now refer to as "Taxonomic Name Usages", I used the word "Assertion". I abandoned that term only after TCS came up with "RelationshipAssertion". My personal feeling is that maintaining the word "Assertion" is helpful for TaxonRelationshipAssertion because, if nothing else, it reminds people how important the "accordingTo" property is.

But I'm not strong in my opinion on this -- I won't push back if the consensus is to remove the word "Assertion".

jar398 commented 4 years ago

@deepreef Sorry but I think you misrepresent me - I don't set the bar at high precision - not at all. I set it at being (a) logically coherent and (b) resonant with the way words are used in ordinary language (ideally over the past century or so) and in our community. I think the 'lumping' you describe is a very bad idea because it contradicts both of these goals.

This is really hard and I am trying to sidestep the mess, but you are forcing my hand. I will take some time to collect my thoughts and to research how some of these terms have been used in the past, so if you can, hold off a day or two before making these decisions...

jgerbracht commented 4 years ago

Maybe I have a misunderstanding of what a Taxon Circumscription is. I'm not a taxonomist, so please bear with me. I thought a Circumscription was the same as a TNU: what an authority called an asserted group of organisms and how that group is related to other groups. To me the Taxon Concept is simply the asserted group; the various names and relationships (other than parent/child relationships) that authorities apply to that asserted group are TNUs. Richard's Taxon Concept definition, 'A set of biological entities, alive, recently dead and yet to be born, asserted to comprise a collective unit in nature to which a scientificName is assigned.', works for me (I think).

deepreef commented 4 years ago

Hi @jar398 -- I'm sorry I made it seem like I was representing the precision thing as your position. That wasn't my intent! I only provided all that stuff about precision as context for setting the conversation at a coarse granularity (i.e., Taxon=Taxon Concept=Taxon Circumscription). I didn't intend to suggest you were advocating for high precision!

I understood you were after logical coherence, but to express my perspective on that, I felt the need to step back a bit to make it clear that when I refer to "taxon" or "taxon concept" or "taxon circumscription", I mean these as congruent synonyms, and specifically with reference to biological entities in nature.

We obviously seem to be miscommunicating somehow, and I apologize for my role in that. If I understand you correctly that you disagree with "lumping" the words "Taxon", "Taxon Concept" and "Taxon Circumscription" to have the same meaning, then perhaps it would be helpful if you could articulate how these three terms differ in their meaning (or should be thought of as different in meaning).

deepreef commented 4 years ago

Hi @jgerbracht : My understanding of these words probably differs from others', and I'm sure I've not always been consistent (10 taxonomists in a room, 12 definitions for the word). However, I think of a "Circumscription" as a set of organisms in nature, including living, recently dead, and yet to be born. In the context of this conversation, I am using the terms "Taxon Concept" and "Taxon" to mean the same thing.

In my view, the common practice is that a circumscribed set of organisms (representing a taxon or taxon concept) is implicitly or explicitly indicated (e.g., via enumerated specimens or populations of organisms, sets of characters used to diagnose taxa, or a topology of type specimens asserted via heterotypic synonym) by people through some form of documentation, and is usually labelled with a scientific name of some sort.

Such instances of documenting explicitly or implicitly indicated circumscribed sets of organisms as "taxa" (i.e., taxon concepts) are captured in a structured data entity that we've called a "Taxon Name Usage" instance (TNU). My position is that a subset of these TNUs can serve as functional proxies to represent a taxon/concept/circumscription.

There are many TNUs that do not explicitly or implicitly assert a taxon circumscription associated with a scientific name, so not all TNUs can serve as proxies for circumscriptions. But the subset of TNUs that do include a taxon circumscription assertion can, in my opinion, serve as very useful representations of such.

jliljeblad commented 4 years ago

This name discussion reminded me of the Avibase model (Lepage et al., 2014), where they distinguish between "shallow" and "deep" taxonomic concepts. The first seems to be the same as TNUs with unique name/source combinations (concept labels, as they say), while the second is described as taxonomically unique (non-congruent) concept clusters - what we here call Taxon Concepts (but they ultimately refer to as Avibase ID).

Since someone saying Taxon Concept might rather be meaning TNU, it seems we really need to use terms carefully here.

mdoering commented 4 years ago

Indeed @jliljeblad. And we face the same shallow / deep options on the name side actually. Either there is a "deep" unique name instance that we link name usages to or we select the "shallow" protonym as the original name usage that has established a name to represent that name.

deepreef commented 4 years ago

I'm going to need more elaboration on the distinction between "shallow" and "deep" for both names and concepts. The Avibase ID mentioned by @jliljeblad sounds suspiciously like a TNU in the form of a TNU "accordingTo" Avibase (or perhaps a set of TaxonRelationshipAssertions "accordingTo" Avibase), and I'm a bit lost on the shallow vs. deep name mentioned by @mdoering.

I have tried to be careful (at least lately -- maybe not in prior discussions) to consistently say that a TNU is not a concept per se, but is representative of, or a proxy for, a taxon concept. TNUs exist as records in a database and refer to assertions about names (usually scientific names) that appear in References. Taxon Concepts exist in the minds of people and are expressed or communicated as assertions about explicit or implicit sets of organisms in nature. These concepts are usually documented within "Treatments" (which are informatically a subtype of TNUs), which is why TNUs serve as a very convenient and robust proxy for a Taxon Concept.

At least that's how I think of it.

cboelling commented 4 years ago

There are a few ways forward:

  1. talk about these things A LOT, hours and hours, so that a common understanding can be established (at least among the people who are talking),

  2. refer to published semiotic theories (I don't know of any powerful enough for the TDWG situation), or

  3. use formal methods to constrain intended meaning (this is the OWL / BFO / model theory dogma - don't just list terms, but write out rich formal axioms that constrain their interpretations).

For this group none of these is viable, so we suffer.

@jar398 I don't mean to distract from dealing with the issue at hand but why do you think that none of the ways forward you have mentioned are viable for this group? Particularly number 3 (OWL/BFO/model theory)?

mdoering commented 4 years ago

which is why TNUs serve as a very convenient and robust proxy for a Taxon Concept.

Proxy is exactly my understanding of shallow. You select a TNU to represent something more abstract, the concept or the name. Maybe only "treatments" as a more factual subset of TNUs work that way.

deepreef commented 4 years ago

@mdoering:

Proxy is exactly my understanding of shallow. You select a TNU to represent something more abstract, the concept or the name. Maybe only "treatments" as a more factual subset of TNUs work that way.

OK, thanks -- that makes sense.

jliljeblad commented 4 years ago

@deepreef:

The Avibase ID is rather a central taxon concept to which concepts from a multitude of bird checklists are mapped. If anything, you could make an argument that there are both TNUs and taxon concepts in each of these checklists, but the distinction between the usage of a preferred name in such a checklist and the corresponding taxon concept is probably only going to complicate things unnecessarily.

jgerbracht commented 4 years ago

The Taxon Concept which both Denis and I use in the world of birds is essentially an identifier for a group of organisms which is currently recognized, or has been recognized, as a collective unit in nature, to paraphrase Richard's definition. I tend to work better with examples, so I'll give a couple which I hope will clarify deep vs shallow, or TC and TNU, as I understand them, using the current versions of two different global avian taxonomies (BirdLife and Clements).

Short-tailed Emerald (Chlorostilbon poortmani) is recognized by both BirdLife and Clements, but each authority applies this name to a different Taxon Concept, i.e. a different group of organisms. BirdLife includes the populations in the Colombian and Venezuelan Andes as well as the mountain ranges in N Venezuela. Clements includes only the populations in the Colombian and Venezuelan Andes.

Green-tailed Emerald (Chlorostilbon alice) is recognized by Clements as the population in N Venezuela and this same group of organisms is recognized by BirdLife as a subspecies C. p. alice.

This example has 3 concepts:

  1. the populations in the Colombian and Venezuelan Andes
  2. the population in the mountain ranges in N Venezuela
  3. the populations in the Colombian and Venezuelan Andes and mountain ranges in N Venezuela (and can be considered parent of concepts 1 and 2).

and 4 TNUs:

  1. Short-tailed Emerald (Chlorostilbon poortmani) assigned to concept 3 (BirdLife v4)
  2. Short-tailed Emerald (Chlorostilbon poortmani) assigned to concept 1 (Clements 2019)
  3. Green-tailed Emerald (Chlorostilbon alice) assigned to concept 2 (Clements 2019)
  4. Chlorostilbon poortmani alice assigned to concept 2 (BirdLife v4)
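One way to check the example, sketched under the assumption that each concept can be reduced to the set of populations listed above (the population labels are informal stand-ins): each TNU, keyed by name and authority, points to a concept, and the relationship between two TNUs is read off the set relationship of their concepts. The names `CONCEPTS`, `TNUS`, `relation` and `tnu_relations` are illustrative only.

```python
from itertools import combinations

# Informal labels standing in for the circumscribed populations.
ANDES = "Colombian and Venezuelan Andes"
N_VENEZUELA = "mountain ranges in N Venezuela"

# The three concepts from the example, as sets of populations.
CONCEPTS = {
    1: frozenset({ANDES}),
    2: frozenset({N_VENEZUELA}),
    3: frozenset({ANDES, N_VENEZUELA}),
}

# The four TNUs: (scientific name, authority) -> concept number.
TNUS = {
    ("Chlorostilbon poortmani", "BirdLife v4"): 3,
    ("Chlorostilbon poortmani", "Clements 2019"): 1,
    ("Chlorostilbon alice", "Clements 2019"): 2,
    ("Chlorostilbon poortmani alice", "BirdLife v4"): 2,
}


def relation(a: frozenset, b: frozenset) -> str:
    """RCC-5 style relation of set a to set b."""
    if a == b:
        return "isCongruentTo"
    if a > b:
        return "includes"
    if a < b:
        return "isIncludedIn"
    return "overlaps" if a & b else "excludes"


def tnu_relations():
    """Relationship between every pair of TNUs, derived from their concepts."""
    return {
        (t1, t2): relation(CONCEPTS[TNUS[t1]], CONCEPTS[TNUS[t2]])
        for t1, t2 in combinations(TNUS, 2)
    }


if __name__ == "__main__":
    for (t1, t2), rel in tnu_relations().items():
        print(f"{t1[0]} ({t1[1]}) {rel} {t2[0]} ({t2[1]})")
```

Two of the six pairs carry the point of the example: the same name, Chlorostilbon poortmani, gives `includes` between the BirdLife v4 and Clements 2019 usages, while the differently named Chlorostilbon alice (Clements 2019) and C. p. alice (BirdLife v4) come out `isCongruentTo`.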

I think of a TNU as name/concept combination as designated by some authority. Taxon Concept is a proxy for a set group of organisms (past, present and future) and is 'static' through time.

I would disagree that this complicates things unnecessarily, for me it's key to managing large biodiversity datasets, which I can elaborate on later (must run to take my daughter out)

jliljeblad commented 4 years ago

@jgerbracht:

Sorry, I only meant that they seemed to complicate things when I was trying to explain how your Taxon Concept relates to the Avibase ID, which is the central element of Avibase to which all other concepts are related.

So, basically your example has 4 TNUs, 3 concepts but also 3 Avibase IDs (each congruent with one of the concepts).

jgerbracht commented 4 years ago

@jliljeblad Ah, understood, they do seem to complicate things. Yes, in essence, an Avibase ID is a Taxon Concept ID.

jgerbracht commented 4 years ago

Sorry to have derailed the conversation a little bit. Back to the relationship assertion class. It seems to me that this is detailing relationships between TNUs, and not relationships between concepts or between a concept and a TNU. Is that how others are thinking of this relationship class? If so, TNURelationshipAssertion might work??

deepreef commented 4 years ago

@jgerbracht : Instances in the [...]RelationshipAssertion should absolutely be representing relationships between two concepts, accordingTo some Reference. The RCC-5 Relationships should apply to the two sets of circumscribed organisms implied by each taxon concept (assuming Taxon Concept = Taxon Circumscription). To me, that should remain as is.

What I think we should be debating is: How should the two concepts be represented (i.e., cited) within each [...]RelationshipAssertion instance?

At the moment, we've been talking about representing the two concepts (i.e., the "toConcept" and the "fromConcept") using TNU instances, under the assumption that a subset of all TNUs (i.e., those that are "Treatments") can serve as reliable proxies for taxon concepts. This has stirred up some discussion about whether a TNU is a Taxon Concept, or not. I think it's better to say that a TNU is not a concept per se, but is a very robust proxy for a Taxon Concept.

The alternative is to establish some other entity besides a TNU that more directly represents a Taxon Concept. The most plausible approaches to this have been to establish a dedicated "TaxonConcept" entity to which multiple TNUs can be mapped. The entity itself is independent of the scientificName applied to it, and is intended to represent a cluster of congruent concepts (very-much like how Avibase seems to do it). However, there are many practical problems doing this, not the least of which are: "According to whom do the cluster of TNUs represent congruent concepts?" Essentially, the clusters of congruent TNUs need to be mapped to each other via what amounts to [...]RelationshipAssertion instances, which brings us back to the question of whether TNUs "are" Taxon Concepts.
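The "According to whom?" problem can be made concrete with a small union-find sketch. It assumes assertions are plain `(from_tnu, relationship, to_tnu, according_to)` tuples and that a cluster of congruent TNUs is simply a connected component of `isCongruentTo` links; the TNU and reference ids used with it are hypothetical. The resulting clusters change with the set of sources you choose to trust, which is why a derived TaxonConcept would need its own accordingTo.

```python
from collections import defaultdict


def congruence_clusters(assertions, trusted_sources):
    """Group TNU ids into clusters linked by isCongruentTo assertions.

    assertions: iterable of (from_tnu, relationship, to_tnu, according_to).
    Only assertions whose according_to is in trusted_sources are used.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for frm, rel, to, source in assertions:
        find(frm)  # register every TNU mentioned, even if never merged
        find(to)
        if rel == "isCongruentTo" and source in trusted_sources:
            union(frm, to)

    groups = defaultdict(set)
    for tnu in list(parent):
        groups[find(tnu)].add(tnu)
    # Deterministic order: sort clusters by their sorted members.
    return sorted(groups.values(), key=sorted)
```

For example, with the assertions A isCongruentTo B (ref:X), B isCongruentTo C (ref:Y) and C includes D (ref:X), trusting only ref:X yields the clusters {A, B}, {C}, {D}; also trusting ref:Y merges C into the first cluster. The `includes` assertion never merges anything, since only congruence defines a cluster here.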

And there would be a hybrid approach which you alluded to as:

relationships [...] between a concept and a TNU

The idea of this would be to have the "fromConcept" part point to a TNU, and the "toConcept" part point to a "TaxonConcept" entity.

My feeling is that, unless we're prepared to spend some serious time defining a TaxonConcept entity (perhaps modeled after the Avibase approach) -- and as I said before, "here be dragons" -- then the cleanest/simplest way forward, I think, is to accept TNUs as proxies for Taxon Concepts. In any case, I firmly think that we should maintain the "[...]RelationshipAssertion" class as supporting asserted relationships between two Taxon Concepts, via one of the RCC-5 relationship types.

deepreef commented 4 years ago

@jgerbracht :

I think of a TNU as name/concept combination as designated by some authority. Taxon Concept is a proxy for a set group of organisms (past, present and future) and is 'static' through time.

I would disagree that this complicates things unnecessarily, for me it's key to managing large biodiversity datasets, which I can elaborate on later (must run to take my daughter out)

Yes, I agree completely that this would be how a Taxon Concept entity would need to be defined. The problem comes into play when you get into the details of how these static concept entities are generated. The very act of establishing them as static abstract entities (=defined sets of organisms in nature) requires someone to declare them as such. In other words, you need an "accordingTo" to go with each declared TaxonConcept instance.

TNUs are also static, but come pre-bundled with a cloud of data (in the form of a Treatment) to help clarify the boundaries of the circumscribed set of organisms -- which is why they make perfect proxies for Taxon Concepts.

Following your example, who is to say that these two TNUs represent the "same" concept:

Green-tailed Emerald (Chlorostilbon alice) assigned to concept 2 (Clements 2019)
Chlorostilbon poortmani alice assigned to concept 2 (BirdLife v4)

Sure, both may refer to the population in the mountain ranges in N Venezuela, but what if one of them also included a range extension with a few individuals observed at the northern tip of Colombia? Does the addition of these individuals mean it's a different, slightly more inclusive TaxonConcept? Or would we assume it's the same TaxonConcept, except with a slightly broader geographic distribution? What if later studies with DNA show that the N Colombian population is a very distinct relic, but still most closely allied with the N Venezuelan population? There are a million possibilities, and it's really hard to come up with objective criteria to determine whether two slightly different circumscribed sets of organisms represent the same TaxonConcept, or two slightly different ones.

That's why anchoring TaxonConcepts to specific TNUs (and establishing the latter as a proxy to the former) ensures truly static representations of TaxonConcepts, usually with built-in clouds of data (Treatment) to help others assess the appropriate RCC-5 relationship with other TNUs.

None of the possible solutions are simple. But some are simpler than others.

deepreef commented 4 years ago

Note: The example I gave of subtly different range extensions may be rare in birds (and mammals), but they are the absolute norm for essentially all other organisms (including other vertebrates, like fishes).

jgerbracht commented 4 years ago

I'll start at the bottom, regarding anchoring TaxonConcepts to specific TNUs, I have always thought that a TaxonConcept must have an 'originating' TNU, that provides its definition, this should somehow provide or link to a range description and distinguishing characteristics (including genomic data). But many future TNUs can be applied to the same Taxon Concept.

For range extensions, to me that is part of the "and yet to be born" portion. Ranges are always in flux, and changes in the range of a taxon don't change the taxon, only where the taxon lives. So, following that, a range extension does not change the TaxonConcept; it's still the same concept. In your DNA study example, that does in fact fundamentally change the situation, and a new TaxonConcept would need to be defined, or resurrected if it existed in the past. When I said 'static', I meant that the group of organisms is 'static' in the sense that if multiple authorities apply different names, taxonomic levels and different relationships to other taxa, that group is still the same TaxonConcept.

As for the alternative you mention, "The alternative is to establish some other entity besides a TNU that more directly represents a Taxon Concept": this is exactly what I think we need, at least for domains where it can be utilized; otherwise, mapping TNUs to other TNUs, when there may be hundreds for each taxon, would be a nightmare. I don't think the issues are insurmountable, and the benefits would be massive. Between the groups already doing this sort of work for birds, I think we are already close and can see a path to making this a reality. I'm hoping that the standard we develop will, at the very least, define a TaxonConcept as underlying TNUs, even if this iteration of the standard does not directly define the concepts.

I'd also like to make sure we are on the same page. You mentioned the Avibase ID as appearing to "represent a cluster of congruent concepts", and therefore that this cluster of concepts needs relationships. I guess I'm thinking that the Avibase ID does not represent a cluster of congruent concepts, but in fact uniquely represents a single concept, so the only relationships at the concept level would be parent/child (I think). In the 'emeralds' example, concepts 1 and 2 are both children of concept 3. If subsequent research shows that concept 1 is actually not closely related to concept 2, then new concepts would be created as necessary, but concept 3 will always remain as the parent of concepts 1 and 2, even though 3 is now deprecated. This does mean that concepts can have multiple parents, i.e. concepts 1 and 2 combined = concept 3, and after the subsequent research, concept 1 and some other concept 6 combined = concept 7. As for who creates and manages these concepts, it must be the groups coining the TNUs, i.e. the taxonomic authorities, utilizing a centralized clearinghouse of concepts. In the domain I know, where there are 4 or 5 distinct groups, each managing a global taxonomy of birds, they are all well aware of these underlying concepts, though not by that name, and of how other groups have treated them in specific versions of their taxonomies, so I don't think this would be insurmountable. Avibase could potentially be the repository or clearinghouse for taxon concepts related to birds. Again, I don't think this group can take on TaxonConcepts as we're discussing in this current round, but I would like to make sure what we produce will move us in that direction.
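The multi-parent structure described here can be sketched as a small directed acyclic graph. This assumes each concept records the concepts that were combined to form it (`children`) and carries a `deprecated` flag; both field names are assumptions, and concepts 6 and 7 are the hypothetical ones from the comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical concept record; children are the concepts combined to form it."""
    concept_id: int
    children: frozenset = frozenset()
    deprecated: bool = False  # retired concepts keep their links


def parents_of(concepts, child_id, include_deprecated=True):
    """All concepts that list child_id among their components."""
    return {
        c.concept_id
        for c in concepts.values()
        if child_id in c.children and (include_deprecated or not c.deprecated)
    }


# The 'emeralds' concepts after the hypothetical later research:
# 3 = 1 + 2 (now deprecated), 7 = 1 + 6.
CONCEPTS = {
    1: Concept(1),
    2: Concept(2),
    3: Concept(3, frozenset({1, 2}), deprecated=True),
    6: Concept(6),
    7: Concept(7, frozenset({1, 6})),
}
```

Here `parents_of(CONCEPTS, 1)` returns `{3, 7}`, and `{7}` when deprecated parents are excluded, so concept 3 keeps its link to concepts 1 and 2 after it is retired.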

jgerbracht commented 4 years ago

Re the range extensions and contractions: this is also the norm in birds.

deepreef commented 4 years ago

I'll keep this as short as possible...

I have always thought that a TaxonConcept must have an 'originating' TNU that provides its definition; this should somehow provide or link to a range description and distinguishing characteristics (including genomic data). But many future TNUs can be applied to the same Taxon Concept.

Yes, in this sense a designated TNU can serve as the analog of a "type specimen" for a taxon concept. However, the mechanics are similar: one could link other congruent-concept TNUs to this "type" TNU via [...]RelationshipAssertion instances, without the need for a TaxonConcept class or dedicated instances.

Ranges are always in flux, and changes in the range of a taxon don't change the taxon, only where the taxon lives. [...] When I said 'static', I meant that the group of organisms is 'static' in the sense that if multiple authorities apply different names, taxonomic levels and different relationships to other taxa, that group is still the same TaxonConcept.

So... TC1 = N Venezuela. A later range extension into N Colombia doesn't change TC1. Later DNA analysis removes individuals from TC1 and puts them within TC2. Or does it? Maybe the DNA evidence is borderline, so it's really all still just TC1? The point is that there are no objective criteria to distinguish between cases when a TC inflates to accommodate more individual organisms and cases when a subset of individuals once included in the TC is cleaved to create a new TC. In some cases people would consider these to be misidentifications, so the original TC remains intact. In other cases, the original TC (TC1) is split to form TC2 and TC3 (where TC1 includes TC2+TC3, and TC2/TC3 are non-overlapping). This gets really messy really easily.
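If we idealize a circumscription as a fully known set of populations (an assumption: as the comment above argues, membership is exactly what is uncertain in practice), the relationships in this split scenario reduce to set comparisons. The sketch below uses hypothetical population labels and returns the relationship names used for this class (`isCongruentTo`, `includes`, `isIncludedIn`, `overlaps`, `excludes`); `intersects`, which is not an RCC-5 relation, is left out.

```python
def rcc5(a: frozenset[str], b: frozenset[str]) -> str:
    """RCC-5 relationship of circumscription a to circumscription b.

    Assumes each circumscription is a fully known set of populations,
    which is an idealization.
    """
    if a == b:
        return "isCongruentTo"
    if a < b:  # proper subset
        return "isIncludedIn"
    if a > b:  # proper superset
        return "includes"
    if a & b:  # share members, but neither contains the other
        return "overlaps"
    return "excludes"


# Hypothetical populations; TC1 is later split into TC2 and TC3.
tc1 = frozenset({"pop_A", "pop_B", "pop_C", "pop_D"})
tc2 = frozenset({"pop_A", "pop_B"})
tc3 = frozenset({"pop_C", "pop_D"})
tc4 = frozenset({"pop_B", "pop_C"})  # a competing, overlapping circumscription

print(rcc5(tc1, tc2))        # includes
print(rcc5(tc2, tc3))        # excludes
print(rcc5(tc2 | tc3, tc1))  # isCongruentTo
print(rcc5(tc2, tc4))        # overlaps
```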

As for the alternative you mention, "The alternative is to establish some other entity besides a TNU that more directly represents a Taxon Concept": this is exactly what I think we need, at least for domains where it can be utilized; otherwise, mapping TNUs to other TNUs, when there may be hundreds for each taxon, would be a nightmare.

I worry that both alternatives represent nightmares. My gut says that TNU-to-TNU mappings will ultimately yield fewer nightmares, because TNU instances are themselves packets of facts, and RelationshipAssertion instances are themselves asserted opinions (with an explicit "accordingTo"), but TCs as a defined entity unto themselves are not clearly on either side of the objective/subjective domain. I think the answer may be to keep them all as TNU-TNU relationships, but have a mechanism for branding one TNU as a "type"-like anchorpoint around which other TNUs are clustered (as you suggested).

I'm thinking that the Avibase ID does not represent a cluster of congruent concepts, but in fact uniquely represents a single concept, so the only relationships at the concept level would be parent/child (I think).

Maybe... But I've yet to see this actually work. I think the Avibase model has the best chance of demonstrating practical success, but to be scalable it would need to apply to all taxa. I think that's where it will get tricky.

As for who creates and manages these concepts, it must be the groups coining the TNUs, i.e. the taxonomic authorities, utilizing a centralized clearinghouse of concepts.

The TNUs are born in literature -- there are no groups that coin them. There are groups who digitize them and add them to databases, but I think there would need to be a mechanism for people to define which among potentially hundreds of TNUs would best represent the "type" TNU.

jgerbracht commented 4 years ago

I think we may be very close in our actual thinking here. If I'm correct, the basic difference between what you are suggesting and what I am suggesting is whether the concept is an 'original' TNU, or a separate object that has an original TNU assigned. The Avibase model, which is also the basic model used within eBird, Macaulay Library and Birds of the World to manage annual taxonomic changes, works well. Details could be improved, but it works well and has allowed us to manage these large datasets where the underlying taxonomy changes annually. Using an underlying concept identifier, I can aggregate observations, photos, sounds and video with species life history articles and be confident they truly apply to the same taxon, even as the TNUs change. I agree that TNUs are born in literature and then they are adopted (or not) by the community (for birds, that community is mostly composed of the groups managing the global taxonomies). Since I think mainly about the implications of how TNUs are used by museums, citizen science projects and data aggregators, I think of the adopters of a TNU as the ones to manage concepts, and for birds, they are the regional and global taxonomy groups. I'm sure that same thinking wouldn't necessarily apply to other taxa.

deepreef commented 4 years ago

@jgerbracht : Yes! That sounds about right to me! Assuming the terms "original TNU" and "'type' TNU" are the same (I like your term "original TNU" better), then what you describe closely matches my own thinking on this.

Similarly, the GNUB data model (modeled after Taxonomer, in the previously linked PDF) works well, and is TNU-based (without minting new identifiers to represent concepts separate from the "original" TNU). So between these two approaches, I think we're close to convergence. We don't need to solve the optimum data model -- we just need to make sure the TNC exchange standard accommodates data with enough granularity and structure to allow sharing information losslessly.

The only part I'm a little unclear on is this:

I agree that TNUs are born in literature and then they are adopted (or not) by the community

I'm not sure what you mean by "adopted". As TNUs, they exist independently of whether or not anyone adopts them. But if you mean "adopted" in the sense that a community will select specific TNUs as representatives of well-defined taxon concepts (i.e., "original TNUs"), to which other TNUs are mapped as congruent, then yes -- that is exactly the sort of thing a community should adopt. I think of it as putting little gold stars next to a particular TNU that, in the eyes of the community, "got it right".

So, just to confirm the thinking is similar, here's my current thinking: An instance of [...]RelationshipAssertion consists fundamentally of four elements:

1. fromTaxonConcept
2. toTaxonConcept
3. relationshipType
4. accordingTo

If we agree on these, then we can propose "business rules" for how best to populate the values of these properties.

I'm still thinking in terms of the first two each being a pointer to an instance of TaxonomicNameUsage, the third being one of the RCC-5 relationships (congruent, included in, includes, overlaps, excludes), and the fourth being a pointer to an instance of Reference. In that context, the business rules could be something along the lines of the following:

For instances involving relationshipType='Congruent', the "toTaxonConcept" should be a TNU that is community-adopted as representative for a well-defined taxon concept (= the "original TNU"). The fromTaxonConcept can be any other TNU. This way, the designated "toTaxonConcept" TNU serves as the functional equivalent of a "TaxonConcept Instance".

In cases where relationshipType is not congruent (includes, included in, overlaps, excludes), both toTaxonConcept and fromTaxonConcept should only be populated with community-adopted TNUs.

The value of accordingTo would point to a Reference representing the who and when of the declaration for the relationship.
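The four-element assertion and the business rules above might be sketched like this. The field names are Python transliterations of the property names used in this comment; the `adopted` set is a hypothetical stand-in for whatever mechanism a community uses to flag its "original" TNUs, and none of this is part of the draft standard. The congruence check follows the rule as first stated here (the adopted TNU on the "to" side).

```python
from dataclasses import dataclass

RCC5_TYPES = {"isCongruentTo", "includes", "isIncludedIn", "overlaps", "excludes"}


@dataclass(frozen=True)
class RelationshipAssertion:
    """The four elements described above (field names are illustrative)."""
    from_taxon_concept: str  # ID of a TaxonomicNameUsage
    to_taxon_concept: str    # ID of a TaxonomicNameUsage
    relationship_type: str   # one of RCC5_TYPES
    according_to: str        # ID of a Reference


def rule_violations(a: RelationshipAssertion, adopted: set[str]) -> list[str]:
    """Check an assertion against the proposed (best-practice, not normative) rules."""
    if a.relationship_type not in RCC5_TYPES:
        return [f"unknown relationshipType {a.relationship_type!r}"]
    if a.relationship_type == "isCongruentTo":
        # Rule 1: the congruence target should be a community-adopted TNU.
        ends = [a.to_taxon_concept]
    else:
        # Rule 2: other relationships should link adopted TNUs only.
        ends = [a.from_taxon_concept, a.to_taxon_concept]
    return [f"{tnu} is not a community-adopted TNU" for tnu in ends if tnu not in adopted]


adopted = {"tnu-A", "tnu-B"}  # hypothetical "original" TNUs
print(rule_violations(RelationshipAssertion("tnu-x", "tnu-A", "isCongruentTo", "ref-1"), adopted))
# []
print(rule_violations(RelationshipAssertion("tnu-x", "tnu-B", "overlaps", "ref-1"), adopted))
# ['tnu-x is not a community-adopted TNU']
```

Returning a list of problems rather than raising fits the "best practice" framing: a violation flags an assertion for review without making it invalid.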

I can think of a bunch of legitimate exceptions to these business rules, but broadly speaking, this approach would allow us to avoid a near-infinite number of relationship assertions representing every single pairwise relationship between every single TNU, and would allow us to converge around a set of "original TNU" instances that serve as primary anchor points to Taxon Concepts, but without all the mess of trying to define another class of "thing" (with its own identifiers and properties and definition and instances and so on) for a "Taxon Concept" entity.

Are we still close in our thinking on this?

nielsklazenga commented 4 years ago

@deepreef, those business rules are for use within a system, right, and not in a standard?

I don't think it matters on which end of the isCongruent relationship the community-adopted TNU is; in fact, I think in most systems the community-adopted TNU would be the subject in the relationship, as it has the same accordingTo as the relationship assertions. Also, while for the other relationship types both sides of the relationship will have TNUs that represent these "deep" concepts that are inferred by the system (that's how "deep" those concepts are), only one of those can be adopted by the community at any given time. If it were my system, I wouldn't want to mint a TNU for a cluster of TNUs that are congruent among themselves, but not congruent with my own concept. I probably wouldn't even be interested in what the relationships are between concepts that are not my own.

The problem with these "deep" concepts is that they only live within a system and don't survive exchange or being looked at from the outside. For eBird the AviBird "deep concept" is just a TNU (forgive me if I don't have the scope of the systems correct).

I think that with a minimum of relationship assertions we can infer all the other relationships (and all the "deep" concepts), but I don't think it matters which relationship assertions they are. So, if we assert that A == B, it doesn't matter whether we assert that A > C or B > C.
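The inference claimed here can be illustrated with a small sketch: TNUs asserted to be congruent are merged into classes (union-find), and every non-congruence assertion is then propagated to all members of the classes at both ends. Assertions are hypothetical `(subject, relationshipType, object)` triples, with "A > C" written as `includes`; the sketch ignores `accordingTo`, which in practice would limit which assertions may be combined.

```python
from collections import defaultdict


def infer_relationships(assertions: list[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Propagate non-congruence assertions across congruent TNUs."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge TNUs asserted to be congruent into one class.
    for subject, rel, obj in assertions:
        root_s, root_o = find(subject), find(obj)
        if rel == "isCongruentTo":
            parent[root_s] = root_o

    members: dict[str, set[str]] = defaultdict(set)
    for tnu in list(parent):
        members[find(tnu)].add(tnu)

    # Any other relationship holds for every member of both classes.
    inferred = set()
    for subject, rel, obj in assertions:
        if rel != "isCongruentTo":
            for x in members[find(subject)]:
                for y in members[find(obj)]:
                    inferred.add((x, rel, y))
    return inferred


via_a = infer_relationships([("A", "isCongruentTo", "B"), ("A", "includes", "C")])
via_b = infer_relationships([("A", "isCongruentTo", "B"), ("B", "includes", "C")])
print(sorted(via_a))  # [('A', 'includes', 'C'), ('B', 'includes', 'C')]
print(via_a == via_b)  # True
```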

deepreef commented 4 years ago

@deepreef, those business rules are for use within a system, right, and not in a standard?

Yes! However, I imagine we ought to produce a document of "Best Practices" to accompany the standard, to explain how to make best use of the standard in a way that conforms to preferred conventions.

I don't think it matters on which end of the isCongruent relationship the community-adopted TNU is; in fact, I think in most systems the community-adopted TNU would be the subject in the relationship, as it has the same accordingTo as the relationship assertions.

Right! I had them backwards, I guess. But cardinality of these relationships should be discussed at some point.

Also, while for the other relationship types both sides of the relationship will have TNUs that represent these "deep" concepts that are inferred by the system (that's how "deep" those concepts are), only one of those can be adopted by the community at any given time.

We're talking about two different meanings of "adopted". In one sense, a specific TNU is "adopted" as the anchorpoint for a particular concept, regardless of whether that concept reflects the community's view of what is accepted as the "correct" taxonomy. In another sense, the community accepts a particular concept as "correct". The [...]RelationshipAssertion instances do not necessarily reflect this latter information. I think perhaps we can use the word "adopted" in the former sense, and "accepted" in the latter sense. Or do I misunderstand your point?

If it were my system, I wouldn't want to mint a TNU for a cluster of TNUs that are congruent among themselves, but not congruent with my own concept.

We're not talking about "minting" any new TNUs here. My assumption is that we have a pool of existing TNUs that are deemed to represent congruent concepts, and the community "adopts" one of these existing TNUs as the "original" TNU (analogous to the type specimen of a species). In some cases new TNUs would need to be minted (when none already exists in literature to represent a particular concept or nomenclatural label for a concept), but I should hope such de novo TNUs would be rare exceptions. In most cases, we can draw from TNUs already extant in the literature.

I probably wouldn't even be interested in what the relationships are between concepts that are not my own.

I'm the opposite. My own concept relationships are the ones I'm least interested in. I'm much more interested in how other people have mapped Concepts to each other, regardless of whether I agree with them.

The problem with these "deep" concepts is that they only live within a system and don't survive exchange or being looked at from the outside.

I think we are talking about different things here.

For eBird the AviBird "deep concept" is just a TNU (forgive me if I don't have the scope of the systems correct).

I'm not sure that's correct (I'll leave it to those more familiar with Avibase to comment). The TNUs I'm talking about are ones that already exist in literature (i.e., the existing taxonomic "facts").

I think that with a minimum of relationship assertions we can infer all the other relationships, but I don't think it matters which relationship assertions they are. So, if we assert that A == B, it doesn't matter whether we assert that A > C or B > C.

I don't follow. I certainly want to know how all the relationships have been asserted, so that when confronted with C, I understand its relationship to A and B.

nielsklazenga commented 4 years ago

I think we are entering application profile land.

For eBird the AviBird "deep concept" is just a TNU (forgive me if I don't have the scope of the systems correct).

I'm not sure that's correct (I'll leave it to those more familiar with Avibase to comment). The TNUs I'm talking about are ones that already exist in literature (i.e., the existing taxonomic "facts"). @deepreef in https://github.com/tdwg/tnc/issues/48#issuecomment-600423223

I was saying pretty much the same thing you were saying earlier:

The Avibase ID mentioned by @jliljeblad sounds suspiciously like a TNU in the form of a TNU "accordingTo" Avibase (or perhaps a set of TaxonRelationshipAssertions "accordingTo" Avibase) @deepreef in https://github.com/tdwg/tnc/issues/48#issuecomment-599953799

An entry in Avibase, which is available online, is as much a "taxonomic fact" (or a taxonomic opinion or assertion to talk in terms we are both more comfortable with) as the TNUs in printed literature it is based on. We said in an earlier issue (#45) that we needed to be really broad-minded about what sort of reference the accordingTo could be (in that issue we were talking about det. slips). I really don't see how the Avibase ID reflects a deeper concept than any other TNU (that represents a taxon concept).

I just had a look at Avibird and it appears I was wrong about its scope. Nevertheless, Avibird might want to compare its concept with that of eBird and vice versa.

I think that with a minimum of relationship assertions we can infer all the other relationships, but I don't think it matters which relationship assertions they are. So, if we assert that A == B, it doesn't matter whether we assert that A > C or B > C.

I don't follow. I certainly want to know how all the relationships have been asserted, so that when confronted with C, I understand its relationship to A and B. @deepreef in https://github.com/tdwg/tnc/issues/48#issuecomment-600423223

My point is that you don't need both the A > C and the B > C assertions to understand the relationship between C and A and B; either one will do. Again, I am saying the same thing as you when you propose business rules to 'avoid a near-infinite number of relationship assertions representing every single pairwise relationship between every single TNU'. The only difference is that I am saying it doesn't matter which of the assertions you make. I would probably make the same one as you, but we don't need a business rule for that.

nielsklazenga commented 4 years ago

BTW, when I say that we are entering application profile land, it doesn't mean that we shouldn't do it, just that it is not necessarily part of a standard. We are probably actually talking more about different scenarios or use cases than application profiles, and those were things we set out to do but that have since disappeared into the background.

I would really like to spawn a new issue, but I don't know where to make the split.

deepreef commented 4 years ago

OK, that helps clarify things. Thanks!

nielsklazenga commented 4 years ago

There was a nice example – not that different from @jgerbracht's example here actually – in the Catalogue of Life symposium at Biodiversity_next, in a talk presented by Olaf Banki, but prepared by David Remsen. This example was about the African elephant.

The African elephant was first described, at the end of the 18th century, as Elephas africanus (Elephas africanus sensu Source 1). Later it was renamed to Loxodonta africana (Loxodonta africana sensu Source 2). In the early 20th century, Elephas cyclotis was described from Cameroon (Elephas cyclotis sensu Source 3). This has been recognised as a subspecies of Loxodonta africana for most of the 20th century (so we have Loxodonta africana africana sensu Source 4 and Loxodonta africana cyclotis sensu Source 4), until DNA sequence data indicated recognition at the species level was warranted (Loxodonta africana sensu Source 5 and Loxodonta cyclotis sensu Source 5).

So we've got (at least) 7 Taxonomic Name Usages:

| ID | Label |
|---|---|
| 1 | Elephas africanus sensu Source 1 |
| 2 | Loxodonta africana sensu Source 2 |
| 3 | Elephas cyclotis sensu Source 3 |
| 4 | Loxodonta africana africana sensu Source 4 |
| 5 | Loxodonta africana cyclotis sensu Source 4 |
| 6 | Loxodonta africana sensu Source 5 |
| 7 | Loxodonta cyclotis sensu Source 5 |

and 3 different taxon circumscriptions (I use the vernacular names as labels):

| ID | Label |
|---|---|
| 8 | "African elephant" |
| 9 | "Bush elephant" |
| 10 | "Forest elephant" |

There is a many-to-one relationship between TNUs and Circumscriptions, so you might link them like so:

| ID | Label | Circumscription ID |
|---|---|---|
| 1 | Elephas africanus sensu Source 1 | 8 |
| 2 | Loxodonta africana sensu Source 2 | 8 |
| 3 | Elephas cyclotis sensu Source 3 | 10 |
| 4 | Loxodonta africana africana sensu Source 4 | 9 |
| 5 | Loxodonta africana cyclotis sensu Source 4 | 10 |
| 6 | Loxodonta africana sensu Source 5 | 9 |
| 7 | Loxodonta cyclotis sensu Source 5 | 10 |

Now I think opinionated systems like Catalogue of Life, Australian Plant Census (APC), Avibase etc. do create their own TNUs, as you want to be able to refer to something according to CoL or something according to APC. So there are two more TNUs (in fact, they are probably the only ones in CoL 2020):

| ID | Label | Circumscription ID |
|---|---|---|
| 9 | Loxodonta africana sensu CoL 2020 | 9 |
| 10 | Loxodonta cyclotis sensu CoL 2020 | 10 |

There will probably be no TNU in CoL 2020 for the "African elephant" (there might be in a previous edition), as the concept is not adopted, so I would prefer the "type" approach @deepreef suggests:

| ID | Label | "Type" TNU ID |
|---|---|---|
| 1 | Elephas africanus sensu Source 1 | 1 |
| 2 | Loxodonta africana sensu Source 2 | 1 |
| 3 | Elephas cyclotis sensu Source 3 | 10 |
| 4 | Loxodonta africana africana sensu Source 4 | 9 |
| 5 | Loxodonta africana cyclotis sensu Source 4 | 10 |
| 6 | Loxodonta africana sensu Source 5 | 9 |
| 7 | Loxodonta cyclotis sensu Source 5 | 10 |
| 9 | Loxodonta africana sensu CoL 2020 | 9 |
| 10 | Loxodonta cyclotis sensu CoL 2020 | 10 |

(@deepreef, forgive me if I got you totally wrong).

Using Taxon Relationship Assertions, you would get something like this:

| ID | subjectTaxonomicNameUsage | relationshipType | objectTaxonomicNameUsage | according to |
|---|---|---|---|---|
| 11 | Elephas africanus sensu Source 1 [1] | isCongruentWith | Loxodonta africana sensu Source 2 [2] | CoL 2020 |
| 12 | Elephas cyclotis sensu Source 3 [3] | isCongruentWith | Loxodonta cyclotis sensu CoL 2020 [10] | CoL 2020 |
| 13 | Loxodonta africana africana sensu Source 4 [4] | isCongruentWith | Loxodonta africana sensu CoL 2020 [9] | CoL 2020 |
| 14 | Loxodonta africana cyclotis sensu Source 4 [5] | isCongruentWith | Loxodonta cyclotis sensu CoL 2020 [10] | CoL 2020 |
| 15 | Loxodonta africana sensu Source 5 [6] | isCongruentWith | Loxodonta africana sensu CoL 2020 [9] | CoL 2020 |
| 16 | Loxodonta cyclotis sensu Source 5 [7] | isCongruentWith | Loxodonta cyclotis sensu CoL 2020 [10] | CoL 2020 |
| 17 | Loxodonta africana sensu CoL 2020 [9] | isIncludedIn | Loxodonta africana sensu Source 2 [2] | CoL 2020 |
| 18 | Loxodonta cyclotis sensu CoL 2020 [10] | isIncludedIn | Loxodonta africana sensu Source 2 [2] | CoL 2020 |

I think these three different ways of putting it all contain exactly the same information and can be easily transformed into each other.
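To illustrate the claim that these representations are interchangeable, here is a sketch converting the "Type" TNU column into congruence assertions and back. It assumes a single source for all assertions ("CoL 2020" in the hypothetical above), uses the term `isCongruentTo`, and covers only congruence: the isIncludedIn rows (17 and 18) carry information the "Type" column does not, and an anchor TNU with no other members would drop out of the round trip. The IDs are those from the tables above; the sketch puts TNU 2 rather than TNU 1 in the subject position of the first assertion, which, as discussed earlier in this thread, does not change the information.

```python
# "Type" TNU column from the table above: TNU ID -> "type" TNU ID.
TYPE_TNU = {1: 1, 2: 1, 3: 10, 4: 9, 5: 10, 6: 9, 7: 10, 9: 9, 10: 10}


def to_assertions(type_tnu: dict[int, int], according_to: str) -> list[tuple[int, str, int, str]]:
    """One congruence assertion per TNU that is not itself a "type" TNU."""
    return [
        (tnu, "isCongruentTo", anchor, according_to)
        for tnu, anchor in sorted(type_tnu.items())
        if tnu != anchor
    ]


def to_type_table(assertions: list[tuple[int, str, int, str]]) -> dict[int, int]:
    """Rebuild the "Type" column; each anchor points to itself."""
    table: dict[int, int] = {}
    for subject, rel, obj, _source in assertions:
        if rel == "isCongruentTo":
            table[subject] = obj
            table.setdefault(obj, obj)
    return table


assertions = to_assertions(TYPE_TNU, "CoL 2020")
for row in assertions:
    print(row)
print(to_type_table(assertions) == TYPE_TNU)  # True
```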

There might be great value for a particular system to have these identifiers that are completely independent of the name string and the accordingTo, and that depend only on the circumscription, as these are your real taxa, right? I just don't think they can be exchanged: when you take away the TNUs, you are left with only an identifier, and different systems will have different identifiers for the same "thing". So, if you have them, I think it is best to keep them inside your system and use business rules like those suggested by @deepreef to expand them into a minimum set of Taxon Relationship Assertions that allows people to reconstruct them for exchange.

By the way, 'circumscriptions' was what @deepreef suggested (in issue #1) to be a more appropriate term to use for these shared (or "deep") concepts than Taxon Concept. I prefer that, not so much because of the semantics, but because the TCS Taxon Concept is our Taxonomic Name Usage and I don't think it is a good idea to use it for something that people think is something entirely different.

Anyway, despite my mangling it up so badly, I think this is a very nice example, which would be nice to get worked out with proper sources and links to CoL and BHL to help explain some of our concepts (in the SKOS sense) to people.

DISCLAIMER: My CoL 2020 is hypothetical.

jgerbracht commented 4 years ago

I still like TaxonConcept for these "deep" concepts, because they are really concepts. Also, there is a community who thinks of TaxonConcept as being these 'deep' concepts. But you may be right, that it's too overloaded of a term. TC works for either ;)

Re "I don't think they can be exchanged", I actually think the opposite: having TC identifiers (TCIs) makes it really easy to exchange biodiversity data more accurately. I think of TCIs as being analogous to DOIs in several respects: they would need to be globally unique, there would need to be a centralized clearinghouse for them (at least at the domain level, think Avibase here), and they need to be resolvable to the original TNU. Each taxonomy (authority + version) would have a mapping of its TNUs to TCIs. Let's say that AMNH decides to utilize the most current Howard and Moore taxonomy and GBIF wants to standardize on Clements 2019. AMNH can send specimen data to GBIF including the sci name, authority and authority version (i.e. the TNU). GBIF then needs to navigate the relationship web as accurately as possible from the Howard and Moore TNU to the congruent Clements 2019 TNU (which may or may not be very straightforward). Alternatively, AMNH can send the specimen data with the TCI to GBIF, and GBIF looks up the Clements 2019 TNU for that TCI. Another museum is using Peters (1931-1987). Now GBIF must traverse the relationship web again following a different path, or simply look up the TNU from the TCI. Of course, this approach only works well when the TNUs are congruent, but that is true of either approach, and the more complex the mapping process, the more likely that errors will creep in, especially when every organization is trying to traverse the relationship web its own way. All that said, I fully realize that most taxonomic domains are not likely to be able to achieve the TCI model themselves. I think that COL and others could step in and help fill that gap once the standards are defined.
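The two lookup paths in this scenario can be contrasted in a short sketch. Everything in it is hypothetical: the taxonomy keys, TNU IDs and TCIs are placeholders, not real Howard and Moore, Peters or Clements records, and real mappings would carry an `accordingTo` for each link. A TCI lookup is one dictionary access per taxonomy, while the relationship-web path needs a graph search (breadth-first here) over pairwise congruence assertions.

```python
from collections import deque
from typing import Optional

# Hypothetical per-taxonomy mappings: TNU ID -> taxon concept identifier (TCI).
TNU_TO_TCI = {
    "howard_moore": {"hm-101": "tci-1", "hm-102": "tci-2"},
    "clements_2019": {"cl-501": "tci-1", "cl-502": "tci-2"},
}

# Hypothetical pairwise congruence assertions, including an intermediate
# TNU ("pe-7") from a third taxonomy.
CONGRUENT_PAIRS = [("hm-101", "pe-7"), ("pe-7", "cl-501"), ("hm-102", "cl-502")]


def translate_via_tci(tnu: str, source: str, target: str) -> Optional[str]:
    """Source TNU -> TCI -> target TNU: two dictionary lookups."""
    tci = TNU_TO_TCI[source].get(tnu)
    by_tci = {v: k for k, v in TNU_TO_TCI[target].items()}
    return by_tci.get(tci)


def translate_via_web(tnu: str, target_ids: set[str]) -> Optional[str]:
    """Breadth-first search over congruence assertions for a target-taxonomy TNU."""
    graph: dict[str, set[str]] = {}
    for a, b in CONGRUENT_PAIRS:  # congruence is symmetric
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, queue = {tnu}, deque([tnu])
    while queue:
        current = queue.popleft()
        if current in target_ids:
            return current
        for neighbour in graph.get(current, set()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return None


clements_ids = set(TNU_TO_TCI["clements_2019"])
print(translate_via_tci("hm-101", "howard_moore", "clements_2019"))  # cl-501
print(translate_via_web("hm-101", clements_ids))                      # cl-501
```

Both functions give the same answer for these toy data, and both return `None` when no congruent TNU exists in the target taxonomy.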

jar398 commented 4 years ago

On 3/17/20 5:54 AM, Christian Bölling wrote:

There are a few ways forward:

1. talk about these things A LOT, hours and hours, so that a common understanding can be established (at least among the people who are talking),
2. refer to published semiotic theories (I don't know of any powerful enough for the TDWG situation), or
3. use formal methods to constrain intended meaning (this is the OWL / BFO / model theory dogma - don't just list terms, but write out rich formal axioms that constrain their interpretations).

For this group none of these is viable, so we suffer.

@jar398 I don't mean to distract from dealing with the issue at hand, but why do you think that none of the ways forward you have mentioned are viable for this group? Particularly number 3 (OWL/BFO/model theory)?

  1. Talk about it a lot - we are all busy with other things, and we would get fatigued. Talking in a group online about anything subtle generally leads to chaos and abandonment. I think this kind of development has to be done either 1-1 or in small colocated groups. Just my experience; could be wrong here.
  2. Theories - I don't think we'll have satisfaction unless we can account for things like differing interpretations of the same text, or conflicting hypotheses arising from the same facts, or understanding when some proposition is solid enough to lay down as a 'fact' or axiom, and when it needs to be flagged as a hypothesis or derivation. I have never seen an adequate framework for this - but again, I hope I am wrong; and also maybe I am wrong that we need a metatheory that's this strong, since maybe we can do with something much simpler and not get into too much trouble.

  3. Formal methods - I have a background in logic, mathematics, computer science, and philosophy of language and I consider myself to have a good understanding of model theory. I still struggle with it and worse, I struggle with trying to explain it to other people. Also note that BFO, which is pretty well known and has a reputation for rigor, is only just now (unpublished work by Alan Ruttenberg), almost 20 years since its inception, getting an adequate formal axiomatization and consistency proof. So application of model theory is not exactly a routine endeavor.

    • Note that OWL often gets used for defining vocabularies but with extremely weak axiom sets, allowing enormous latitude in interpretation. This sets up a kind of technical debt that has to be paid off later when data integration is attempted.

As to why none is viable here - well, in addition to the above, there is the stated time constraint for this project, which is quite stringent.

So I am pretty pessimistic. Sorry to rain on the parade. Will continue to try to figure out good, simple, economical advice even if I've failed so far.

I'm sorry I haven't kept up with this thread - I think this is important but there are many distractions. Will try to get back to it, especially to respond properly to Rich.

deepreef commented 4 years ago

@nielsklazenga :

There might be great value for a particular system to have these identifiers that are completely independent of the name string and the accordingTo, and that depend only on the circumscription, as these are your real taxa, right?

Therein lies the problem: there is no such thing as a "real" taxon (at least, no way to unambiguously define it). Therefore, any IDs you mint to represent those name-less circumscriptions will permanently suffer an identity crisis. You will always have the question of "Do these very similar concept circumscriptions actually represent the 'same' (congruent) taxon concept? Or are they actually two (slightly?) different circumscriptions, that have an overlapping relationship?" This will always be a matter of subjective opinion.

So here is the real difference in approaches: TNUs are more or less factual entities that can be encapsulated in a well-defined unit, with very few examples where people might argue about whether we're dealing with two different TNUs or a single TNU. In other words, while TNUs represent an opinion (assertion) about the application of a name to a taxon concept, the individual TNUs themselves are well-defined, reasonably discrete entities that we can all agree on.

Taxon concept circumscriptions are always going to be very fuzzy/subjective entities, and therefore relationships between them will always be open to debate. This is why we need the [...]RelationshipAssertion class, to capture assertions about how Concepts relate to each other. In other words, we'll never have an objective truth that "TNU1 and TNU2 are congruent" -- we can only capture instances of "TNU1 and TNU2 are congruent, accordingTo Reference1".

Thus, if we start minting IDs for instances of a new "Taxon Concept" class to represent nameless concept circumscriptions, we'll have no way to lock down what, exactly, each of those minted IDs represents. One person might say that TNU1, TNU2, and TNU3 all map to TC8 (which is the same thing as asserting that all three TNUs represent congruent taxon concepts). But then someone else might say TNU3 is very slightly different from TNU1 and TNU2, because it also referenced a population that the other two TNUs didn't know about -- and therefore would map TNU1 and TNU2 to TC8, but TNU3 to TC9, which is ever so slightly different from TC8. At that stage, the minting of identifiers to represent taxon concepts (TC8 and TC9) doesn't really solve anything -- it just adds a new layer of complexity to managing the data.

Without nameless identifiers for Taxon Concepts, we would capture the information as:

Adding in nameless identifiers for Taxon Concepts, we might instead capture the information as:

And then we would also be tempted to add:

I see a lot of additional complexity without any real gains in terms of capturing information.

@jgerbracht :

I think of TCIs as being analogous to DOIs in several respects, they would need to be globally unique, there would need to be a centralized clearinghouse for them (at least at the domain level, think Avibase here), and they need to be resolvable to the original TNU.

Yeah, exactly. I guess I just see a lot of additional complexity without any real advantage in terms of data management or granularity of information capture/exchange. What happens when Avibase has one set of TCIs, AOU has another, and each of the other (nine?) groups that assert taxon concept topologies for birds has its own set of TCIs? Now we'd need to have a third layer of abstraction that maps those TCIs to each other (meta-meta assertions). And then those cross-mappings would each need an accordingTo, suggesting the need for meta-meta-meta assertions. Where does it end?

If all TCIs are resolvable to the original TNU, then why not just use those original TNUs as the anchorpoints to which all the other TNUs are mapped, without the need to mint a whole new class of thing, with its own distinct identifiers, and its own set of cross-mappings to other similar things?

@jar398 :

I'm sorry I haven't kept up with this thread - I think this is important but there are many distractions.

AMEN to that!! Part of the reason I'm spending so much time on this discussion is that it represents a MUCH-needed (and MUCH-appreciated) distraction from the real chaos that's happening in my other worlds (e.g., prepping database infrastructures that will allow our staff to stay productive while working from home, against a background of heightened data and network security concerns in the wake of a massive ransomware attack that we still haven't recovered from four months later...) This conversation is way more fun and interesting (and, honestly, useful).

jgerbracht commented 4 years ago

I find that in the world of bird taxonomies, there will be 100s of TNUs that are congruent with a single TCI. Granted, I don't think this is the norm for many other domains, but when I think of how many TNUs there are for a single taxon concept/circumscription, I begin to worry about a hard-to-traverse web. However, if you are thinking that maybe the 'original' TNU has an identifier that can be passed with the specimen data or observational data, then that identifier is in essence a TCI as I think of it. The other thing I worry about with this approach is that only a subset of TNUs will be original TNUs, and if we expect that all other congruent TNUs will be mapped only to that original TNU (as in your example above), I think that will be hard to ensure via a standard. Unless I misunderstood your examples above, what I'm trying to avoid, if we can, is also needing "TNU2 and TNU3 are congruent, accordingTo Reference1".

It may not come across well in my writing, but I also find this fun, interesting and very useful, both for our standard and for shifting how I think about these things. So thanks !!!

deepreef commented 4 years ago

I find that in the world of bird taxonomies, there will be 100s of TNUs that are congruent with a single TCI. Granted, I don't think this is the norm for many other domains, but when I think of how many TNUs there are for a single taxon concept/circumscription, I begin to worry about a hard to traverse web.

I guess my question is: why does there have to be a web? What is the difference between minting a new ID to represent a new class of entity (TC), to which 100s of TNUs are mapped, vs. designating a single "original" TNU as a recognized anchorpoint for a TC, to which 100s of TNUs are mapped? If we keep it all within TNU-space, then we don't need to hammer out a definition of a TC, or establish a new class of object, or deal with yet another set of identifiers. Whatever conventions/best practices/constraints are established for minting these new TCs could simply be used to identify "original" TNUs instead.

Intuitively, structuring the [...]RelationshipAssertions as TNU-->TNU relationships appears to open the door to a near-infinite web of possible cross-mappings, but it doesn't need to be that way. The mappings will only exist as we create them, so we just need good practice to create them. Similarly, intuitively it seems like defining these abstract, name-less TC entities with their own identifiers & properties will simplify things, but in reality all it does is push the potential for an infinite number of TNU-->TNU mappings over to a potential for an infinite number of TCs, which themselves have ambiguous boundaries.

The only way I can see a pathway for TC-entities that make sense would be to define them as sets of Protonyms (i.e., a Protonym behind an accepted taxon name from a particular accordingTo, along with the set of protonyms for heterotypic synonyms asserted by the same accordingTo). But even then, while you could conceivably get performance improvements, I'm not sure there is any informatic value in creating a new ID to (by definition) represent the collection of ProtID1+ProtID2+ProtID3+ProtID4. From an information perspective, they're equivalent. In other words, while it may make sense in particular implementations to generate a look-up table with single TC ID values assigned to sets of ProtonymIDs, I don't think there is any value in exposing those TC ID values through a data exchange standard. And, of course, Protonym-sets are a pretty crude way to define TCs, and would still require [...]RelationshipAssertions to cross-map.
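To make the equivalence concrete, here is a minimal Python sketch of the kind of implementation-private look-up table described above. The class name `ProtonymSetIndex`, the `TC1`-style IDs and the protonym IDs are all hypothetical; the only point it illustrates is that a TC ID keyed on a protonym set is an alias for that set, so the same set supplied in any order gets the same ID.

```python
from typing import Dict, FrozenSet, Iterable


class ProtonymSetIndex:
    """Hypothetical internal look-up table: protonym set -> local TC ID."""

    def __init__(self) -> None:
        self._ids: Dict[FrozenSet[str], str] = {}

    def tc_id(self, protonyms: Iterable[str]) -> str:
        # A frozenset ignores order and duplicates, so the key *is* the set.
        key = frozenset(protonyms)
        if key not in self._ids:
            # Mint an implementation-private ID the first time a set is seen.
            self._ids[key] = f"TC{len(self._ids) + 1}"
        return self._ids[key]


index = ProtonymSetIndex()
a = index.tc_id({"ProtID1", "ProtID2", "ProtID3", "ProtID4"})
b = index.tc_id(["ProtID4", "ProtID3", "ProtID2", "ProtID1"])  # same set, new order
c = index.tc_id({"ProtID1", "ProtID2"})
print(a, b, c)  # TC1 TC1 TC2
```

Nothing in the minted ID could not be recovered from the set itself, which is why such IDs might stay internal rather than being exchanged.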

However, if you are thinking that maybe the 'original' TNU has an identifier that can be passed with the specimen data or observational data, then that identifier is in essence a TCI as I think of it.

Yeah -- exactly. That's sort of where I'm going. The downside is that the TNU passed with specimen data could be any of the 100s of TNUs representing the same circumscription (hence setting you up for the near-infinite web of cross-mapped TNUs). That would need to be mitigated by encouraging convergence on using some sort of flagged "original" TNUs for this purpose. I see that as the lesser of evils relative to defining a new TC entity with its own set of identifiers that will itself need to be managed (etc., etc.). Any solution that makes the latter feasible, could also be applied to making the former feasible, without the need for the additional complexity of defining a new class of entity.

The other thing I worry about with this approach is that only a subset of TNUs will be original TNUs, and if we expect that all other congruent TNUs will be mapped only to that original TNU (as in your example above), I think that will be hard to ensure via a standard.

Agreed! That is a problem. I'm just saying that I think that in the grand scheme of things, this pathway leads to fewer and more tractable problems than the other pathway (i.e., defining a mechanism for minting IDs for name-less TCs, with all the baggage that goes with it).

Unless I misunderstood your examples above, what I'm trying to avoid, if we can, is also needing "TNU2 and TNU3 are congruent, accordingTo Reference1".

Right. But tracking the fact that TNU2 and TNU3 are both congruent to TNU1 (and, hence, congruent to each other) is really no different from the task of tracking the fact that TNU1, TNU2 and TNU3 are all mapped to TC8. Except in the former case, we don't need to define and manage TC8.

It may not come across well in my writing, but I also find this fun, interesting and very useful, both for our standard and for shifting how I think about these things. So thanks !!!

LIKEWISE! (With apologies to those following this who don't find it quite as fun & interesting...)

nielsklazenga commented 4 years ago

@nielsklazenga :

There might be great value for a particular system to have these identifiers that are completely independent of the name string and the according to, and that depend only on the circumscription, as these are your real taxa, right?

Therein lies the problem: there is no such thing as a "real" taxon (at least, no way to unambiguously define it). Therefore, any IDs you mint to represent those name-less circumscriptions will permanently suffer an identity crisis. You will always have the question of "Do these very similar concept circumscriptions actually represent the 'same' (congruent) taxon concept? Or are they actually two (slightly?) different circumscriptions, that have an overlapping relationship?" This will always be a matter of subjective opinion. ...

Yes, that was irony. We are in full agreement on this.

nielsklazenga commented 4 years ago

Not a fan of the "type" approach. Many, especially historical, but also current, TNUs do not have enough context to make Taxon Concept Relationship assertions, or to decide to which more esoteric Taxon Concept they "belong", and different taxonomists may be willing to make assertions based on different amounts of information, so there will be different opinions on what the "original" TNU is.

Since all TNUs in a cluster of congruent TNUs are congruent there seems to be no point in having a "type" or "representative" or "original" TNU, as any Taxon Relationship Assertion you make for one TNU in the cluster goes for every other TNU in the cluster. It makes much more sense to me to compare Taxon Concepts with the adopted – or immediately previously adopted – Taxon Concept than with some historical Taxon Concept in a treatment that you might have never seen. There is no analogy/comparison between the "types" suggested here (Taxon Concepts/TNUs) and nomenclatural types (specimens).

The idea of a network of TNUs/Taxon Concepts and their Taxon Relationship Assertions (@jgerbracht's "hard to traverse web") actually really appeals to me. It might not be very efficient for some purposes, but it can be made more performant by (dynamically) creating a cache/index in which all mutually congruent Taxon Concepts are clustered or hanging all Taxon Concepts off a Catalogue of Life backbone (I don't actually know if that is more performant, but it would be really cool).
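A minimal sketch of such a dynamically built cache/index, assuming congruence assertions can be treated as symmetric and transitive (an equivalence relation) and setting aside their accordingTo sources, which a real index would need to keep. It uses a union-find (disjoint-set) structure, so mutual congruence can be looked up without storing every pairwise assertion. `CongruenceIndex` and the TNU identifiers are hypothetical names used only for illustration.

```python
from typing import Dict, List


class CongruenceIndex:
    """Hypothetical cache clustering TNUs linked by isCongruentTo assertions."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def _find(self, tnu: str) -> str:
        # Register unseen TNUs as singleton clusters.
        self._parent.setdefault(tnu, tnu)
        root = tnu
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every TNU on the path straight at the root.
        while self._parent[tnu] != root:
            next_tnu = self._parent[tnu]
            self._parent[tnu] = root
            tnu = next_tnu
        return root

    def add_congruence(self, a: str, b: str) -> None:
        """Record one assertion that TNU a is congruent to TNU b."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def congruent(self, a: str, b: str) -> bool:
        """True if a and b are linked by any chain of congruence assertions."""
        return self._find(a) == self._find(b)

    def clusters(self) -> List[List[str]]:
        groups: Dict[str, List[str]] = {}
        for tnu in list(self._parent):
            groups.setdefault(self._find(tnu), []).append(tnu)
        return sorted(sorted(group) for group in groups.values())


idx = CongruenceIndex()
idx.add_congruence("TNU1", "TNU2")
idx.add_congruence("TNU3", "TNU2")
print(idx.congruent("TNU1", "TNU3"))  # True
print(idx.clusters())  # [['TNU1', 'TNU2', 'TNU3']]
```

In this sketch the cluster root is derived from the stored assertions rather than minted, so it behaves like an index over the web of TNUs, not like a new class of entity that would have to be exchanged.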

deepreef commented 4 years ago

Not a fan of the "type" approach. Many, especially historical, but also current, TNUs do not have enough context to make Taxon Concept Relationship assertions, or to decide to which more esoteric Taxon Concept they "belong", and different taxonomists may be willing to make assertions based on different amounts of information, so there will be different opinions on what the "original" TNU is.

I don't see this as a problem. TNUs lacking sufficient information to map to other TNUs via [...]RelationshipAssertion instances simply won't have any relationships asserted (i.e., status quo for 99.9999% of all TNUs to date). They'd still be hooked in as "potentially mappable" TNUs by virtue of the Protonym link (relative to the set of Protonyms represented in an existing cluster of congruent TNUs around an implied TC). As to the "who decides the correct 'original' TNU" issue, this is no different from the "who decides what the minted TC instances are" issue, so that problem is the same regardless. But the nice thing about using only TNUs as anchorpoints for [...]RelationshipAssertions is that we're already minting them, and there is no requirement for some sort of authority or representative group to define the TC instances.

Since all TNUs in a cluster of congruent TNUs are congruent there seems to be no point in having a "type" or "representative" or "original" TNU, as any Taxon Relationship Assertion you make for one TNU in the cluster goes for every other TNU in the cluster.

I think the point of the "idea" for an "original" TNU for a particular TC is more about encouraging good practice. So, consider:

  1. Aus bus sec. Linnaeus 1758
  2. Aus bus sec. Smith 1850
  3. Xus bus sec. Jones 1950

Each of these is a TNU that is deemed to represent a TC congruent to the other two. The chaotic path to creating [...]RelationshipAssertions among them is to assert something like:

  1 is congruent to 2
  1 is congruent to 3
  2 is congruent to 1
  2 is congruent to 3
  3 is congruent to 1
  3 is congruent to 2

A more tractable approach would be to acknowledge that, say, #2 gives the best & most complete taxonomic description, and therefore serves as the best anchorpoint for the TC. The first person who generates [...]RA instances would then do something like this:

  1 is congruent to 2
  3 is congruent to 2

Then later on, when someone wants to map in these additional TNUs as congruent:

  4. Xus bus sec. Pyle 1985
  5. Xus bus sec. Warren 2000
  6. Xus bus sec. Trump 2015

They would follow convention and go with:

  4 is congruent to 2
  5 is congruent to 2
  6 is congruent to 2

Without this approach, there are 15 possible pair-wise [...]RA instances that could be generated (30 if you consider they can be represented in both directions) simply to indicate that all 6 of these TNUs represent congruent taxon concepts. Instead, following the "best practice" of "original" TNUs, we can represent the same information with 5 instances, instead of 30. Likewise, with 100 TNUs representing congruent TCs, we'd only need 99 [...]RA instances to capture them all, instead of nearly ten thousand.
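The counts follow from simple pair arithmetic: n TNUs have n(n-1)/2 unordered pairs, n(n-1) directed pairs (as in the six assertions listed for three TNUs above), and need only n-1 assertions when every other TNU is related to a single anchor TNU. A short Python check (the function name is just illustrative):

```python
from typing import Tuple


def assertion_counts(n: int) -> Tuple[int, int, int]:
    """Assertions needed to state that n TNUs are all mutually congruent."""
    undirected = n * (n - 1) // 2  # every unordered pair asserted once
    directed = n * (n - 1)         # every pair asserted in both directions
    anchored = n - 1               # every other TNU asserted congruent to one anchor
    return undirected, directed, anchored


for n in (3, 6, 100):
    print(n, assertion_counts(n))
# 3 (3, 6, 2)
# 6 (15, 30, 5)
# 100 (4950, 9900, 99)
```

The anchored pattern grows linearly while the exhaustive patterns grow quadratically, which is the whole case for the convention.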

A few caveats:

Note: perhaps "original" isn't the best word to use in this context. Maybe something like "Canonical"?

But despite all the problems, I see the same problems and more in defining a new class of TC instances.

It makes much more sense to me to compare Taxon Concepts with the adopted – or immediately previously adopted – Taxon Concept than with some historical Taxon Concept in a treatment that you might have never seen. There is no analogy/comparison between the "types" suggested here (Taxon Concepts/TNUs) and nomenclatural types (specimens).

The idea of a network of TNUs/Taxon Concepts and their Taxon Relationship Assertions (@jgerbracht's "hard to traverse web") actually really appeals to me. It might not be very efficient for some purposes, but it can be made more performant by (dynamically) creating a cache/index in which all mutually congruent Taxon Concepts are clustered or hanging all Taxon Concepts off a Catalogue of Life backbone (I don't actually know if that is more performant, but it would be really cool).

I agree! And that's why I present it as a "best practice", rather than a constraint or requirement. This approach gives us the best of both worlds: that is, the ability to generate a web-network of relationships if we want to, but a pathway to avoid that in cases where it may not be helpful (e.g., birds).

The main point I'm trying to make is that creating a new class of TC instances doesn't necessarily solve any problems, but it definitely creates some.

nielsklazenga commented 4 years ago

The main point I'm trying to make is that creating a new class of TC instances doesn't necessarily solve any problems, but it definitely creates some.

Yes!

nielsklazenga commented 4 years ago

Note: perhaps "original" isn't the best word to use in this context. Maybe something like "Canonical"?

But despite all the problems, I see the same problems and more in defining a new class of TC instances.

I want neither. But, from what you say now, I think I might take the word "type" a bit too literally (or too much in the nomenclatural sense) and this is really about having a limited (optimal) set of relationship assertions from which as many as possible of the other relationships can be inferred.

jar398 commented 4 years ago

I wrote down my thoughts here (long): https://odontomachus.wordpress.com/2020/03/22/thoughts-on-taxonrelationshipassertion/

Sorry, this is in haste - I beg your patience with mistakes and other infelicities. I hope it is more helpful than aggravating.

deepreef commented 4 years ago

@jar398 :

MANY thanks for taking the time to write that. Not even slightly aggravating (not to me, at least). I think it helps clarify a number of key points that may have been assumed, but have not been stated with sufficient "explicity" (a word I think I just made up).

I think the documentation for parentNameUsage, vernacularName, and preferredName all need to be clarified to emphasize that this information is according to what the source says (NOT according to the author, who may have changed her/his mind since writing the source!). We need to be very clear that the purpose of this class is to anchor what we say to documentary evidence, and to draw a line between what the source says and how we interpret it. If we want to interpret, we will do so in sources we write, and that will lead to our own TNUs.

YES!!!! This is beautifully stated, and I completely agree!

The definition of TNU (“operationalization of a taxonomic concept”) is vague and leans too heavily on “taxonomic concept”.

I agree! I've been uneasy with that wording, for the same reasons you stated. As the person who came up with the term "TaxonNameUsage" (predecessor of TaxonomicNameUsage), I can guarantee that I did not consult Webster's definition of "usage" to ensure appropriate harmony in the usage of the word "usage".

TNU is slightly more granular than this in that TNU 1 and TNU 2, both of the same verbatim name string, might have indistinguishable usage, yet still be different TNUs because the sources are different.

A subset of TNU instances (specifically, the TNUs we're probably most interested in capturing) go beyond the simple expression of a 'verbatim name string', and include a broad suite of information pertinent to defining a taxonomic concept. However we define a "Treatment" (sensu PLAZI), a TNU includes all the content of a Treatment (Treatments are a subclass of TNUs). So even though the term "TaxonomicNameUsage" implies the usage of a TaxonomicName, it also includes the full context in which that TaxonomicNameUsage was used. I'm open to a different term, but I'm not sure we'll find one that isn't too cumbersomely long.

In any case, the relevance to us is that even if TNU1 and TNU2 use the exact same 'verbatim name string', they very likely represent (maybe slightly) different implied taxon concepts, so treating them as different entities with different identifiers is important. Related to this, the name-string is only one of several core properties of a TNU instance, and not even really among the most important of them.

He finally settles on saying it has to mean whatever the ICZN code says it means, and that it should not be used in botany.

Ha! That's awesome. The ICZN code doesn't even really deal with "taxa", so I certainly wouldn't look there for a definition! The ICZN Code is mostly about name-strings and how they're linked to type specimens, and by what means they become established and are prioritized. The notion of a "taxon" only comes into play in a few specific cases. Another layer of irony is that the journal "Taxon" is focused on algae, fungi, and plants.

That is, I want ‘taxon’ to be a biological entity, not a human, administrative entity. We have ‘name’ and ‘TNU’ as good administrative entities, but when you do science you have to interpret names or TNUs as biological entities.

I agree -- and that's exactly why I think "TaxonRelationshipAssertion" is the perfect term. In my mind, each instance of TaxonRelationshipAssertion represents "the RCC-5 relationship between the biological entity implied by one TNU and the biological entity implied by another TNU, according to a specified Reference". I would NOT say that it represents "the RCC-5 relationship between one TNU and another TNU, according to a specified Reference" -- for exactly the reasons you articulate. TNUs are packages of information created by humans, and therefore do not have RCC-5 relationships with each other.

In particular, we don’t want the situation where a group for millions of years is not a taxon, and then suddenly, when an article describing it is published, it becomes a taxon. [...] By ‘group’ I don’t mean the mathematical notion of ‘set’, or Lam’s other candidate meaning of natural grouping based on characters; I mean a group that a competent taxonomist might circumscribe. I don’t know if we can, or need to be, more precise than that.

OK, so definitely nothing to attack from my perspective. However, I would like to peel this apart a little. I'm 100% with you on how you define "group" above, and especially the point that we may not be able to (nor need to be) more precise. Where I get a little squishy is that you seem to suggest that by "biological entity" you mean something more than just a 'set' of organisms 'that a competent taxonomist might circumscribe'. What concerns me is the suggestion that taxa existed before and beyond the existence of taxonomists. Certainly there are many who believe that taxa exist as 'natural' biological entities that exist (and have existed for millions of years) in nature, independently of taxonomists' assertions of such. But there are also plenty (and in my sense, a growing number) of people who believe that organisms exist in nature, and we humans define our own circumscriptions of them for the convenience of communicating with each other. Sure, we definitely like to use inferences about phylogenetic relationships to guide us in our groupings, but from the latter point of view they are still artificial.

This is not the forum to debate that particular issue, but I think it's relevant because when we define our terms, we should use words with definitions that are defensible in the context of common practice.

What is going on is not that TNUs are groups, but that they are interpreted as designating groups.

Yes! Except I would use the word "representing" instead of "designating".

We want to be able to claim that if t1 and t2 are TNUs, then the groups that we interpret t1 and t2 to be are equivalent, or satisfy some other RCC-5 relation, etc. etc. It is not the TNUs that are equivalent or whatever, it is the groups.

Yes -- I agree (as per above). But if "groups" are what a competent taxonomist might circumscribe (which, by the way, is essentially Darwin's interpretation as well), then I think it's more than fair to say that TNUs can be used to represent those groups.

And by the way the word ‘concept’ has no place here at all.

I completely agree! I think the word "Concept" is vaguely appropriate (especially if you're in the camp that believes that taxa exist in the minds of taxonomists much more so than as entities in nature), but I think that word has probably added more confusion and obfuscation than clarity to our conversations of this stuff over the years.

The important thing here is not the asserting, which is ubiquitous in the document at hand, but the relationship between the groups.

Here is where we may differ. In my mind, the accordingTo property is just as important as (maybe more important than?) the RCC-5 relationship value. That's why I think the word "Assertion" is appropriate for inclusion as part of the term. If we were only tracking [TNU1] [RCC-5] [TNU2], as a statement, then I agree that Assertion would be redundant. And I also agree that "Assertion" is not the right word for what we now refer to as TaxonomicNameUsage (it didn't take much to persuade me to abandon my original term "Assertion" for that in favor of the TNU term). But because the relationships we want to exchange through this standard are so fundamentally connected to the accordingTo part, I still like the word "Assertion" as part of the term. Having said that, I'm definitely not stubborn on this point, and will gladly stop arguing in favor of it if others think it is unnecessary or redundant.
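A minimal Python sketch of one row in such a table, with accordingTo as part of each row's identity. The relation values are the six listed when this issue was opened; the class name `RelationshipRow` is a deliberately neutral placeholder (the real name is exactly what is being debated here), and treating every relation except includes/isIncludedIn as symmetric is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    IS_CONGRUENT_TO = "isCongruentTo"
    INCLUDES = "includes"
    IS_INCLUDED_IN = "isIncludedIn"
    OVERLAPS = "overlaps"
    INTERSECTS = "intersects"
    EXCLUDES = "excludes"


# Assumption: only the inclusion pair changes when subject and object swap;
# the other four relations are treated as symmetric.
_CONVERSE = {
    Relation.INCLUDES: Relation.IS_INCLUDED_IN,
    Relation.IS_INCLUDED_IN: Relation.INCLUDES,
}


@dataclass(frozen=True)
class RelationshipRow:
    """Placeholder name: one row relating the groups implied by two TNUs."""
    subject_tnu: str   # hypothetical TNU identifier
    relation: Relation
    object_tnu: str    # hypothetical TNU identifier
    according_to: str  # reference in which the relationship is stated

    def converse(self) -> "RelationshipRow":
        # Same claim read in the other direction; the source does not change.
        return RelationshipRow(
            subject_tnu=self.object_tnu,
            relation=_CONVERSE.get(self.relation, self.relation),
            object_tnu=self.subject_tnu,
            according_to=self.according_to,
        )


r = RelationshipRow("TNU1", Relation.INCLUDES, "TNU2", "Reference1")
print(r.converse().relation.value)  # isIncludedIn
```

Because `according_to` is a field of the frozen dataclass, two rows that relate the same TNUs in the same way but cite different references compare as different rows, which is one way of expressing that the source is part of what is being recorded.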

I'll read your post again more carefully (especially the penultimate paragraph, which I want to think through a bit more). But my gut sense is that we're very close to each other in how we conceptualize this stuff. The issue with the word "Assertion" is, in my mind, not important (more an issue of style than of substance), so I'm more than happy to go along with whatever others think is best. And I also STRONGLY agree that it's worth the time to get this right (hence the massively long posts from me on this topic). And just to be clear, I have VERY little confidence that my views on this are "right" -- and indeed my views have been evolving through this discussion. I also have no strong opinion one way or the other on "Relation" vs. "Relationship" -- I'm equally happy with either term.

Thanks for everyone's patience and willingness to talk this through in such detail! I am finding it to be a very valuable discussion.

jar398 commented 4 years ago

Sorry, I seem to go on and on! You see why I try to stay out of these discussions - it is out of consideration to you all.

@deepreef: I think we don't disagree even if it appears to you that we do. It is just me being unclear. Some reactions:

even though the term "TaxonomicNameUsage" implies the usage of a TaxonomicName, it also includes the full context in which that TaxonomicNameUsage was used

Yes, we're on the same page. A usage is an overall pattern of discrete use(s). If I want to talk about a usage (pattern), I will tell what is being used (the namestring or whatever) and give you either a definition, one or more examples of use, or both, to specify the pattern - whatever is going to be most effective in that communication scenario. In this case, the examples and/or definition come from a documentary source. And we can't fully understand a source unless all of it is at hand. I would not say that the namestring or source is "part" of the usage or that the usage "includes" it, but they certainly have a role in your understanding what I'm talking about. And when I try to "identify" or "designate" a usage, I will usually have pretty good success if I just give you the namestring and the source (or an adequate reference to the source). Certainly if the namestrings differ we are talking about different usages; if the sources differ we might or might not be talking about different patterns of use. We lose nothing of any value if we stipulate that we would be talking about different TaxonomicNameUsages if the sources differ (e.g. if one source is a reprint or translation of the other) - a small technical way in which a TaxonomicNameUsage is not exactly a kind of usage. So - the way I read it, this is agreement.

you seem to suggest that by "biological entity" you mean something more than just a 'set' of organisms 'that a competent taxonomist might circumscribe'

Oh no! I certainly didn't want to suggest that. I wanted to say that we probably don't want all mathematical sets to qualify as biological entities (groups), only some of them. All of the entities/groups would be sets (not things that are "more than" sets), but as a matter of communication design, we would probably not want to allow every set to qualify as an entity/group. Lam makes a similar point, suggesting that we discourage overly liberal usage of "taxon". My weasel condition is that the group only has to potentially be circumscribed, not that it is - that removes all talk of validity, publication, etc. from the moment and lets us use it freely with biological meaning.

Almost always, without a stated circumscription, you won't know which group I'm talking about, so it might appear that the circumscription and the group are inseparable, or that the circumscription creates the group; but neither is the case.

(There is an irrelevant twist, which is that we usually allow biological group membership to change over time, just as human group membership does. New individual organisms might enter a group when they are born and leave it at or after death. This twist helps us talk about evolution, speciation, extinction, and so on. Certainly this doesn't happen in set theory. Anyhow, no matter.)

What concerns me is the suggestion that taxa existed before and beyond the existence of taxonomists.

Of course the existence of abstractions is an old question in philosophy, with answers on a spectrum from platonism to realism to solipsism. I just try to take a common sense approach and talk through each case, asking what T. C. Mits would say. Suppose we agree that mosasaurs are a group. Did the group of mosasaurs exist in the Cretaceous? No sensible biologist would say that it didn't. Did they form a taxon in the Cretaceous? I don't know - maybe their description in the modern era somehow caused the group to become a taxon way back then - that would be odd but maybe it doesn't matter. Had humans and scientific communication never evolved, would the group have been a taxon? Perhaps there would have been groups, but no taxa at all, in that situation. The answer is important only to the extent it helps us understand what either of us means by the word "taxon", and how we plan to apply the word "taxon" in practice. For that purpose, it doesn't feel like an answer matters very much.

If we just disagree on our preferred usage of the word "taxon" that's fine, because I don't expect anyone in taxonomy to use it the way I personally like to use it. I won't push.

(Re existence, let me try an artificial analogy. You and I are standing in a room, and I am trying to talk to you about a circular area on the floor that I have "discovered" (there is something useful to be said about it). There are many such areas and at first you have no idea which such area I might be talking about. Then I take a marker and draw a circle on the floor, and tell you that the area I am talking about is the one inside the circle that I have drawn. Then you know what I am talking about. The circle is definitely not the same as the circular area. The area existed before I drew the circle, and before I knew anything about it; the circle did not. I only drew the circle as a communication aid. The circle may assume its own importance if we continue to discuss the history of our interaction together, the evidence for claims I am making about the area, and so on. The circle definitely has a tight relationship to the circular area. But they are different things with different properties, in particular the interval of time over which they exist.)

the accordingTo property is just as important as (maybe more important than?) the RCC-5 relationship

I've been going over this in my head to try to do justice to your point. When we are having an argument about a RCC-5 relationship, whether the argument is one of how to interpret the TNU sources or a scientific argument about evolution, what we care about is not the relationship per se, which has few interesting properties, but whether the relationship holds (exists) at all. (I think this agrees with what you said.) In the case of a congruence relationship, if the two TNUs relate to the same group, the congruence relationship (whatever it is you think it's between) holds; otherwise it doesn't. The proposition or claim - or assertion, if anyone has asserted it - that the relationship holds (exists) is what is of interest. When we trot out evidence it is to support or refute the claim that it holds.

It is of some value to talk about a claim (e.g. that a relationship exists or holds) even if the claim is not asserted. So 'assertion' may be a red herring. But having a table of 'assertions' suggests that the author of the source is making the assertion (who else would have done so?), so the table probably doesn't need a column for truth value distinguishing true assertions from false ones - that would be redundant. Similarly distinguishing TrueAssertions from FalseAssertions would not ordinarily be helpful. Having a table of 'relationships' also does not require a 'hold / does not hold' column since why would you have a row in a table for something that doesn't exist? You don't need "RelationshipThatHolds" any more than you need "SpeciesThatExists". (well again, there are exceptions, but they are exceptions.)

If you want to provide evidence for some claim (that an assertion is of a proposition that's true, or that somebody asserted something, or that a relationship holds) you'd do that the same way you'd do that for any other kind of claim. A claim usually corresponds to a single cell in a table (in the context of the table as a whole of course, so that sense can be made of the cell). If the value in a column would always be redundant - 'true' or 'holds' etc. - you can elide the column. If I have an 'average adult mass' column in a table with one row per species and want to give provenance for claims about mass, I can add a 'mass claim reference' column to provide evidence for the claims that the various masses have those values. The reference itself is not strictly about the species, it is about another entity (the claim given or 'represented' in the 'mass' column) that bears an indirect relationship to the species (the table is 'denormalized', in database terminology). This is OK, you just have to document what's going on. If you feel inspired you can normalize and have a separate table to hold those claims for which evidence, provenance, etc. is going to be given, rather than trying to shoehorn all that information into the table for the things that the claims are about.
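The mass example can be sketched in Python as two layouts of the same information. The species names, values, units (grams) and class names are hypothetical; `normalize` simply moves the reference, which is about the mass claim rather than the species, into a separate claims table.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpeciesRow:
    """Denormalized: one row per species, plus a column about the mass claim."""
    species: str
    average_adult_mass_g: float
    mass_claim_reference: str  # evidence for the claim, not a species property


@dataclass
class Species:
    """Normalized: the things the claims are about."""
    species: str


@dataclass
class MassClaim:
    """Normalized: the claims themselves, where evidence can be attached."""
    species: str
    average_adult_mass_g: float
    reference: str


def normalize(rows: List[SpeciesRow]) -> Tuple[List[Species], List[MassClaim]]:
    species = [Species(row.species) for row in rows]
    claims = [
        MassClaim(row.species, row.average_adult_mass_g, row.mass_claim_reference)
        for row in rows
    ]
    return species, claims


rows = [
    SpeciesRow("Aus bus", 12.5, "Reference1"),  # hypothetical values
    SpeciesRow("Xus bus", 40.0, "Reference2"),
]
species, claims = normalize(rows)
print(claims[0])  # MassClaim(species='Aus bus', average_adult_mass_g=12.5, reference='Reference1')
```

Both layouts carry the same information; the denormalized one just needs documentation saying that `mass_claim_reference` describes the claim, not the species.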

I guess I'm saying that denormalization is the key idea that lets us reduce the number of classes and tables when entities are roughly in 1-1 correspondence. We saw that with the R1/R2 scenario in my post. A property column doesn't have to be properties of the subject matter of the table, it can be properties of closely related entities. We risk leading people to think that related entities are identical entities, but good documentation can help steer them straight.

Still not sure 'asserted' adds anything. The author is making a table, one row per relationship. They probably claim the relationships hold, otherwise they wouldn't put them in the table, and a relationship that doesn't hold doesn't exist and isn't really a relationship (think unicorns). You can provide evidence for the claim that the relationship holds in the table as denormalized information (information about the claim, as opposed to information about the relationship). So...

'Assertion' also sounds like someone's saying it is of a stronger flavor than other information in the source (set of data tables or whatever). It has the suggestion of being a speech act or authoritative proclamation, which it is not. It's really just a claim, just like all the other information in the source. Maybe not even a claim; could be a calculation (e.g. phylogenetic), or hypothesis, or surmise - something no one believes yet. Best not even try to say what kind of proposition it is, or what is intended in expressing it.

I'm just noodling here - please see this as an exploration, not me trying to bully anyone. I don't think the way we spell this class (the class of whatevers they are that the rows of the relationships table are 'about'), whether it contains 'Assertion' or 'Claim' or 'Relationship' etc., is the most important question we face. The documentation is much more important than the label.

Anyhow my underlying purpose is to exemplify a particular method of deliberation over naming and ontology in hopes that others will try it.

Except I would use the word "representing" instead of "designating".

I have said that a TNU is "of" a namestring or TaxonomicName (I don't care which), in the way we might say that there are many usages "of" the word "nice". The namestring, in turn, according to the TNU, is "interpreted" to be a group (perhaps by different people in different ways). I can't think of a good word for the obvious relationship between the TNU and the group. I think earlier I said a TNU "designates" a group and that is not correct. Will mull.

I am allergic to the word "representing" because technical sources seem to all use it differently and/or at odds with ordinary language, and usually without any definition or analysis. For given A and B, how do I tell whether A represents B? The word seems to require a technical definition in a way that "express", "designate", "indicate", even "meaning" and "sense" do not. I could be talked into it, given a definition; no big deal.