How to indicate which TNUs are current

nielsklazenga commented 4 years ago

I think @jgerbracht 's comment in issue #54:

...I think we also need a 'current' or 'best' TNU.

alludes to a more general issue that is not necessarily related to the "Taxon ID".

When we deliver Taxon Relationship Assertions – and we SHOULD deliver at least those that show the relationships between different versions of our taxonomies – there will be amongst the TNUs we deliver TNUs that are not accepted by the meta authority at the time of delivery.

Do we need a way to indicate which TNUs are current? Or is the information already there? Or do we not need to deliver all TNUs that are involved in Taxon Relationship Assertions, so that all TNUs that are delivered are current?

I think I like the last option, but I only realised that when I was already halfway through writing up this issue. The question has come up anyway.

Like issue #54, this is mostly "off in the realm of application thinking", but that does not mean that we cannot talk about it. It is a use case for the standard and needs to be addressed, otherwise it will keep coming back and might prevent people from using the standard, or part of the standard (like Taxon Relationship Assertion).

jliljeblad commented 4 years ago

A user might very well be looking up an unaccepted TNU, so those also need to be delivered. Basically, I think this status is a property of a congruent set of TNUs (approx. taxon concept), which is either Accepted or Unaccepted by the meta-authority. However, whenever possible you also want to provide a date and a reason for this status. By doing so and logging this, you can track the history of taxonomic changes.

The simplest cases are when the reason is due to the usage being Obsolete or because it is Undefined (what we usually call a nomen dubium). But there are a number of cases when a concept has been replaced by another one. I list the ones I can think of here:

One or more concepts merged into a new concept, otherwise unspecified
One or more concepts merged into an extended concept
A concept replaced by a subjectively identical concept (taxonomic duplicate)
A concept replaced by an objectively identical concept (database duplicate)
A concept is split into two or more concepts
A new concept is separated from an otherwise unchanged, old concept

The last one is for when you want to convey information like in the following scenario: We start with a species occurring in a larger region. A taxonomic revision identifies that the population on a remote island doesn't fit the description of this species. The original species concept therefore remains unchanged while the status of the island population is changed to the status of a new species (new concept is separated to show that these individuals formerly were treated as included in the original species).

I'm not sure how this translates to RCC5 relationships between TNU usages exactly. Haven't come that far in my thinking yet.

But basically, this is how we handle concepts in the Swedish taxonomic database, Dyntaxa.
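A minimal sketch of the status record described above: accepted or not, with a date and a reason, logged so that the history can be replayed. All class, field and identifier names here are hypothetical; they are not taken from Dyntaxa or from the draft standard.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class StatusReason(Enum):
    # Reasons listed in the comment above; the member names are illustrative.
    OBSOLETE = "usage is obsolete"
    UNDEFINED = "undefined (nomen dubium)"
    MERGED_UNSPECIFIED = "merged into a new concept, otherwise unspecified"
    MERGED_EXTENDED = "merged into an extended concept"
    TAXONOMIC_DUPLICATE = "replaced by a subjectively identical concept"
    DATABASE_DUPLICATE = "replaced by an objectively identical concept"
    SPLIT = "split into two or more concepts"
    SEPARATED = "new concept separated from an otherwise unchanged concept"


@dataclass(frozen=True)
class StatusEvent:
    concept_id: str          # hypothetical identifier for a congruent set of TNUs
    accepted: bool           # status according to the meta-authority
    changed_on: date         # when the status took effect
    reason: Optional[StatusReason] = None


def status_on(history, concept_id, when):
    """Return the latest StatusEvent for concept_id on or before `when`, or None."""
    events = [e for e in history
              if e.concept_id == concept_id and e.changed_on <= when]
    return max(events, key=lambda e: e.changed_on, default=None)


# Hypothetical log: a concept accepted in 2000 and retired by a split in 2020.
history = [
    StatusEvent("concept-1", True, date(2000, 1, 1)),
    StatusEvent("concept-1", False, date(2020, 1, 1), StatusReason.SPLIT),
]
```

Because every change is appended rather than overwritten, `status_on(history, "concept-1", date(2010, 1, 1))` still reports the concept as accepted, while a query dated after the 2020 event reports it as unaccepted with the reason attached.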

nielsklazenga commented 4 years ago

Thanks @jliljeblad . The taxonomicStatus that we currently have is the taxonomic status according to the author of the TNU at the time the TNU was published (so the accordingTo), but the status you describe is what I was looking for.

When I said that we might not need to deliver all TNUs that participate in Taxon Relationship Assertions and might only need to deliver current TNUs, I didn't mean that those other TNUs should not be available, but that the consumer might already have them from a previous delivery, or, if not, can go and get them from the provider.

jgerbracht commented 4 years ago

To clarify how we are treating these concepts a little differently, I can use your specific example above. Re "The original species concept therefore remains unchanged while the status of the island population is changed to the status of a new species (new concept is separated to show that these individuals formerly were treated as included in the original species)."

I actually see 3 concepts here: the original concept, the new mainland concept and the new island concept. Both of the new concepts are considered 'a part' of the original concept, which may be 'retired' from active use, though it may also have specimens/recordings/observations still tied to it somewhere out in the museum/library/web world.

jliljeblad commented 4 years ago

I'll see if I can illustrate this better. The interpretation you are describing, involving 3 concepts, is what I call a split above: A concept is split into two or more concepts

What I am trying to do, by instead calling this a separation, is conveying the information that the original concept is unchanged. The change is NOT in the reevaluation of the concept (based on, say, morphology or DNA) but in the realization that the island population doesn't fit this concept. If we base our concepts solely on geographical information, I would agree with the 3 concept conclusion. But I would argue that if we have a character based concept definition, we could just realize that the island population has been wrongly placed in the larger species. Now, we could just plain add in the newly described species, but by adding in a status that it originated by being separated from another species we can distinguish this kind of new (known populations being reevaluated) from when a completely unknown population is discovered and described as new. Of course, a split is yet another case of when a known population is being described as new. So there are different kinds of new species and I believe we can benefit by discerning between them.

This is not really the important message I was trying to deliver in my previous post, but in another one I will try to elaborate on that, adding in what kind of relationships we would need to describe these events.

jgerbracht commented 4 years ago

I see, and I think trying to distinguish the 'history' of how a species came to be realized is important. I was simply thinking that the original concept has actually changed, because the set of individuals and populations which make up that concept has suddenly changed by the removal of an entire population. I wonder if we're confusing changes in a taxonomy revision with changes in an underlying concept.

deepreef commented 4 years ago

Yeah, these sorts of ambiguities are exactly why it's been so hard to pin down the notion of a self-contained "Taxonomic Concept" as a class/entity of its own, independent of what people have asserted about it (via TNUs). This specific point about "confusing changes in a taxonomy revision with changes in an underlying concept" is rampant among practicing taxonomists. As I think @ghwhitbread can likely attest, even when you educate taxonomists on the need to be more explicit about what they really mean when throwing taxon names around (i.e., anchoring them to specific concepts or TNUs), the taxonomists, on the whole, generally don't make these distinctions explicit in their published (and unpublished) assertions (or in their databases). Hell, even I don't do it in my own taxonomic works (but perhaps I will be better about it going forward).

jgerbracht commented 4 years ago

@deepreef. Agree 100% and my entire motivation for this group is to see if we can change that. Separating TNUs from TC identifiers has been key for eBird and other related projects to transition observational data from one taxonomic revision to another accurately and I hope that the standard we produce will 'encourage' exactly what you are describing.

Also, to get back on the topic of this thread, as I may have strayed some: my comment in issue 54 was more about how an instance of cTNU is identified, i.e. if I have a cTNU Id and I want to know what set of individual animals/plants/etc. a cTNU applies to, how do I figure that out? Do I navigate the relationship trees and review congruent TNUs, or is there something we can include within a cTNU which serves that same purpose? What information best describes what that cTNU represents in the real world? That led me to thinking about a 'best' TNU. That TNU would need to change over time for a given cTNU, as taxon ranges change, as we learn more id characteristics, etc.

deepreef commented 4 years ago

On the first paragraph: Cool! And Me Too! (except for animals in general, and fish in particular).

On the second paragraph: In our implementation, we navigate to the set of heterotypic synonyms included within the aTNU instance [see the post I'm about to make over at #54]. This seems to work effectively at query-time, so we haven't needed to cache anything. When I have more time I'll generate some documentation on the actual implementation (SQL code and diagrams, etc.). It may be a challenge to navigate to the next step (congruent TNUs derived from TaxonRelationshipAssertion instances) -- I haven't written those scripts yet.

However, my gut tells me that (part of) the reason we need/want a TaxonRelationshipAssertion Class and associated instances is to perform this function. As such, there may not be a need to change the cTNU over time (unless it was originally misapplied somehow).

I'm adding a hypothetical example of how that would work in the post I'm about to make on #54.

jgerbracht commented 4 years ago

Could be that the aTNU synonyms will do what's needed. I'm thinking of a use case as an example:

AOS (American Ornithological Society) manages two bird checklists by committee, and I'll focus on the North America one. Let's say the committee is putting together next year's updates and they are splitting the American Robin into American Robin and San Lucas Robin. The committee should assign the appropriate cTNUs to the new American Robin and to the new San Lucas Robin. How do they do that efficiently? (Let's assume we have a TC or cTNU repository which already has all three of the cTNUs defined.) Mr Smith, the robin expert, needs to be able to search the repository for cTNUs related to the original American Robin cTNU and review the related cTNUs (via relationships, names, etc.), which somehow must include enough information for them to accurately assign the correct cTNUs to the new American Robin and the San Lucas Robin.

Option 1 is to have sufficient circumscription information stored within a cTNU to allow Mr Smith to make an accurate assignment. Option 2 is to have a 'best' TNU assigned to the cTNU (by the managers of the cTNU repo) and Mr Smith reviews the 'best' TNU details to make that accurate assignment. Option 3 is to return an 'array' or set of TNUs which are congruent to each cTNU and Mr Smith reviews whichever TNUs he/she feels are appropriate and makes the assignment based on these TNUs.

Do you think Option 3 can be done efficiently with TaxonRelationshipAssertions? And one thing I'm not completely clear on: if each TNU has a cTNU under it, are TaxonRelationshipAssertions even needed for TNUs that are congruent?
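As a rough illustration of Option 3, here is a Python sketch that collects the set of TNUs reachable from a starting TNU through "congruent" assertions. The tuple layout, the relation strings and the identifiers are hypothetical, and treating congruence as transitive across assertions made by different people is itself an assumption that a real implementation might not want to make.

```python
from collections import defaultdict, deque


def congruent_set(start, assertions):
    """Collect every TNU reachable from `start` through 'congruent' assertions.

    `assertions` is an iterable of (subject, relation, object) tuples.
    """
    # Congruence is symmetric, so index each assertion in both directions.
    neighbours = defaultdict(set)
    for subject, relation, obj in assertions:
        if relation == "congruent":
            neighbours[subject].add(obj)
            neighbours[obj].add(subject)

    # Breadth-first walk over the congruence links.
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in neighbours[current] - seen:
            seen.add(other)
            queue.append(other)
    return seen


# Hypothetical assertions between hypothetical TNU identifiers.
example_assertions = [
    ("tnu-a", "congruent", "tnu-b"),
    ("tnu-b", "congruent", "tnu-c"),
    ("tnu-d", "includes", "tnu-a"),
]
```

With these assertions, `congruent_set("tnu-a", example_assertions)` returns `tnu-a`, `tnu-b` and `tnu-c`, while `tnu-d` is left out because it is linked only by an "includes" assertion. The walk visits each assertion once, so the cost grows with the number of assertions rather than with the number of TNU pairs.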

nielsklazenga commented 4 years ago

The issue was really about how we can distinguish between TNUs that are accepted by the metadata provider or meta authority and those that are not or no longer accepted – and not about when a taxon concept has changed and when it has not, but I don't mind where this is going...

Regarding...

The original species concept therefore remains unchanged while the status of the island population is changed to the status of a new species (new concept is separated to show that these individuals formerly were treated as included in the original species).

... though, if an original concept contains an island population and the current concept does not include that population, how can you say that the concept has not changed?

This seems to be more about speciation – splitting vs branching off – or species concepts – cladistic vs phylogenetic or evolutionary – than data exchange. While to the essentialist mind we might still be talking of the same species, in the data sense we are surely talking about different "concepts".

I thought the discussion about usageless taxon concepts and canonical TNUs was to deal with purely nomenclatural changes, e.g. when a species is put in a different genus and the name changes, while the circumscription remains the same. If now we are going to give the same ID to usages with clearly different circumscriptions, it indicates to me that the holy grail of the "stable" taxon ID is really more like the emperor's new clothes.

This seems to me something that we could discuss with the Phylogenetics Interest Group.

deepreef commented 4 years ago

@nielsklazenga :

I thought the discussion about usageless taxon concepts and canonical TNUs was to deal with purely nomenclatural changes, e.g. when a species is put in a different genus and the name changes, while the circumscription remains the same. If now we are going to give the same ID to usages with clearly different circumscriptions, it indicates to me that the holy grail of the "stable" taxon ID is really more like the emperor's new clothes.

The problem is (and always has been) that we don't have, and have been unable to create, a workable definition of "Taxon Concept" such that we more or less know whether two asserted concepts are the same or different. In the case of "including the island population" vs. "excluding", it depends on the circumstances. I would be sympathetic to the idea that the concept changed (with vs. without the island population) if the island population was widely understood to be different in some ways, and the lumper view is that it is included in the same concept as the mainland population, whereas the splitter view is that it warrants distinct recognition.

But imagine if the original concept view was defined in terms of character states, and the island population was mistakenly thought to have those diagnostic character states, so it was included in the original concept. Then further investigation reveals that the island population in fact lacks those diagnostic character states. In that case, the original concept remains the same, but the original inclusion of the island population was based on incorrect information.

I think this just goes to show we still don't have a good handle on this stuff yet, so we should focus the standards discussion on getting the elements of TNUs and TaxonRelationshipAssertions (TRAs?) nailed down before we start worrying about how to define an accepted TNU vs. a non-accepted TNU.

We can do a LOT simply by providing a mechanism for data providers to share robust factual TNUs without adding their own opinions about which they accept and which they do not. Once we get a system of reliable/robust TNU exchange, and "TRA" RCC5 mappings, then I think we'll be in a better position to figure out how best to standardize the way that MetaAuthorities share their assertions about accepted vs. non-accepted TNUs (and canonical TNUs and the like).

Again -- don't get me wrong -- I love these recent discussions. But while they definitely have changed my views on how to implement MetaAuthority schemes, I'm not sure that these conversations have altered my perception of the properties or definitions of the TNUs (or TRAs) themselves.

nielsklazenga commented 4 years ago

The problem is (and always has been) that we don't have, and have been unable to create, a workable definition of "Taxon Concept" such that we more or less know whether two asserted concepts are the same or different.

Is not that exactly why we use TNUs instead of Taxon Concepts with IDs? In terms of TRAs one would say that the original TNU includes – not is congruent to – the current TNU (that does not contain the island population).

deepreef commented 4 years ago

Is not that exactly why we use TNUs instead of Taxon Concepts with IDs? In terms of TRAs one would say that the original TNU includes – not is congruent to – the current TNU (that does not contain the island population).

Well... I'm not so sure. The TNU itself contains all the properties we've included in the draft standard. But the concept/circumscription boundaries are almost always only implied by the content of the TNU itself (at least in terms of granularities more precise than type specimens). That's why we need TRAs -- to allow people to make judgement calls by experts as to whether or not two TNUs imply congruent or non-congruent taxon concepts.

Intuitively, I would say that one TNU that explicitly refers to the island population as being included and one TNU that explicitly refers to it not being included are most likely to represent non-congruent circumscriptions. But imagine this scenario:

Reference1 includes northern mainland population and island population within TNU1, and regards southern mainland population as distinct, represented by TNU2.
Reference2 includes southern mainland population and island population within TNU3, and regards northern mainland population as distinct, represented by TNU4.
Authors of Reference1 realize they made a mistake when they thought the island population had a red spot (diagnostic of the northern mainland population), until Reference2 demonstrated that the island population actually has a blue spot (which Reference1 and Reference2 both regarded as diagnostic for the southern mainland population).

I can see legitimate philosophical and practical reasons why you would argue that the Circumscriptions for northern vs. southern mainland populations in Ref1 and Ref2 are congruent (i.e., TNU1≅TNU4; TNU2≅TNU3) based on diagnostic characters; and I can also see legitimate philosophical and practical reasons why you would argue that the Circumscriptions for northern vs. southern mainland populations in Ref1 and Ref2 are NOT congruent (i.e., TNU1∩TNU4; TNU3∩TNU2) based on inclusion of island population.
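The two readings can be held side by side if each assertion records the basis on which it was made. The Python sketch below does that for the four TNUs in the scenario, using the relations exactly as written in the paragraph above. The `TRA` class, the `basis` field and the helper functions are hypothetical and are not part of the draft standard.

```python
from dataclasses import dataclass
from enum import Enum


class RCC5(Enum):
    CONGRUENT = "is congruent to"
    INCLUDES = "includes"
    INCLUDED_IN = "is included in"
    OVERLAPS = "overlaps"
    DISJOINT = "is disjoint from"


# Only the two inclusion relations change when subject and object are swapped.
INVERSE = {RCC5.INCLUDES: RCC5.INCLUDED_IN, RCC5.INCLUDED_IN: RCC5.INCLUDES}


@dataclass(frozen=True)
class TRA:
    subject: str
    relation: RCC5
    obj: str
    basis: str  # hypothetical field: what the assertion was based on


by_characters = [
    TRA("TNU1", RCC5.CONGRUENT, "TNU4", "diagnostic characters"),
    TRA("TNU2", RCC5.CONGRUENT, "TNU3", "diagnostic characters"),
]
by_populations = [
    TRA("TNU1", RCC5.OVERLAPS, "TNU4", "included populations"),
    TRA("TNU3", RCC5.OVERLAPS, "TNU2", "included populations"),
]


def normalise(tra):
    """Order the pair alphabetically, inverting the relation if the pair is swapped."""
    if tra.subject <= tra.obj:
        return (tra.subject, tra.obj), tra.relation
    return (tra.obj, tra.subject), INVERSE.get(tra.relation, tra.relation)


def disagreements(first, second):
    """Map each TNU pair to the two relations where the assertion sets differ."""
    known = dict(normalise(t) for t in first)
    result = {}
    for t in second:
        pair, relation = normalise(t)
        if pair in known and known[pair] is not relation:
            result[pair] = (known[pair], relation)
    return result
```

Calling `disagreements(by_characters, by_populations)` reports both pairs, each with "congruent" on one side and "overlaps" on the other. Neither set of assertions has to be discarded; a consumer can choose which basis to follow.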

nielsklazenga commented 4 years ago

@deepreef, it seems to me that you are disagreeing, while at the same time making my point. We had the Taxon Relationship Assertions before we replaced Taxon Concept with Taxon Name Usage and I am arguing that we should use them, rather than put IDs on some esoteric thing of which we do not really know (or agree) what it is.

@jliljeblad 's example specifically said that the island population had been split off as a separate species, so I would say it is pretty explicit that the original and current concept are not the same. I would also say that the fact that you cannot always be certain whether two TNUs are congruent or not is all the more reason not to put an ID on something that they purportedly share. The TRAs are exactly what they say they are, assertions, not facts, so they are much better suited to dealing with this uncertainty than IDs are.

I think we are in full agreement that we should first look at the TNUs and the TRAs and only then see if we still need those IDs (but that is issue #54).

nielsklazenga commented 4 years ago

The kind of currency I was alluding to when I opened the issue, can be illustrated as follows:

TNUs

Acacia dealbata Link sec. Ross 2000
Acacia aff. dealbata (Monaro) sec. Ross 2000
Acacia dealbata Link sec. VicFlora 2020
Acacia dealbata subsp. dealbata sec. VicFlora 2020
Acacia dealbata subsp. subalpina Tindale & Kodela sec. VicFlora 2020

Taxon Relationship Assertions

Acacia dealbata Link sec. VicFlora 2020 includes Acacia dealbata Link sec. Ross 2000
Acacia dealbata Link sec. VicFlora 2020 includes Acacia aff. dealbata (Monaro) sec. Ross 2000
Acacia dealbata subsp. dealbata sec. VicFlora 2020 is congruent to Acacia dealbata Link sec. Ross 2000
Acacia dealbata subsp. subalpina Tindale & Kodela sec. VicFlora 2020 is congruent to Acacia aff. dealbata (Monaro) sec. Ross 2000

Ross 2000 is ed. 6 of the Census of the Vascular Plants of Victoria, the precursor to VicFlora. Acacia dealbata Link sec. Ross 2000 and Acacia aff. dealbata (Monaro) sec. Ross 2000 are no longer current, but they participate in TRAs with current TNUs. If we list them among the TNUs, how would we indicate which of the listed TNUs are current and which ones are not? Do we need to indicate it, or is the information easy to obtain from the TRAs? Or can we just not deliver the non-current TNUs (because the consumer already has them, or can go get them)?
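The third option (deliver only the current TNUs and let the consumer fetch the rest) can be sketched in Python as follows. The TNU labels stand in for identifiers, and the function name and tuple layout are hypothetical; nothing here is prescribed by the draft standard.

```python
# TNUs actually delivered: only those from the current version of the checklist.
delivered_tnus = {
    "Acacia dealbata Link sec. VicFlora 2020",
    "Acacia dealbata subsp. dealbata sec. VicFlora 2020",
    "Acacia dealbata subsp. subalpina Tindale & Kodela sec. VicFlora 2020",
}

# Taxon Relationship Assertions as (subject, relation, object) tuples.
assertions = [
    ("Acacia dealbata Link sec. VicFlora 2020", "includes",
     "Acacia dealbata Link sec. Ross 2000"),
    ("Acacia dealbata Link sec. VicFlora 2020", "includes",
     "Acacia aff. dealbata (Monaro) sec. Ross 2000"),
    ("Acacia dealbata subsp. dealbata sec. VicFlora 2020", "is congruent to",
     "Acacia dealbata Link sec. Ross 2000"),
    ("Acacia dealbata subsp. subalpina Tindale & Kodela sec. VicFlora 2020",
     "is congruent to", "Acacia aff. dealbata (Monaro) sec. Ross 2000"),
]


def referenced_but_not_delivered(tnus, tras):
    """Return TNUs that take part in an assertion but are absent from the delivery."""
    referenced = {tnu for subject, _, obj in tras for tnu in (subject, obj)}
    return referenced - set(tnus)


missing = referenced_but_not_delivered(delivered_tnus, assertions)
```

Here `missing` contains the two Ross 2000 TNUs, which the consumer either already holds from an earlier delivery or can request from the provider. Under this option "delivered" doubles as "current", so no separate flag is needed, at the cost of the delivery no longer being self-contained.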

I was hoping to get something else done today, but I think I might go on my (push-)bike ride and be back in time for our next meeting.

deepreef commented 4 years ago

@nielsklazenga : OK, you're right. Sorry -- I was addressing it in general terms, and you were talking about a specific example. It seems we do agree on all points on the island population thing.

With regard to the second post above; just having the raw TNU and TRA instances would be a massive step forward. I guess I'm a little fuzzy on how you distinguish "current" from "accepted". "Current" could mean the most chronologically recent TNU by a given set of authors. Or it could mean the taxonomic perspective that most people currently accept. Or the most recent in a particular pedigree of published works (e.g., Ross 2000-->VicFlora 2020).

To be "accepted", you need a MetaAuthority assertion (i.e., who accepts that a treatment in VicFlora 2020 supersedes a treatment in Ross 2000; and when did they accept it?) But I'm not clear on whether "current" is something that requires an assertion, or whether it's something that can/should be algorithmically derived from the facts (without additional interpretation or assertion).

nielsklazenga commented 4 years ago

@deepreef perhaps I should have made more clear that Ross 2000 and VicFlora 2016 onwards are essentially different versions of the same checklist, the Census of Vascular Plants of Victoria. VicFlora is essentially the Census with descriptions (and other treatment stuff). Eventually this will all be in the NSL, where they will be different versions of a tree and the TNUs are nodes on the tree, so the two nodes from the 2000 version of the tree are not on the 2020 version of the same tree.

So in your terms, I guess Ross 2000 and VicFlora 2020 are the same meta-authority agent-wise, the National Herbarium of Victoria.

To be "accepted", you need a MetaAuthority assertion (i.e., who accepts that a treatment in VicFlora 2020 supersedes a treatment in Ross 2000; and when did they accept it?)

I think this is exactly what I am looking for, but how do we deliver these assertions in a Darwin Core Archive or Tabular Data Package (or any serialisation really)?

nielsklazenga commented 4 years ago

But I'm not clear on whether "current" is something that requires an assertion, or whether it's something that can/should be algorithmically derived from the facts (without additional interpretation or assertion).

Me either.

deepreef commented 4 years ago

perhaps I should have made more clear that Ross 2000 and VicFlora 2016 onwards are essentially different versions of the same checklist

Yeah, that part was clear from your description, and it's what I meant when I referred to "a particular pedigree of published works (e.g., Ross 2000-->VicFlora 2020)".

Whether or not they are MetaAuthorities is another question (yet to be resolved), but I interpreted them as "References" (published or not), and hence the source of TNUs. My point, I guess, is that the relationship between a particular TNU in one vs. the other is no more meaningful than the relationship between two TNUs in two completely unrelated References. Except, perhaps, as additional information to assist in defining associated TRAs with possibly greater confidence. Thus, I guess I don't know whether "current" has any more meaning in the context of these two particular sources of TNUs, than in any other two sources of TNUs.

jgerbracht commented 4 years ago

I don't have the TNU model in front of me so I'm going a little from memory. Should 'current' really be at the taxonomy level, i.e. Authority/Version, and not the TNU level? Though I suppose a TNU could have a 'current' value inherited from the authority. If there are several authorities, each will have a 'current' TNU for each taxon covered by the latest taxonomy version from that authority. However, this falls down when the TNU is a stand-alone publication and not part of an over-arching taxonomy. Not sure what the best approach is here.
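A small Python sketch of that idea, where 'current' is derived from the latest version of an authority rather than stored on the TNU. The `TNU` class, its fields, the authority string and the stand-alone example are all hypothetical, and every TNU that names an authority is assumed to carry a version as well.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TNU:
    label: str
    authority: Optional[str]  # None for a stand-alone publication
    version: Optional[int]    # version of the authority's taxonomy, e.g. a year


def is_current(tnu, tnus):
    """True/False if the TNU belongs to a versioned authority, None otherwise."""
    if tnu.authority is None:
        # Stand-alone publication: there is no version series to inherit from.
        return None
    latest = max(t.version for t in tnus if t.authority == tnu.authority)
    return tnu.version == latest


census = "Census of Vascular Plants of Victoria"  # illustrative authority label
tnus = [
    TNU("Acacia dealbata Link sec. Ross 2000", census, 2000),
    TNU("Acacia dealbata Link sec. VicFlora 2020", census, 2020),
    TNU("Hypothetical species sec. stand-alone revision", None, None),
]
```

With this data `is_current` returns `False` for the Ross 2000 TNU, `True` for the VicFlora 2020 TNU and `None` for the stand-alone one, which shows exactly where the approach falls down: a stand-alone TNU gets no answer unless someone asserts one.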

nielsklazenga commented 4 years ago

Thanks @jgerbracht , I think that is where we all are pretty much.

In the later meeting yesterday, we decided that we would table/close all open issues and pick them up later.

tdwg / tnc

How to indicate which TNUs are current #55