Closed nielsklazenga closed 4 years ago
Hi Niels, I think that about captures the scope of the issue. We'd all love an ID that represents the "taxon", and we've tried to conceive of how that might actually work (devil in the details) across decades of TDWG meetings, NOMINA meetings, workshops, email discussions and the like. But there are two fundamental problems: 1) We have so far been unsuccessful in defining what a "taxon" is (to a degree that is useful for informatics), and when two references to the "same" taxon really represent the "same" taxon, or if they represent (subtly?) different taxa. 2) There appears to be little or no evidence that "taxa" as such actually exist outside the minds of taxonomists. And if they don't really exist as entities in nature, we're going to have an even more difficult time coming up with a functional/practical definition.
I don't want to dive (yet again) too deeply back into the philosophical debates, but looking at your three implementation alternatives, I would comment that option 3 is not really on the "table", so to speak, because the standard isn't about "tables". I know we always think of terms and classes as fields and tables, and in practical terms they usually are, but as you note option 3 really implies that canonicalTNU is a separate Class -- which runs counter to your premise (and I agree with the premise).
To avoid having to sift through the long exchanges in #48, it might be better to summarize my own views here:
As for option 1 vs. option 2, both really ought to have an equivalent of an accordingTo attached to them. We handle this through the "MetaAuthority" concept (as described in detail in the Taxonomer data model), whereby each of potentially many MetaAuthorities designates a single TNU as effectively the "canonical" TNU for a taxon concept (slightly more involved than that, but the same function is achieved). We don't need a new Class for MetaAuthority, because it is either a subclass of Agent (simpler approach) or a subclass of Reference (more robust but complex approach). However, we might want a new Class for what we call "AcceptedTaxonNameUsage", but which might be better called "CanonicalTaxonNameUsage", which basically consists of entities representing a unique pairing of metaAuthorityID with tnuID, with the understanding that only one tnuID per protonymID per metaAuthorityID is allowed (i.e., the same metaAuthorityID cannot adopt more than one tnuID anchored to a given protonymID).
So I would replace your option 3 with something like: canonicalTNU.
The terms associated with such a class would be something like:
- canonicalTaxonNameUsageID [unique identifier for the cTNU instance itself]
- metaAuthorityID [indication of the MetaAuthority asserting the canonicalTNU]
- acceptedTaxonNameUsageID [indication of the canonicalTNU itself]
That's all you need if MetaAuthority is a subclass of Reference. But if MetaAuthority is a subclass of Agent, then you'd want to add some sort of date-stamp term to the above list.
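To make the shape of such a class a bit more concrete, here is a minimal Python sketch. The class names `TNU`, `CanonicalTNU` and `CanonicalRegistry`, the in-memory registry, and the example identifiers are all assumptions made for illustration and are not part of any proposed standard. The only rule it enforces is the one stated above: a given MetaAuthority may adopt at most one TNU per protonym.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TNU:
    """A taxon name usage, reduced to the two fields this sketch needs."""
    tnu_id: str
    protonym_id: str


@dataclass(frozen=True)
class CanonicalTNU:
    """One MetaAuthority's assertion that a given TNU is canonical."""
    canonical_tnu_id: str    # canonicalTaxonNameUsageID
    meta_authority_id: str   # metaAuthorityID
    accepted_tnu_id: str     # acceptedTaxonNameUsageID


class CanonicalRegistry:
    """Hypothetical in-memory store: one TNU per protonym per MetaAuthority."""

    def __init__(self):
        # (meta_authority_id, protonym_id) -> CanonicalTNU
        self._by_key = {}

    def assert_canonical(self, canonical_tnu_id, meta_authority_id, tnu):
        key = (meta_authority_id, tnu.protonym_id)
        existing = self._by_key.get(key)
        if existing is not None and existing.accepted_tnu_id != tnu.tnu_id:
            raise ValueError(
                f"{meta_authority_id} already adopted {existing.accepted_tnu_id} "
                f"for protonym {tnu.protonym_id}"
            )
        record = CanonicalTNU(canonical_tnu_id, meta_authority_id, tnu.tnu_id)
        self._by_key[key] = record
        return record
```

Two different MetaAuthorities can still pick different TNUs for the same protonym in this sketch; the constraint only applies within a single MetaAuthority.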
I apologize for the long post (as is typical for me), but I suspect even this much text glosses over some important explanatory details.
I like it...
I think it is important for this group to produce a framework where we can at critical times, and relative ways, talk about a model(s)-to-mimic(s) relationship between (1) entities that authors in the scientific community and literature pragmatically take, at a given time, to "reflect the causal structure of the natural world" (such as these authors, sorry but not really: https://www.mapress.com/zootaxa/2008/f/zt01671p031.pdf) and (2) our typically more deflationary, information science centered labeling and connecting of human-made theories. One may call that pragmatism, or (hat tip to Beckett Sterner) "obtaining a crucial social affordance for coordinating our actions" with those of ecological, evolutionary, conservation communities, etc.
Same for "sameness", I'd think. Take two usages of one scientific name (say, Araneae), where each entails only one of two purported synapomorphies (spinnerets; pedipalps enacting sperm transfer) that reliably have congruent referential extensions to organismal groups in nature. Pragmatically, author teams claiming credit for having discovered this or that group-making trait, will need to claim that credit also through this TNC product. I am not saying they cannot; just aiming for a little counter balance to make sure we can meet users where they are or need to be, jump back into this realm, bring them in as needed, and so on.
Even if (hat tip to @jar398) the difference between model(s) and mimic(s) is something like a fluctuating matter of degree depending on the context, the social affordance of having this relationship in place, is still an important pragmatic gain for the TNC.
Thanks, @nfranz -- thoughtful (and thought-provoking), as always!
Thanks for the shout-out to the 5 new Chromis paper, but (sorry, but not really... :-) ) I was unable to locate anything in that paper that indicates these authors' intention was to "reflect the causal structure of the natural world". Rather, I'm fairly certain that the intention of the authors was to offer up a set of ICZN-Code-compliant labels to represent subjectively (and mostly inferred) circumscribed sets of organisms in nature (living, recently dead, and yet to be born), which the authors felt were (more or less) consistent with patterns of (relatively recent) historical labeling of circumscribed sets of organisms in seemingly similar groups of organisms.
That pedantic distinction notwithstanding, I agree with you that many taxonomic authors do indeed believe they are asserting names and classifications to reflect the causal structure of the natural world (which I interpret as efforts to define [semi-]objective evolutionarily-derived units in nature through Linnean-style nomenclature and classification), so I think your overarching point (as I understand it) is valid. I guess my counter-point is that authors' intentions are rarely explicitly asserted within such papers (I happen to have insider information on the example you gave, as the paper itself, like the vast majority of taxonomic papers, fails to explicitly make this distinction), so divining those intentions represents part of the problem in figuring out what a "taxon" is (or should be regarded to be, at least informatically). As such, while serving as wonderful fodder for philosophical debates, these topics do not lend themselves effectively to structured informatic models and standards. At least not yet.
As for the "sameness" issue you outline, I don't expect that this TNC product will come to represent some sort of "credit" scheme that two competing taxonomic camps will fight over -- especially if the selection of the canonicalTNU is relegated to a MetaAuthority (e.g., ITIS/CoL, WoRMS editors, etc.). Each MetaAuthority will need its own mechanism for determining which of two competing and presumably congruent implied circumscriptions proxied by two different published TNU/treatment instances is the one that wins the prestige of being deemed "canonical". If authors of the non-selected alternative are unhappy with the selection, that's a beef between them and the MetaAuthority -- not a failure of the standard.
I do see one issue (which may be exactly your point, now that I think a bit more carefully about it): the way this sort of thing is currently done is that some MetaAuthorities (e.g., CoL) usually only assert an abstract taxonomic view, without pinpointing a particular publication that defined it. As such, both competing Araneae camps can claim equal credit when CoL concurs with their shared taxonomic view. But if CoL is forced/encouraged to make that Sophie's Choice (which, by extension, implies a preference for one synapomorphy over the other, rather than embracing both simultaneously), ugliness may ensue. This is a fair point, but my gut says that the informatic value of the "taxon-by-proxy" scheme we're discussing for TNUs is sufficiently large that it outweighs the potential for taxonomic social unrest. Also, I suspect in many cases there will be a semi-objective bias towards chronology of publication as the arbiter of two otherwise equally meritorious candidates for the prestige of being canonicalTNU.
I think the best metaphor for the notion of an explicit canonicalTNU (as previously discussed) is that of a type specimen of a new species. In many cases, the selection of the holotype is somewhat arbitrary, and as a holotype can only be housed in one Museum, there is potential for conflict when two different but effectively congruent specimens owned by two different Museums are both reasonable options for designation as the holotype. Somehow that analogous circumstance hasn't led to (too much?) conflict. Perhaps the consolation (and the analog to Paratypes) is that the same MetaAuthority can also mint a set of TaxonRelationshipAssertion (literally for lack of a better term) instances representing the congruent relationships.
Incidentally, WoRMS is already doing this -- sort of -- via their "Taxonomic citation".
On a side note, and I'm not sure if this represents a point of agreement, or disagreement, or is non sequitur, my understanding of TNUs as proxies for implied taxa (≈ taxon concepts) is that they are more representative of the resultant implied set of circumscribed organisms (living, recently dead, and yet to be born), than they are of the methods or mechanisms used by the authors of the TNU to define the boundaries of those circumscriptions. I strongly support an informatic structure/standard that captures those methods and mechanisms and uses them as the basis for minting TaxonRelationshipAssertion instances, and I think you are way ahead of everyone else in exploring (and understanding) that space.
If I failed to address your points amid the copious text above, I sincerely apologize! If you can clarify it a bit more, I will do my best to concede or debate (ideally much more concisely than I've done here) accordingly.
I am not sure if I understand the arguments above and maybe it is too soon to let the cat out of the bag, but the reason I like @deepreef 's solution above is that it does not change anything to the TNU. The Canonical TNU, as defined above, is a 'congruent' Taxon Relationship Assertion with some syntactic sugar, but which gives us an ID that can be used for a group of congruent TNUs, independent of the name that is currently in use by the respective metaAuthority.
If you find this all a bit yucky (like I do) and prefer not to have the Canonical TNUs, you can just translate a Canonical TNU statement into a 'congruent' Taxon Relationship Assertion. If, on the other hand, you do have a need for IDs for groups of congruent TNUs and get provided with Taxon Relationship Assertions (or Canonical TNU statements that use a different TNU as the canonical one than you do), you can turn them into Canonical TNU statements using the rules you apply to decide which of a group of congruent TNUs is the canonical one.
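The two-way translation described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: it treats 'congruent' as an equivalence relation (so congruent TNUs can be grouped transitively), it represents assertions as plain tuples, and the function names and the `rank` callback (standing in for "the rules you apply" to pick a canonical TNU) are hypothetical.

```python
def congruence_groups(pairs):
    """Group TNU ids connected by 'congruent' assertions (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for tnu_id in list(parent):
        groups.setdefault(find(tnu_id), set()).add(tnu_id)
    return list(groups.values())


def pick_canonical(groups, rank):
    """Map every TNU id to the canonical member of its group.

    `rank` encodes your own selection rule; the member with the
    lowest rank value becomes the canonical TNU of the group.
    """
    canonical = {}
    for group in groups:
        best = min(group, key=rank)
        for tnu_id in group:
            canonical[tnu_id] = best
    return canonical


def to_congruent_assertions(canonical):
    """Reverse direction: express canonical choices as 'congruent' triples."""
    return [
        (tnu_id, "congruent", chosen)
        for tnu_id, chosen in sorted(canonical.items())
        if tnu_id != chosen
    ]
```

A consumer who disagrees with someone else's canonical choice can pass the same congruence pairs through `pick_canonical` with a different `rank` and arrive at their own canonical TNUs without touching the underlying TNUs.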
First off, I sincerely apologize for my long hiatus from these discussions.
I think I follow most of what has been discussed in this thread and I also agree with Richard and Nico and think that we are getting close with the Option 3. A couple points of clarification so I understand this. @deepreef mentions both a canonicalTaxonNameUsageID and an acceptedTaxonNameUsageID. I'm not sure exactly what the acceptedTaxonNameUsageID is in the model? To put this into different terms to validate I'm on the same page, the cTNU is an identifier that links to a single TNU by metaAuthority and aTNUID. The cTNU has no circumscription information, that information is in the aTNU. And the aTNU is 'linked' to all the other 'congruent' TNUs via relationship assertions. In this case the cTNU is analogous to what I think of as a TaxonomicConcept.
There will certainly be some devil in the details about putting services together around this, but I think this is a workable solution. I would add one more term. If the aTNU is the 'original' TNU for this cTNU, then I think we also need a 'current' or 'best' TNU. Since the circumscription of a taxonomic concept changes over time, as population ranges change, the community will need to be able to easily view a current circumscription for a cTNU. I envision the need for a 'current' TNU according to the MetaAuthority, unless we expect the single aTNU to change over time (which seems like a bad idea to me).
Let's say we have an observation of a bird, or we have a flat skin specimen in a museum drawer, or we have a sound recording of an individual calling. What do you @deepreef @nfranz see as the taxonomic identifier tied to each of these in the respective databases, assuming what we are designing here is broadly adopted? The canonicalTaxonNameUsageID? the acceptedTaxonNameUsageID? or any appropriate TNU id? or a mixture depending on how each database implements TNUs?
This issue is definitely off in the realm of application thinking.
As one with an interest in a “meta-Authority” that delivers a versioned, consensus taxonomy around “stable” taxon identifiers, using a TNU architecture, I can say that these descriptions/options come pretty close to describing alternative ways of building another.
But how does a meta-Authority deliver these data? As arrangements of TNU sec. meta-Authority (date), hopefully using unique and resolvable identifiers, and incorporating relationships - synonymies; assertions; congruence maps, etc - with/to external TNU.
Delivery of a stable “taxon” is an issue for the meta-Authority, not the TDWG standard.
Though, a definition of “stable” might be useful here.
I think the solution is to embrace the notion that, when it comes to deciding whether a certain TNU merits this versus that accommodation relative to existing TNUs, author (or authority) intentions matter, and just are part of this particular kind of science of systematics that we are modeling.
Making explicit one's own intentions in coining a new or identifying occurrences to an existing TNU, should be an explicit allowance if not a recommendation of the TNC product. Not prejudging one way or the other, but yes providing language that allows for trained judgment to be a respectable and essential part of implementations of "sameness", "stability", etc.
@jgerbracht
@deepreef mentions both a canonicalTaxonNameUsageID and an acceptedTaxonNameUsageID. I'm not sure exactly what the acceptedTaxonNameUsageID is in the model?
SORRY! My bad -- I was sloppy in reviewing that post. It's confusing because I originally wrote that post using the term acceptedTaxonNameUsageID because that's the field name we use in our implementation. But I later realized that the term canonicalTaxonNameUsageID was probably better (especially in this context), so I updated it. However, I retained the term acceptedTaxonNameUsageID as a term within the proposed(?) new Class as a pointer to the TNU that the MetaAuthority deemed as being the accepted status of a taxon/name. However, now that I think about it, there is a potential "wrinkle" with how I framed this. I'll explain this in a separate post.
To put this into different terms to validate I'm on the same page, the cTNU is an identifier that links to a single TNU by metaAuthority and aTNUID. The cTNU has no circumscription information, that information is in the aTNU. And the aTNU is 'linked' to all the other 'congruent' TNUs via relationship assertions.
YES!! Exactly! That states it WAY better than I did -- thank you!
In this case the cTNU is analogous to what I think of as a TaxonomicConcept.
Yes, I think that's how I would also characterize it (I'm hedging only because the more I think about what a "TaxonomicConcept" is, the less confident I am in my own understanding of it).
There will certainly be some devil in the details about putting services together around this,
Yes! See my next post.
I would add one more term. If the aTNU is the 'original' TNU for this cTNU, then I think we also need a 'current' or 'best' TNU.
Ok, this is analogous to, but not exactly the same as the problem I will address in the next post. I don't think the aTNU is necessarily the 'original' (chronologically first?) TNU that captures the circumscription. In my mind, it is closer to the 'current' or 'best' TNU. One could derive a chronologically 'original' TNU from a non-original but 'best' aTNU separately, so I don't see a need to capture an 'original' TNU connected to each cTNU instance.
Since the circumscription of a taxonomic concept changes over time, as population ranges change, the community will need to be able to easily view a current circumscription for a cTNU. I envision the need for a 'current' TNU according to the MetaAuthority, unless we expect the single aTNU to change over time (which seems like a bad idea to me).
Right -- and this gets at that point I made about whether MetaAuthority is a subclass of Agent or a subclass of Reference. In my mind, the key/fundamental difference between an Agent instance and a Reference instance is that the latter includes a date (as well as the potential for a set of Agents, rather than a single one).
What I was trying to get at is that if MetaAuthority is a subclass of Agent (e.g. "Catalog of Life"), then we'll want to have a date-stamp term (or terms) as part of the cTNU instance, so that we can track how CoL has changed its interpretation of 'best' TNU over time. However, if MetaAuthority is a subclass of Reference (e.g., "2019 Edition of the Catalog of Life"), then the date part is inherited from the Reference. For a number of technical reasons, I think the MetaAuthority-As-Subclass-of-Reference approach is better and more robust.
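The difference between the two approaches can be shown with a small Python sketch. Everything here is an assumption for illustration: the `Reference` and `CanonicalTNU` shapes, the optional `asserted_on` date-stamp field, and the `effective_date` helper are hypothetical names, not terms from the standard or from our implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass(frozen=True)
class Reference:
    """A dated source, e.g. a specific edition of a checklist."""
    reference_id: str
    title: str
    published: date


@dataclass(frozen=True)
class CanonicalTNU:
    meta_authority_id: str               # an Agent id or a Reference id
    accepted_tnu_id: str
    asserted_on: Optional[date] = None   # only needed for Agent-style MetaAuthorities


def effective_date(record: CanonicalTNU, references: Dict[str, Reference]) -> date:
    """Date of the assertion: inherited from the Reference when the
    MetaAuthority is one, otherwise taken from the record's own stamp."""
    ref = references.get(record.meta_authority_id)
    if ref is not None:
        return ref.published
    if record.asserted_on is None:
        raise ValueError("an Agent-style MetaAuthority needs a date stamp")
    return record.asserted_on
```

With the Reference approach, a new edition is simply a new Reference with its own set of records, so no extra term is needed on the cTNU itself.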
Let's say we have an observation of a bird, or we have a flat skin specimen in a museum drawer, or we have a sound recording of an individual calling. What do you @deepreef @nfranz see as the taxonomic identifier tied to each of these in the respective databases, assuming what we are designing here is broadly adopted? The canonicalTaxonNameUsageID? the acceptedTaxonNameUsageID? or any appropriate TNU id? or a mixture depending on how each database implements TNUs?
Here's how we handle it. For starters, all specimens (and other instances of dwc:MaterialSample), dwc:Occurrence instances, sound recordings, videos (which we aggregate as instances of 'Evidence', ≈ Darwin-SW:Token), etc. of organisms all anchor back to an instance of dwc:Organism. Each instance of dwc:Organism is represented by one or more instances of dwc:Identification. And each dwc:Identification instance effectively serves to join together one instance of dwc:Organism with one TNU. So the direct answer to your question is that the taxonomic identifier to which all other biodiversity data is directly linked is "any appropriate TNU id". This models what often happens in reality. Organism identifications are often directly derived from taxonomic keys, field guides, in-house identification tools, taxonomic literature, etc., all of which are hosts of TNUs. Even when the person making the identification "just knows" it (as is very often the case), such a person can usually also "just know" that whatever circumscription is in his/her head at the time is functionally congruent to some published (or otherwise documented) TNU. And even when not, there's nothing stopping the generation of a new TNU anchored to "Pers. Comm. Joe Taxonomist today's date".
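As a rough illustration of that chain (Evidence to Organism to Identification to TNU), here is a minimal Python sketch. The class and field names are simplified stand-ins chosen for this example, and the helper `tnus_for_evidence` is hypothetical; it is not a description of our actual schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Evidence:
    """A specimen, occurrence record, sound recording, video, etc."""
    evidence_id: str
    kind: str
    organism_id: str   # every piece of evidence anchors back to one organism


@dataclass(frozen=True)
class Identification:
    """Joins one organism to one TNU."""
    organism_id: str
    tnu_id: str


def tnus_for_evidence(evidence: Evidence,
                      identifications: List[Identification]) -> List[str]:
    """TNU ids reachable from a piece of evidence via its organism."""
    return [
        ident.tnu_id
        for ident in identifications
        if ident.organism_id == evidence.organism_id
    ]
```

The evidence record itself carries no taxonomic identifier in this sketch; an organism can accumulate several identifications, each pointing at whichever TNU the identifier actually used.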
@ghwhitbread : I agree with everything you said! I don't have a single answer to your question(s), but I do have in mind several potentially plausible answers. But I've rambled on enough here, and I want to address the "wrinkle" in the MetaAuthority scheme.
@nfranz
I think the solution is to embrace the notion that, when it comes to deciding whether a certain TNU merits this versus that accommodation relative to existing TNUs, author (or authority) intentions matter, and just are part of this particular kind of science of systematics that we are modeling.
I wholeheartedly agree! But without a tradition within taxonomic practice of explicitly stating intentions in a standardized/structured way, it requires more layers of "accordingTo" be applied (i.e., third-party interpretations). That was my understanding for the foundation of "TaxonRelationshipAssertion".
Making explicit one's own intentions in coining a new or identifying occurrences to an existing TNU, should be an explicit allowance if not a recommendation of the TNC product. Not prejudging one way or the other, but yes providing language that allows for trained judgment to be a respectable and essential part of implementations of "sameness", "stability", etc.
AGREED!!!!
OK, one more post from me today before I take care of some household chores:
So, I mentioned a "wrinkle" in the MetaAuthority scheme I outlined. Before I address that, I hope it's clear to everyone that the selection of the term "MetaAuthority" was based on the idea that an accordingTo Reference instance of a TNU is the "Authority" (i.e., the one making the actual taxonomic assertion); so the role of the "MetaAuthority" is to serve as the "Authority of Authorities", selecting from among multiple authorities which taxonomy is the "accepted" one. I often refer to "CoL" as the stereotypical MetaAuthority, but really it's more of a MetaMetaAuthority (with the various GSDs being the true MetaAuthorities).
So... here's the wrinkle: The original MetaAuthority concept (circa the Phyloinformatics paper) assumed that a single TNU would capture both the concept/circumscription and the correct classification (parentTNU) and the correct nomenclature (Code-compliant, spelled correctly, etc.). I suspect in most cases, such TNUs do exist that can be pointed to as the 'canonicalTNU'. But there are probably more than a few edge cases where this is not the case (e.g., where one publication got, say, the circumscription and classification right, but botched the nomenclature, and another publication got the nomenclature right, but didn't even assert a circumscription....etc.).
If people are interested, I will generate a document with figures and such to explain how this all works in our implementation (or, more likely, in the "optimum" implementation, which is improved from our existing functional implementation). But for now, I'll just summarize here:
One solution is to have three properties of a CanonicalTaxonNameUsage instance, something along the lines of:
- acceptedCircumscriptionTNUID
- acceptedClassificationTNUID
- acceptedNomenclatureTNUID
These could all be the same TNUID when a single TNU got it "right" from the perspective of the MetaAuthority. But they don't have to be -- they could be three separate TNUs, or one for the Circumscription and Classification and another for the Nomenclature.
Another option is that when no single pre-existing TNU got all three sets of parameters (circumscription, classification, nomenclature) "right", the MetaAuthority could simply mint a new TNU that does have all three sets of parameters "right", then additionally assert the relevant TaxonRelationshipAssertion instances to capture the congruencies. This favors the MetaAuthority-as-a-subclass-of-Reference approach, as it means one less step in minting such ad-hoc (but permanent and resolvable) TNUs.
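Both options can be sketched side by side in Python. This is only an illustration under assumptions: the `CanonicalTNU` shape, the `consolidate` helper, the tuple form of the assertion, and the choice to assert congruence against the circumscription TNU are all hypothetical and not taken from our implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CanonicalTNU:
    canonical_tnu_id: str
    meta_authority_id: str
    accepted_circumscription_tnu_id: str
    accepted_classification_tnu_id: str
    accepted_nomenclature_tnu_id: str

    def source_tnus(self):
        """The distinct TNUs this record draws on (one, two or three)."""
        return {
            self.accepted_circumscription_tnu_id,
            self.accepted_classification_tnu_id,
            self.accepted_nomenclature_tnu_id,
        }

    def is_single_source(self):
        """True when one TNU got all three axes 'right'."""
        return len(self.source_tnus()) == 1


def consolidate(record, minted_tnu_id):
    """Second option: point all three properties at a TNU minted by the
    MetaAuthority, and record its congruence with the TNU that supplied
    the circumscription (an assumption of this sketch)."""
    consolidated = replace(
        record,
        accepted_circumscription_tnu_id=minted_tnu_id,
        accepted_classification_tnu_id=minted_tnu_id,
        accepted_nomenclature_tnu_id=minted_tnu_id,
    )
    assertions = [
        (minted_tnu_id, "congruent", record.accepted_circumscription_tnu_id)
    ]
    return consolidated, assertions
```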
I hope that makes at least some sense....
I can elaborate more on this later, after the household chores are complete.
One solution is to have three properties of a CanonicalTaxonNameUsage instance, something along the lines of:
- acceptedCircumscriptionTNUID
- acceptedClassificationTNUID
- acceptedNomenclatureTNUID
I think that is indeed anchored much more in reality than a single cTNU. But it implies 3 different parts of a TNU that one is referring to: nomenclature, classification and circumscription. Should in that case the target not be 3 different classes we are referring to? Otherwise it is up to the user to decide which part of a TNU actually comprises the nomenclature etc. A single TNU class to rule them all is attractive, but maybe it is too simple and open for interpretations and misuse.
As for acceptedNomenclatureTNUID we already have TaxonomicName this could refer to. And if circumscription and classification are to be separated (for good reasons I think!), should there also be 2 classes for them so that all 3 finally make up a TNU? Thinking this further it sounds an awful lot like bringing back the TaxonConcept class for the circumscription ...
@mdoering -- I think it is better and more manageable (in the long run) that a TNU encapsulate all of those properties (Circumscription, Classification, Nomenclature) together, because that's how taxonomy has always been practiced (at least since Linnaeus). Certainly not all TNUs include information in all three areas (e.g., type catalogs lack circumscriptions and classifications; most References do not assert classifications all the way up to the rank Domain, so the highest-ranking TNUs lack Classification; etc.). But a TNU encompasses all three areas (which are tightly interconnected), so I think it is a mistake to parse them out into three separate classes.
This is why I think the preferred solution should be that when a single existing TNU does not capture all three areas "correctly" from the perspective of a MetaAuthority, the MetaAuthority should generate its own TNU to serve as cTNU. In fact, the reality is that most existing MA's (e.g., GSDs) don't have reliable information on which pre-existing TNUs exactly capture the "correct" view (even though I suspect the vast majority of "correct views" are already reflected in existing TNUs). So I think the best way to kick-start the content for all of this is for Meta-Authorities to mint TNUs that reflect their assessment of the "correct" Circumscription, Classification, and Nomenclature (CCN?), and then over time these MA-generated TNUs can be retired in favor of non-MA-generated TNUs (e.g., published TNUs) -- which likely contain much more taxonomic information, and hence are much better bearers of the status of cTNU.
The goal is to be able to have a computer automatically compare TNUs across all three areas (CCN). Nomenclature is easy and automatic via the ProtonymID property of a TNU. Classification is also easy and automatic via parentNameUsageID. Circumscription is semi-automatic for TNUs that provide full heterotypic synonymies (by comparing sets of included heterotypic synonyms), but this only captures Circumscriptions at the granularity of type specimens, and only for TNUs that explicitly provide full heterotypic synonymies. Thus, we need TaxonRelationshipAssertion instances asserted by third parties to accommodate cases when the source TNUs themselves don't include explicit assertions about congruency/etc. with other TNUs.
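A minimal Python sketch of that three-way comparison might look like the following. It rests on several assumptions made for illustration: the `TNU` field names, the `compare_ccn` function, the use of `None` to mean "cannot be decided from the TNUs alone", and in particular the choice to compare classification by the protonym of each parent TNU (since two TNUs from different references will normally have different parent TNU ids).

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class TNU:
    tnu_id: str
    protonym_id: str
    parent_name_usage_id: Optional[str] = None
    # Protonym ids of the included heterotypic synonyms;
    # None means the source gave no full heterotypic synonymy.
    heterotypic_synonyms: Optional[FrozenSet[str]] = None


def compare_ccn(a: TNU, b: TNU, tnus: Dict[str, TNU]) -> Dict[str, Optional[bool]]:
    """Compare two TNUs on Circumscription, Classification, Nomenclature.

    A value of None marks an axis that needs a third-party
    TaxonRelationshipAssertion because the TNUs lack the information.
    """
    def parent_protonym(t: TNU) -> Optional[str]:
        parent = tnus.get(t.parent_name_usage_id) if t.parent_name_usage_id else None
        return parent.protonym_id if parent else None

    parent_a, parent_b = parent_protonym(a), parent_protonym(b)
    syn_a, syn_b = a.heterotypic_synonyms, b.heterotypic_synonyms
    return {
        "nomenclature": a.protonym_id == b.protonym_id,
        "classification": (
            None if parent_a is None or parent_b is None else parent_a == parent_b
        ),
        "circumscription": (
            None if syn_a is None or syn_b is None else syn_a == syn_b
        ),
    }
```

Even when the circumscription axis comes back as a match, it only reflects agreement at the granularity of type specimens, as noted above.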
The idea of a "cTNU" is simply a way of recognizing certain TNU instances as shared "anchorpoints" around which networks of TaxonRelationshipAssertion instances can be aggregated, with MAs "blessing" those cTNUs (or minting them when an existing one doesn't quite capture the full CCN properties, or when one exists but hasn't been identified yet).
At least that's how I see it.
I guess I'm still looking at this a little differently. To me the circumscription is the piece that defines a cTNU (or Taxonomic Concept) by detailing what it represents in the real world. The cTNU would, ideally, be globally unique, meaning that there is not another cTNU which also represents the same concept and the MetaAuthority would be responsible for ensuring the uniqueness of a cTNU and for deprecating instances where duplication occurred (which we know is inevitable). This certainly means that a MA would need to have a deep understanding of a taxon, and I think of Avibase as a model for an MA.
As for an MA blessing the 'best' Classification and Nomenclature for a cTNU, I think that is not a necessary part of a cTNU (since a number of TNUs which have already defined Classification and Nomenclature will be linked back to the cTNU) and the 'best' is more up to the consumers of a cTNU and not the coiners of the cTNU (MAs)? I am hoping that a cTNU is more removed from the Classification and Nomenclatural 'wars' and is a way to place a unique identifier on a new Taxon / Taxon Concept (per Richard) "A set of biological entities, alive, recently dead and yet to be born, asserted to comprise a collective unit in nature to which a scientificName is assigned." When a new 'collective unit' is recognized, then a new cTNU is created but there is not necessarily a determination by the MA as to the 'best' classification or nomenclature, only the best circumscription, leaving the Classification and Nomenclature up to the Authorities. I guess this doesn't preclude a group like COL from being both a cTNU coiner and a Global Classification authority, but I do see these roles as being different.
@jgerbracht : If I understand you correctly, it would mean that different MAs with overlapping Scope would need to coordinate with each other, so that they both picked the same TNU as the canonical form representative.
This reminds me of a subtle but important point that I glossed over when I outlined what I thought the CTNU class would look like. I had outlined the terms in this Class as something like:
- canonicalTaxonNameUsageID [unique identifier for the cTNU instance itself]
- metaAuthorityID [indication of the MetaAuthority asserting the canonicalTNU]
- acceptedTaxonNameUsageID [indication of the canonicalTNU itself]
There are IDs for three separate things here, which I'll refer to as cTNU, MA, aTNU.
Note that in this framing, cTNU is an instance of a different Class from aTNU. The latter is of the Class TaxonNameUsage, and the former is of the class "CanonicalTNU" (which is not itself representative of a TNU, but rather an assertion by a MA about an aTNU).
So just to be clear here, using this terminology I think you mean that there should be a single aTNU that is unique for a circumscription. In that sense, two different MAs could each have their own cTNU instance, but in an ideal world, both would converge on the same aTNU value (as opposed to one MA selecting one TNU as the aTNU, and another MA selecting a different but congruent aTNU instance). This would only be an issue when both MAs agree that the same circumscription is "correct". When one MA is a lumper and the other is a splitter (for example) they obviously would not converge on the same aTNU values.
I have to admit I'm still not sure which pathway makes the most sense to me: combining the "meaning" of a cTNU to include all three axes (Circumscription, Classification, Nomenclature), or parsing them out into separate things (which, as @mdoering suggests, would probably fall back to separate classes). My gut still tells me that it's better to combine them into one set of properties that are tightly bound together, mostly because that's how taxonomy actually works. But I also see the value in parsing them so that changes in one of those axes don't force a change in the anchorpoint for all three.
I don't agree with the distinction that decisions about the "best" Circumscription definitions are in the realm of MAs, while consumers should sort out the Classification and Nomenclature stuff. For starters, I think the Circumscription wars have been just as bloody (if not bloodier) than the Classification wars. Moreover, Classification and Circumscription are exactly the same thing in principle, it's just that Classification is the Circumscription as applied to the parent taxon. Nomenclature is its own thing, and is generally reduced to pedantic skirmishes (non-gender agreement among lepidopterists notwithstanding), but is also often just as specialized (if not more so) than making decisions about circumscription boundaries. In other words, to me there is an equal argument to be made for all three axes being asserted by an MA.
Having said that, I do agree that there are very legitimate reasons for parsing out the three properties into three different classes/instances (e.g., ZooBank could focus on being the MA for zoological nomenclature, whereas CoL could be the MA for classifications/circumscriptions of higher-rank names, and the GSDs could be MAs for classifications/circumscriptions of lower rank names). But at this stage, I still think that the costs of doing so exceed the benefits, for several reasons. One of those reasons is that the most common (and commonly utilized) taxa are those ranked at species, and of course a "species" merges classification and nomenclature into the same package (i.e., the binomen). So it's a little tricky to excise Circumscriptions of species from both their names and their classifications.
When I think of a circumscription (and it's likely I don't understand this term exactly either), I think of a range and id description which may or may not (but certainly should) include at least 1 type specimen. For example, something like this as a circumscription in my mind and please correct me if I'm way off base.
"Described as Newtonia brunneicauda monticola Salomonsen 1934 (14: page 207); type locality Manjakatompo, Ankaratra Mountains, Madagascar Distribution: Ankaratra Mountains, in central Madagascar. Identification: Similar to nominate brunneicauda Salomonsen 1934 , but upperparts darker grayish green or darker grayish olive brown; underparts richer brownish buff; and slightly larger (14, 5). "
With this, ideally someone familiar with bird taxonomy knows exactly which individuals and population(s) of birds are included in the cTNU and which populations are not. The taxonomists in each of the three major Bird Taxonomies can assign their own classification and nomenclature (TNU) to the correct cTNU. And as further exploration of the Ankaratra Mountains is conducted, this distribution statement could be updated with a more accurate and current statement. Hope this makes some sense.
Re MAs, I would hope that there wouldn't be competing MAs, as there aren't competing DOI organizations. I know it's not 100% analogous, but.... I can easily see all the bird taxonomists accepting Avibase as the DB of record for cTNUs, with CoL, eBird, Birds of the World, GBIF, etc. utilizing Avibase for cTNU lookup. And a global taxonomy authority would be free to assign the name Newtonia monticola to the cTNU while another might assign Newtonia brunneicauda monticola to the same cTNU, i.e. Newtonia monticola according to authority xyz = the above cTNU.
Frankly, I've thought much less about individual taxonomic publications and much more about global taxonomies which coalesce the individual publications into an over-arching classification.
I keep thinking about what we can use to define a Taxon Concept: something that clearly distinguishes that concept from other concepts and is also independent of the 'current' understanding of that concept's classification and nomenclature.
On a slightly different note, and hoping not to muddy the waters much, but I see 3 "authorities" involved.
In reality, the Taxonomy Authority and the cTNU keepers will frequently be the same, but in some taxonomic groups, they will be different. @deepreef, are you thinking the MAs as you've described are 2s and 3s??
You want to find an explicit, locally working balance between TNU inflation - where every usage is seemingly in need of additional articulations to achieve meaningful data integration - and TNU compression - where individual or canonical TNUs are overburdened with nomenclatural and taxonomic information so that they can no longer serve more granular communication and authorship accreditation functions. There should not be "clear", or necessary and sufficient criteria that can somehow be read off of a body of data, "unambiguously". Allow for a market of implementations within a framework that expresses desired communication values and services but does not overreach by answering questions that should be left to the implementing projects and people. Strongly recommend that implementations be explicit and perhaps even consistent and exhaustive about their internal rules. Abstain from writing the TNC document in any way that (I think, desperately) tries to anticipate or even prevent any form of misuse and ill-fated implementation. Trust that inflated or compressed TNU authoring and citation practices will undergo some sort of post standard-release selection, anyway.
@jgerbracht (first post above): I don't think there is any universal definition for "circumscription" -- it's been around for a while, and really (to me, anyway) represents a more explicit version of "Taxonomic Concept". But the point is, your understanding of the term is equally as good as anyone else's!
So I think we're fundamentally thinking of the same thing, based on what you wrote above. If I were to come up with a definition for the term "circumscription", it would basically be the same words that you quoted me on for Taxon Concept:
A set of biological entities, alive, recently dead and yet to be born, asserted to comprise a collective unit in nature.
(it exists whether or not a scientificName is assigned to it; hence the truncated quote).
But defining the term is the easy part, I think. The harder part is the ways in which we define (or indicate) the boundaries of the circumscription itself. The crudest but most informatically-practical way of defining the boundaries is via the included type specimens (i.e., the type specimen of the accepted name as well as the type specimens of all the names asserted to represent heterotypic synonyms of the accepted name). Any TNU that includes an explicit (and complete) set of heterotypic synonyms (as opposed to no synonymy, or only a geographically regional synonymy) would be fleshing out a circumscription to at least the granularity of type specimens (the heterotypic synonyms serving as proxies for the type specimens).
However, there are other ways to define, describe or indicate the boundaries of a circumscription, including the example you gave (i.e., geographic distribution/populations, diagnostic characters, or some mixture of both).
However, any of these methods for defining/describing/indicating circumscriptions requires subjective interpretations, and hence the need for TaxonRelationshipAssertion instances.
To illustrate what I mean by this, suppose we have the chronology of TNUs:
We can reasonably infer that Aus bus Smith sec. Brown ≈[Aus bus Smith sec. Jones + Aus xus Jones sec. Jones].
But what of the relationships between:
Maybe there is enough information in Smith 1950 to figure that out with confidence, and maybe not. Also, if you can figure out one or more of these with confidence, then can we infer others by default? For example, if we can confidently say that Aus bus Smith sec. Smith is congruent to Aus bus Smith sec. Jones, can we also say (with equal confidence) that Aus bus Smith sec. Smith excludes Aus xus Jones sec. Jones? My intuitive sense for set-theory logic says "yes", but I'll defer to people with much more experience in this area than I have.
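The set-theory intuition in the question above can be checked on a toy model. This is only a minimal sketch: it assumes each circumscription can be written down as a set of specimen identifiers (the identifiers `s1`..`s3` and the function name `rcc5` are invented for illustration), and it assumes that Jones treated Aus bus and Aus xus as non-overlapping. Real circumscriptions are rarely enumerable like this.

```python
def rcc5(a: frozenset, b: frozenset) -> str:
    """Return the RCC-5-style relationship of circumscription a to b."""
    if a == b:
        return "congruent"
    if a < b:
        return "is included in"
    if a > b:
        return "includes"
    if a & b:
        return "overlaps"
    return "excludes"


# Hypothetical specimen sets; none of these identifiers come from real data.
aus_bus_sec_jones = frozenset({"s1", "s2"})
aus_xus_sec_jones = frozenset({"s3"})  # assumed disjoint from Aus bus sec. Jones
aus_bus_sec_brown = aus_bus_sec_jones | aus_xus_sec_jones
aus_bus_sec_smith = frozenset({"s1", "s2"})  # assumed congruent with sec. Jones

print(rcc5(aus_bus_sec_smith, aus_bus_sec_jones))
print(rcc5(aus_bus_sec_smith, aus_xus_sec_jones))
print(rcc5(aus_bus_sec_brown, aus_bus_sec_jones))
```

Under those two assumptions the inference holds: a set that equals Aus bus sec. Jones is disjoint from everything that Aus bus sec. Jones is disjoint from. The confidence of the derived "excludes" is therefore only as good as the confidence in the congruence and in the disjointness within Jones.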
And speaking of @nfranz : I completely agree with what you say in the post immediately above this one. Wearing both my taxonomic philosopher's hat and my database implementer's hat, my gut still leans towards encapsulating implied Circumscription, Classification and Nomenclature within the aTNU selected for a cTNU instance by an MA. But I'm definitely open to persuasion on this. Regardless, I think @nfranz rightly points to the need for a balance, and avoidance of (pretending to) impose too much structure on would-be implementers.
@jgerbracht (second post above): Alas, I think all three often blend into each other. In our implementation, a single publication can be a MA (usually one that falls closer to your # 2 than your # 3). In fact, the way we did our implementation, the end user selects multiple MAs to follow, in a priority-ranked sequence. For example, suppose a single new species within my scope of interest was described yesterday. At the moment, no other MA out there (published or not) has dealt with it yet, so the new species description itself becomes a MA at the top of my priority list. Then suppose that a genus revision within my scope of interest was published last year, and I agree with all of the treatments included therein, but not all of them have been picked up by broader MAs and/or I don't agree with the broader MAs on all of their decisions for this genus, so I select the revision as my priority # 2 MA. Then suppose my priority # 3 MA is a regional checklist that got everything right in my scope of interest except the genus published last year. Then after that I default to a particular GSD as # 4 to catch everything else not included in the checklist. Then after the GSD comes, say, CoL or WoRMS (or both in some particular priority order). In this context, the distinctions between your three tiers of MAs become a bit more ambiguous.
I'm not saying that my implementation is the one that should be used to determine how these things are defined in the standard (especially given that my own ideas about how best to implement this are evolving); but I will certainly plan on submitting it (or a revised version of it) onto the "market of implementations" (good term, @nfranz!)
So, in answer to the actual question posed by @jgerbracht : I would say "all of the above". This is also partly why I think MA-as-subclass-of-Reference makes more sense than MA-as-subclass-of-Agent.
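The priority-ranked selection described above can be sketched as a simple fall-through lookup. This is not the Taxonomer implementation: it assumes, purely for illustration, that each MA can be reduced to a mapping from a protonym ID to the single TNU ID it adopts, and every identifier below is made up.

```python
from typing import Dict, List, Optional

# Hypothetical simplification: one MA = mapping of protonymID -> accepted tnuID.
MetaAuthority = Dict[str, str]


def resolve_tnu(protonym_id: str, ranked_mas: List[MetaAuthority]) -> Optional[str]:
    """Return the TNU adopted by the highest-priority MA that treats the protonym."""
    for ma in ranked_mas:
        tnu_id = ma.get(protonym_id)
        if tnu_id is not None:
            return tnu_id
    return None  # no selected MA has dealt with this protonym


# Made-up MAs, listed from highest to lowest priority.
new_species_description = {"p-new": "tnu-1"}
genus_revision = {"p-a": "tnu-2", "p-b": "tnu-3"}
regional_checklist = {"p-a": "tnu-9", "p-c": "tnu-4"}
gsd = {"p-c": "tnu-8", "p-d": "tnu-5"}

ranked = [new_species_description, genus_revision, regional_checklist, gsd]

print(resolve_tnu("p-a", ranked))  # the genus revision outranks the checklist
print(resolve_tnu("p-c", ranked))  # the checklist outranks the GSD
```

Because each MA contributes at most one TNU per protonym, the first hit in priority order is the answer, and a different user with a different ranking gets a different but equally well-defined view.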
I’m pretty sure I agree with @nfranz here [above], and here [1]
The whole idea of a “meta-Authority” might very well be a scientific anti-pattern, unless, of course, there are many of them.
The advantage of the TNU approach is that it can provide the commons - via the free interchange and the reuse of unencumbered data - for the evolution of new theories and hypotheses to test them.
At the beginning of this thread I thought maybe our National Species List (NSL) was a kind of meta-Authority, but now I see that it is more of a concept map. There is no obligation to implement it. In fact many institutions don't, instead using it as an aid to inter-disciplinary, taxonomic communications. The infrastructure that supports it enables many such points of view and, in fact, encourages them.
IMHO It is the role of taxonomy to place all this stuff in context. The role of TCS is to document that placement in reusable form.
[1] Nico M Franz, Beckett W Sterner, To increase trust, change the social design behind aggregated biodiversity data, Database, Volume 2018, 2018, bax100, https://doi.org/10.1093/database/bax100
The whole idea of a “meta-Authority” might very well be a scientific anti-pattern, unless, of course, there are many of them.
If the assertions of MetaAuthorities have any influence whatsoever on the scientific practice of taxonomy, then we have done something very, very wrong. In my mind, the "science" of taxonomy stops at TNUs, and even there within only a subset of TNUs (i.e., those with reasonably robust treatments and synonymies). The whole point (in my mind) of the MA scheme is to provide a service to non-science consumers (or, at least, non-expert consumers) of taxonomic information.
I'll defer to @nfranz regarding the extent to which TaxonRelationshipAssertions (or whatever we end up calling them) represent "science" (I would like to hope so, but I just don't have enough experience in this realm to form a decent opinion on that question).
@ghwhitbread : I think the NSL is a MA (in my view of MAs). What you describe as how institutions use NSL is exactly how institutions (should) use CoL, GSDs, and other candidate MAs. Indeed the entire impetus for coming up with MAs in Taxonomer was to emphasize the value/reality of multiple views on the taxonomic landscape. All we're talking about here (or, at least, all I'm talking about here) is providing a structured mechanism for these multiple/alternate/competing taxonomic points of view to be standardized, exposed, and compared (and, maybe most importantly, actually used by more than just the random human visiting the random website, and updating some random record in an institution's database -- as I think happens in a lot of use cases).
Having said that, and following the point raised by @afuchs1 in the call earlier today, while I think this exchange has been extremely valuable (and I hope it continues), from a TCS perspective I see it as a much lower priority than finishing up the TCS standard relating to TNUs and associated attributes (followed by fleshing out References and/or Agents). As has already been noted several times, this is much more in the realm of implementation, and as such I think we'll be better off examining these MA/cTNU-related issues after we have a large pool of TNUs available for our consumption and testing.
Actually, @deepreef it would not matter whether you or I in particular think this or that assertion meets our criteria for science (which, by the way). That is my point: there are practicing communities out there that act on the pragmatic judgment that TNU-to-TNU assertions can improve the precision of communicating about taxonomic congruence and the lack thereof. We are serving those communities, and hopefully others that we can inspire.
@nfranz : Fair points.
But I still think we should prioritize getting the TNU parts of the standards sorted out, then worry about all the MA/cTNU stuff (which can be layered on afterwards). The TaxonRelationshipAssertion stuff should probably be included as a priority as well, if the goal is to maintain the functionality of TCS.
@deepreef, @ghwhitbread:
@ghwhitbread : I think the NSL is a MA (in my view of MAs).
NSL works differently from CoL. Up till now I was considering e.g. APC (Australian Plant Census) to be one of the meta-authorities behind the data sets for which the NSL is the provider, but APC is probably a "proper" authority in your (and now my) view.
CoL on the other hand is a real meta-authority, as they do not do taxonomy, but they do taxonomy stitching and the classification of a CoL TNU may be different from that of a TNU from the authority from which they take it. That's why I think that meta-authorities like CoL should always have their own TNUs, as there is always something different. Also, we still want to be able to say that something is Aus bus sensu CoL 2019, even if there is nothing different between that and Aus bus sensu
@nfranz, I think we are talking about different things here. The TNU-to-TNU assertions are not in question, although that might have been suggested. I am a big fan of them. Moreover, they are already in TCS and we have covered them in our review of TCS and, I think, they are ready to be used (although we might still want to tweak them later). In our last meeting before TDWG last year, we added a term to indicate whether the assertion is ostensive or intensional (forgive me if I am not using the right terms). Also, we cleaned out the relationship types, so now the RCC-5-like relationships are the only ones left. We did add an 'intersects' relationship (#45) for cases where we don't know what the relationship is, except that it is not 'excluded'.
This issue is about IDs for "taxon concepts" that are not TNUs, but something that is shared between TNUs. This was considered important at the workshop and CoL symposium at Biodiversity_next (TDWG 2019), as you know, so it is a good thing it came up here and we are discussing it now.
According to the Vocabulary Maintenance Standard, we have the standard (or set of vocabularies) and application schemas. I think what we are discussing here is more likely to become a (potentially TNC endorsed) application schema than end up in the standard. Also, I think we are too easily accepting the premise that practising communities need these IDs and I would rather look into whether the things that they expect a "Taxon ID" to give them cannot be better handled differently (with Taxon Relationship Assertions). For example, @jgerbracht 's example that eBird would happily accept the canonical Avibase TNU as the canonical TNU and the CoL TNU as the classification TNU, requires both Avibase and CoL to not have the same idea of what canonical TNUs are that eBird has and I think it would be much easier to just make 'congruent' relationship assertions between the eBird TNU on the one hand and the Avibase and CoL TNUs on the other. What do you need the extra ID for?
I didn't mean to imply that CoL is a global taxonomy at least with birds. As you say, if they aren't in the taxonomy world but are stitching together existing taxonomies then ideally they will be using TNUs from those existing taxonomies and not coining new ones.
To hopefully make this clearer, I'll give an implementation example. Let's say Avibase has a cTNUid of 12345 for the taxonomic concept using the earlier circumscription example.
"Described as Newtonia brunneicauda monticola Salomonsen 1934 (14: page 207); type locality Manjakatompo, Ankaratra Mountains, Madagascar Distribution: Ankaratra Mountains, in central Madagascar. .... "
Note, there is nothing in the cTNU that defines what the scientific name, common name or even authority are (though it does have an original name/reference).
Clements is the global taxonomy used by eBird, though for the sake of this example, let us assume CoL maintains a global avian taxonomy and their taxonomy is used by eBird. CoL defines a TNU 67892 with a taxon concept or cTNU 12345 (Avibase), and TNU 67892 layers on the scientific name of Newtonia monticola, a vernacular name of Mountain Newtonia and associated parent relationships, i.e. included in the family Vangidae, etc.
What I was hoping to convey is that Avibase and CoL do have the same canonical TNU as each other, the one coined by Avibase.
A simplified eBird observation record would simply be
obs_date, location, observer, count and TNU
And to share this observation with GBIF, I could simply give them the above with the addition of a cTNU. Let's say GBIF uses the BirdLife global checklist, they would have the most current BirdLife TNU tied to the cTNU eBird shares, making the transition of an observation from Taxonomy a to Taxonomy b, or Clements to BirdLife very simple.
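The hand-off described above can be sketched as two dictionary lookups. This is only an illustration under stated assumptions: the lookup tables, the `translate` function, the BirdLife TNU identifier `BL-0001`, and the observation values are all invented; only TNU 67892 and cTNU 12345 come from the example in this thread.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Observation:
    obs_date: str
    location: str
    observer: str
    count: int
    tnu: str  # TNU in the taxonomy the record is currently expressed in


# Hypothetical lookup tables an Avibase-like service might expose.
CTNU_BY_TNU = {"67892": "12345"}
TNU_BY_CTNU_AND_TAXONOMY = {("12345", "BirdLife"): "BL-0001"}


def translate(obs: Observation, target_taxonomy: str) -> Observation:
    """Re-express an observation in another taxonomy via the shared cTNU."""
    ctnu = CTNU_BY_TNU[obs.tnu]
    target_tnu = TNU_BY_CTNU_AND_TAXONOMY[(ctnu, target_taxonomy)]
    return replace(obs, tnu=target_tnu)


ebird_obs = Observation("2020-01-15", "Ankaratra Mountains", "observer-1", 2, "67892")
gbif_obs = translate(ebird_obs, "BirdLife")
print(gbif_obs.tnu)
```

Only the `tnu` field changes; the date, location, observer and count travel unchanged, which is the point of keying the exchange on the cTNU rather than on a name.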
To answer your question about why not use relationships for all TC matching, a couple of reasons come to mind. There will be literally 100s of TNUs associated with cTNU 12345, and following congruent relationships from one TNU to the other will be inefficient and will likely traverse a number of intermediate TNUs. I also believe that would be much more error prone: since each relationship has the potential of introducing an error, navigating one relationship (cTNU to TNU) is, in my mind, much 'safer' than navigating possibly 100s of relationships.
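To make the hop-count argument concrete, here is a minimal sketch of what following 'congruent' assertions involves. It assumes the assertions are symmetric pairs of TNU identifiers and uses a plain breadth-first search; the identifiers and the function name `congruent_path` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def congruent_path(
    start: str, goal: str, assertions: List[Tuple[str, str]]
) -> Optional[List[str]]:
    """Shortest chain of TNUs linking start to goal via 'congruent' assertions."""
    graph: Dict[str, Set[str]] = {}
    for a, b in assertions:  # congruence is treated as symmetric
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of congruent assertions connects the two TNUs


# Made-up chain: each taxonomy only asserted congruence with its neighbour.
assertions = [
    ("clements-2019", "clements-2018"),
    ("clements-2018", "ioc-9"),
    ("ioc-9", "birdlife-4"),
]

path = congruent_path("clements-2019", "birdlife-4", assertions)
print(path, "hops:", len(path) - 1)
```

In this toy chain the answer sits three assertions away, and every one of those assertions has to be correct. With a shared cTNU, each TNU is one lookup away from the anchor regardless of how many TNUs exist.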
Thanks @jgerbracht.
After talking with @deepreef yesterday I am starting to come around a bit to the idea of canonical TNUs.
In the later meeting yesterday we decided to close all open issues and focus on writing up the new version for now. We will pick this up later.
The use case of an ID for a taxon that is independent of the taxon name and only changes if the delimitation of a taxon has changed, or, in other words when we are really speaking of a different taxon and not different labels for the same thing, came out of the TNC workshop and the Catalogue of Life symposium at Biodiversity_Next. It has been discussed at some length in issue #48, which was about a different subject, but I have now split it off into an issue of its own so that we can close that issue when we have reached a conclusion on what to call what is now called Taxon Relationship Assertion.
I think where we stand at this moment is that this use case is important, but that it can be addressed without adding a different class (i.e. Taxon Concept not sensu TCS), by assigning a TNU as the type or canonical TNU and linking other congruent TNUs to this canonical TNU.
This could be implemented three ways that I can think of:
1. by having a boolean property isCanonicalTNU (or something like that) on the canonical TNU; the linking of other TNUs is done through 'congruent' Taxon Relationship Assertions.
2. by having a canonicalTNU property (on all TNUs) that takes a TNU URI as its object
3. as 2, but with the canonical TNUs in a different table (so basically having a new class).
Which TNU among congruent TNUs is the canonical one is a matter for the implementation. @deepreef has given some options/considerations in #48.
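The first two options can be compared side by side in a small sketch. Nothing here is part of TCS: the class names, the snake_case field names standing in for isCanonicalTNU and canonicalTNU, the helper function, and the `tnu:` URIs are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TnuOption1:
    """Option 1: a boolean flag on the canonical TNU itself."""
    uri: str
    is_canonical_tnu: bool = False


@dataclass(frozen=True)
class CongruentAssertion:
    """A 'congruent' Taxon Relationship Assertion between two TNUs."""
    subject_uri: str
    object_uri: str


@dataclass
class TnuOption2:
    """Option 2: every TNU points at its canonical TNU by URI."""
    uri: str
    canonical_tnu: Optional[str] = None


def canonical_for_option1(
    uri: str, tnus: List[TnuOption1], assertions: List[CongruentAssertion]
) -> Optional[str]:
    """Find the canonical TNU that is directly asserted congruent with uri."""
    by_uri = {t.uri: t for t in tnus}
    if by_uri[uri].is_canonical_tnu:
        return uri
    for a in assertions:
        if a.subject_uri == uri:
            other = a.object_uri
        elif a.object_uri == uri:
            other = a.subject_uri
        else:
            continue
        if by_uri[other].is_canonical_tnu:
            return other
    return None


# Option 1: the flag plus one assertion.
tnus = [TnuOption1("tnu:a", is_canonical_tnu=True), TnuOption1("tnu:b")]
links = [CongruentAssertion("tnu:b", "tnu:a")]
print(canonical_for_option1("tnu:b", tnus, links))

# Option 2: the same fact stored directly on the TNU.
tnu_b = TnuOption2("tnu:b", canonical_tnu="tnu:a")
print(tnu_b.canonical_tnu)
```

Option 1 keeps the TNU record minimal but needs the assertions to be searched; option 2 answers the question with a single property read, at the cost of adding a property to every TNU.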
Unless there is a need or wish to exchange these "stable" IDs, I think there is no need to add additional properties to the standard and all we have to do at this time is to write up the use case with options of addressing it.
Did I get this right?