tdwg / tnc

Taxonomic Names and Concepts Interest Group

Taxon, Taxon Concept and Taxon Name Usage: definitions and relationships #1

Closed ghwhitbread closed 5 years ago

ghwhitbread commented 5 years ago

You write: A taxonomic concept is a taxonomic name instance establishing or circumscribing a taxonomic entity - often linking synonymic inclusions and adding annotations, description…

I think it's cleaner to say that the taxonomic concept is a theory of a certain taxonomy identity. And then "taxonomic concept label" (name sec. source) is the "name" for that theory.

More or less like here: https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-017-0174-5 ...

Best, Nico

nielsklazenga commented 5 years ago

@jgerbracht and @deepreef: Thanks for all the use cases you provided. I will copy them into a new issue, as identifying use cases is one of the things the TNC needs to do.

@frmichel: As a group, we are very interested in your work with BioSchemas, so we will keep a close eye on it. Unfortunately I missed your presentation on it at SPNHC/TDWG 2018, but I did see your presentation on microservices. That was very cool stuff.

deepreef commented 5 years ago

I'm not sure I will be able to join the second call, but I will try. I guess the two things from this recent round of posts that I wish to comment on now are:

1) I'm a bit uneasy with distinguishing "taxon" from "taxon concept". Both exist in the minds of people (not in nature), and as far as I can ascertain, all taxa are conceptual, and only exist through assertions by people.

2) It seems that people want to define "name" as a label/text-string. I think that's fine, as long as we're all clear on that. However, names-as-text strings are not particularly useful in modelling the nomenclatural component of taxonomic information (they are important primarily because that's all we have for the majority of biodiversity information). Much more important/useful is names as entities or objects. Perhaps we can adopt "Protonym" as a term that represents the name-as-object (with many properties, of which the text-string label is only one). It's not really what Protonym was intended for, but it can serve this function if it helps reduce the scale of confusion when people refer to "Names" in this discussion. One of the key tasks for harnessing the power of taxonomic information in biodiversity informatics is to map text-string "names" to Protonyms. This is MUCH more tractable than mapping identifications to TNUs (thanks @ghwhitbread for clarifying that point -- and I completely agree now that I understand your point).

Just to follow up a bit more on Greg's point here: while it is very difficult/impossible to directly link text-string names from identifications to specific TNUs, it is much easier to link text string names to their constituent protonyms, and from there, given a reasonably fleshed-out index of TNUs, one can at least narrow down the scope of potentially applicable TNUs for each identification.
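A minimal sketch of that two-step narrowing, assuming toy in-memory indexes. The class, identifiers, name strings (placeholder "Aus bus" style names) and sources below are hypothetical and are not part of TCS or any existing service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TNU:
    """A taxonomic name usage: one protonym as used in one source."""
    tnu_id: str
    protonym_id: str
    source: str


# Hypothetical index of name strings -> protonym ids. One string may point
# to more than one protonym (e.g. homonyms), so the values are sets.
NAME_TO_PROTONYMS = {
    "aus bus": {"P1"},
    "aus cus": {"P2", "P3"},
}

# Hypothetical index of TNUs harvested from the literature.
TNU_INDEX = [
    TNU("T1", "P1", "Smith 1850"),
    TNU("T2", "P1", "Jones 1990"),
    TNU("T3", "P2", "Jones 1990"),
    TNU("T4", "P3", "Brown 2005"),
]


def normalize(name_string: str) -> str:
    """Crude normalization: lower-case and collapse whitespace."""
    return " ".join(name_string.lower().split())


def candidate_tnus(name_string: str) -> list:
    """Return the TNUs that could apply to an identification's name string.

    Step 1 maps the text string to protonym(s); step 2 keeps only the TNUs
    built on those protonyms. This narrows the candidates; it does not
    pick a single TNU.
    """
    protonyms = NAME_TO_PROTONYMS.get(normalize(name_string), set())
    return [t for t in TNU_INDEX if t.protonym_id in protonyms]
```

For example, `candidate_tnus("Aus  bus")` yields the two usages T1 and T2; choosing between them still needs other evidence (source, date, locality).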

jgerbracht commented 5 years ago

@deepreef "taxon" and "taxon concept" may be the same. I keep thinking of a taxon as a concept that is tied to a specific taxonomy, whereas the taxonomic concept is completely disconnected from any specific taxonomy, other than possibly its originating definition. Another way to describe it: a taxon is a node and all the stuff under the node in a single tree. The taxon concept is the node and the stuff under the node, disconnected from any tree, i.e. a branch I can carry from tree to tree and graft on as the taxonomist feels is correct. If you're thinking of a taxon as more like the disconnected branch, then I can easily agree that taxon and taxon concept are the same.

@ghwhitbread and @deepreef Greg mentions in his earlier post some things I would like to dive into further, because I think they are the key to this entire discussion of names/protonyms/taxonconcepts.
"Much more important/useful is names as entities or objects" and "The thing is, that if you think you need a taxon concept, you are probably making one, or at least documenting one that already exists, and, though platitudes abound, for real-world usage, a disambiguated name has to be the more practical alternative." One point I need clarified is "What do you mean by a disambiguated name?" This is an important point and I want to make sure I'm not interpreting this differently than you intend. Is it a single instance of a TNU being used as a proxy for a TaxonConcept?

I don't recall if I've given background on how the Lab of Ornithology manages concepts and names, so I'll give a brief summary here. The lab manages several fundamentally different projects related to birds. eBird gathers bird observations globally from birders, citizen scientists and ornithologists. Macaulay Library (ML) is a repository for multimedia focused on the natural world, with the largest number of assets focused on birds. Birds of North America, Merlin, All About Birds and Neotropical Birds are online species accounts/monographs which include authoritative text covering the life histories of birds. These life histories also dynamically include ML assets and eBird visualizations.
We also maintain the Clements checklist of Birds of the World, which is updated annually. Each eBird observation, ML asset and species account is keyed using a Taxon Concept identifier, i.e. none of these databases store taxon names. The Clements checklist is where the names come from. When a new version of Clements is released, any name changes or reshuffling of taxa get mapped to the appropriate TaxonConcept ids, and new TaxonConcept ids are created as needed. Exposing new names and new taxonomic structures changes none of the 500,000,000 eBird observations and none of the 10,000,000 ML assets; instead, the names displayed to the users of these applications are applied by a Clements name lookup API using the TaxonConcept id. Species accounts, which dynamically include data visualizations and multimedia from eBird and ML, use APIs based on TaxonConcept ids.
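The pattern described above (records keyed by a stable concept identifier, with names resolved at read time against a checklist version) can be sketched roughly as follows. The ids, names and the `display_name` lookup are hypothetical stand-ins; this is not the actual Clements or eBird API.

```python
# Hypothetical checklist versions: TaxonConcept id -> name displayed to users.
# Ids and names are placeholders, not real Clements data.
CHECKLIST_NAMES = {
    2017: {"TC-0001": "Aus bus", "TC-0002": "Aus cus"},
    2018: {"TC-0001": "Dus bus", "TC-0002": "Aus cus", "TC-0003": "Aus eus"},
}

# Records store only a concept id, never a name string.
OBSERVATIONS = (
    {"obs_id": 1, "concept_id": "TC-0001"},
    {"obs_id": 2, "concept_id": "TC-0002"},
)


def display_name(concept_id: str, version: int) -> str:
    """Resolve a concept id to the name used in a given checklist version."""
    return CHECKLIST_NAMES[version][concept_id]


def label_records(records, version: int) -> list:
    """Attach display names at read time; the records themselves are untouched."""
    return [(r["obs_id"], display_name(r["concept_id"], version)) for r in records]
```

Publishing a new checklist version means adding a new mapping (plus new concept ids such as `TC-0003` where needed); no stored record is rewritten.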

nfranz commented 5 years ago

Here's a different take, from https://doi.org/10.1101/233973 (just updated the 1st version to reflect that of the final paper due out soon in PLoS Computational Biology; doi:10.1371/journal.pcbi.1006493).

Syntactic and semantic conventions

1. Taxa are models, concepts are mimics. We typically refrain from using the terms "taxon", "taxa", or "clade(s)". We take taxa to constitute evolutionary, causally sustained entities whose members are manifested in the natural realm. The task for systematics is to successively approximate the identities and limits of these entities. Thus, we assign the status of 'models' to taxa, which systematists aim to 'mimic' through empirical theory making. This perspective allows for realism about taxa, and also for the possibility to let our representations stand for taxa, at any given time and however imperfectly, to support evolutionary inferences. In reserving a model status for taxa, we can create a separate design space for the human theory- and language-making domain. In the latter, we speak only of taxonomic or phylogenomic concepts - the products of inference making.

2. Sameness is limited to the same source. Therefore, for the purpose of aligning the neoavian explosion use case, we need not speak of the "same taxa" or "same clades" at all. Similarly, we need not judge whether one reconstruction or the other more closely aligns with deep-branching avian taxa, i.e., which is (more) 'right'? Instead, our alignment is only concerned with modeling congruence and conflict across two sets of concept hierarchies. The concepts are labeled with the "sec." convention to maintain a one-to-one modeling relationship between concept labels and concepts (clade identity theories). Accordingly, there is also no need to say that, in recognizing each a concept with the taxonomic name Neornithes, the two author teams are authoring "the same concept". Instead, we model the two labels 2015.Neornithes and 2014.Neornithes, each of which symbolizes an individually generated phylogenomic theory region. As an outcome of our alignment, we may say that these two concepts are congruent, or not, reflecting the intensional alignment (to be specified below) of two phylogenomic theories. But, by virtue of their differential sources (authorship provenance), the two concepts 2015.Neornithes and 2014.Neornithes are never "the same". "Sameness" is limited in our approach to concepts whose labels contain an identical taxonomic name and which originate from a single phylogenomic hierarchy and source. That is, 2015.Neornithes and 2015.Neornithes are (labels for) the same concept.
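A minimal sketch of the "sec." labelling convention, assuming a label is just a (name, source) pair; the class name is hypothetical. Label equality requires both parts to match, so congruence across sources has to be asserted separately rather than read off the labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptLabel:
    """A taxonomic concept label: a name plus the source it is 'sec.'.

    The generated __eq__ compares both fields, so two labels are 'the same'
    only when name AND source match; frozen=True also makes labels hashable.
    """
    name: str
    source: str

    def __str__(self) -> str:
        # Mirrors the compact notation used above, e.g. "2015.Neornithes".
        return f"{self.source}.{self.name}"
```

Whether 2015.Neornithes and 2014.Neornithes are congruent is then a separate, explicitly asserted relationship.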

deepreef commented 5 years ago

I think this conversation has been interesting and helpful. As noted during the second call yesterday, it has strayed somewhat far from the original "issue" as posted, but it has nevertheless been an interesting discussion with some fresh suggestions. I don't want to belabor this too much more, as I think it will be more productive to deconstruct TCS and focus on specific technical issues around that process. However, because this can potentially serve as a useful documentation of issues, I wanted to add a few more comments. Most of what follows is probably not useful to the task at hand, but may be potentially useful to the hard-core taxonomy nerds among us, and those who encounter this thread the next time we have this conversation....

@jgerbracht: "One question I have for the taxonomists in the group [...] does a TNU ALWAYS map to a TC?"

Me: No. Of course which ones that DO map to TCs depends on how you define TC (several variants are floating through this thread). However, there are many TNUs that objectively do not imply a TC. For example, some publications simply provide catalogs of type specimens for a particular collection, without any taxonomic assertions. Also, all heterotypic synonyms listed within a treatment for a valid taxon each represent a separate TNU, usually without any assertion of the circumscription boundaries of the synonymous names (other than they are included in, or overlap with, or include the circumscription asserted for the valid taxon/name). There are other examples as well, perhaps including your Linnaeus example (although that would be debatable -- we may not be able to discern from the publication WHAT the circumscription boundaries are, but we can generally assume that they existed in the mind of the author in at least some form -- to the point where the circumscription excludes all other implied taxa mentioned via non-synonymous names within the same publication). The point is, not all TNUs imply TCs, but the existence of an (implied) TC is independent of how well its boundaries are documented within a publication.

@jgerbracht: "I can certainly see the advantages of treating TNUs and TCs as the same, but for this to help me in my work both as a generator of Taxonomic Concepts and as a consumer, there needs to be a single identifier for a Taxonomic Concept that does not change, i.e. an identifier ALWAYS maps to the same circumscribed set of organisms, regardless of the TN and TNU applied."

Me: Yes, absolutely! I think that a system that fixes a static and persistent relationship between an identifier and a specific/implied circumscribed set of organisms is much more useful than one where the implied contents of a TC (represented by a single identifier) evolve over time (partly why I'm a little uneasy with the nodes-in-a-tree approach). If we imagine a set of TNUs that we can confidently ascribe to congruent sets of organisms (i.e., all the TNUs refer to the same abstract concept), then we have two choices from an informatics perspective: 1) select one of the TNUs (e.g. the chronologically first, or most well-documented) as the "ring-bearer" for the concept (or "type concept", to follow the nomenclatural model), then hub all the other TNUs off that one TNU via RelationshipAssertions (in which case the identifier for that one TNU becomes the identifier for the concept); or 2) mint a new "concept" identifier, separate from the TNUs, to which all the TNU identifiers are mapped. Too much to discuss here, but perhaps this can fork off as a separate issue for discussion, if it is ever deemed relevant.
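The two identifier strategies can be contrasted in a short sketch, assuming each TNU is a plain dict with an `id` and a publication `year`; the TNU ids are hypothetical, and "chronologically first" is used as the ring-bearer rule purely for illustration.

```python
import uuid


def ring_bearer_concept(tnus):
    """Option 1: reuse the id of one TNU (here the chronologically first).

    Returns the concept id and a mapping of every TNU id to it.
    """
    bearer = min(tnus, key=lambda t: t["year"])
    return bearer["id"], {t["id"]: bearer["id"] for t in tnus}


def minted_concept(tnus):
    """Option 2: mint a new identifier that belongs to no TNU."""
    concept_id = f"urn:uuid:{uuid.uuid4()}"
    return concept_id, {t["id"]: concept_id for t in tnus}


# Hypothetical TNUs judged to refer to congruent circumscriptions.
CONGRUENT_TNUS = [
    {"id": "tnu:B", "year": 1921},
    {"id": "tnu:A", "year": 1850},
    {"id": "tnu:C", "year": 2008},
]
```

With option 1 the concept id doubles as a TNU id (`tnu:A` here); with option 2 it is a new identifier that no TNU carries, which keeps concept and usage separate but adds one more identifier to maintain.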

@nfranz: "We take taxa to constitute evolutionary, causally sustained entities whose members are manifested in the natural realm."

Me: Understood, but many do not see taxa this way (i.e., the camp that maintains that "Nature produces organisms, not taxa.") I'm not sure we'll be able to resolve this (long-standing) philosophical issue in time to embrace one perspective or another in designing informatic models and solutions to taxonomy, so whatever the informatic solution is, it should probably be agnostic to this issue (or at least accommodating of both views). This may seem like a tangent, but I think it lies at the heart of the question of whether "taxa" should receive their own identifiers as distinct (dare I say "real"?) objects, vs. whether taxa exist only as assertions, and therefore are better represented by TNUs (and/or clusters of TNUs representing congruent circumscriptions).

@nfranz: "Instead, our alignment is only concerned with modeling congruence and conflict across two sets of concept hierarchies."

Me: Do I assume correctly that a translation to TCS-space would be that you're saying these kinds of "sameness" statements should be documented via instances of tcs:relationshipAssertion? If so, then philosophical debates aside, I agree 100% with this!

nfranz commented 5 years ago

Thanks, @deepreef. You say: "I'm not sure we'll be able to resolve this (long-standing) philosophical issue in time to embrace one perspective or another in designing informatic models and solutions to taxonomy, so whatever the informatic solution is, it should probably be agnostic to this issue (or at least accommodating of both views). "

But that's exactly what the DwC definition of Taxon fails to do. See http://rs.tdwg.org/dwc/terms/#Taxon: "A group of organisms [...] considered by taxonomists to form a homogeneous unit."

That definition leaves no freedom for anything beyond the conceptual, human-inferred and -verbalized realm, and yet it uses the term "Taxon". Thereby taking away from those active and applied biologists who must be realists (including us two when we publish on species), with reasonable scientific provisos, to speak of taxa to do evolutionary inference making, conservation policy making, and so on.

DwC "accommodates both views" by stipulating there cannot be a meaningful distinction. In short, my issue with DwC Taxon is not that it fails to recognize the empirical/inferential/provisional nature of taxonomic concepts. It's that it's using the wrong term "Taxon" to blur any possible lines between natural order and human representation, proactively.

This is what a good new TCS implementation promises to fix.

ghwhitbread commented 5 years ago

Thank you @nfranz . I must admit that I often find a tension within myself as to how I might argue for or against your point of view. But on this, I am completely in agreement.

mdoering commented 5 years ago

Nico, do I understand correctly that you would argue to exclusively use TaxonConcept in an updated TCS and avoid Taxon entirely? This is what TCS actually does; only DwC has settled on Taxon, probably because it's a more common term to most. TaxonName, TaxonConcept and TaxonRelationshipAssertion are the main classes defined by TCS.

Markus


jgerbracht commented 5 years ago

I agree with @deepreef that this has strayed somewhat from the original posted issue, though much of the discussions have been around trying to clarify or define taxon, taxon concept and taxon name usage. To that end, I think it would still be very beneficial, both for us and for future readers of these threads, to put down a definition for these that we agree with.

rdmpage commented 5 years ago

@jgerbracht Looking at the various definitions of taxonomic concept that have been voiced it's not clear to me that an agreement on definitions is going to happen anytime soon. Maybe an alternative to trying to define things is to adopt @baskaufs suggestion and tackle it from another direction:

Given our previous experience, I highly recommend starting with a functional definition (we want this "thing" to connect references to names), rather than starting off by getting hung up on a conceptual definition.

Alternatively, maybe a starting point could be taking an existing solution and asking why isn't that solution fit for purpose? For example, why are Roger Hyam's TaxonConcept, TaxonName and related classes not adequate to express what people are after? Perhaps that discussion might make clearer what the issues are.

mdoering commented 5 years ago

I believe extracting TCS into a plain markdown document for easier review and discussion was the way forward we agreed on in the phone conferences. XML or RDF as the source is a question I am not sure we answered yet.

jgerbracht commented 5 years ago

Sorry I missed the phone conference, I wasn't aware of it/them.

rdmpage commented 5 years ago

Following up a comment by @jgerbracht on use cases and trying to improve my understanding of what different people are after I've built a web page that compares two bird classifications (eBird checklists from 2017 and 2018). Done in a hurry so the parsing of the checklist files is a bit ropey, but when comparing the trees you can see in grey everything that is "the same" in the sense that we don't need to change anything to go from 2017 to 2018, things that have been deleted from 2017, added to 2018, or moved in order to get the 2018 tree. Note that these definitions are based on the minimum edits needed to change one tree into another, not any biological criteria. However, at a quick glance they mostly match the verbal description of the changes http://www.birds.cornell.edu/clementschecklist/august-2018/. Note that for the leaf nodes in the tree I've included the "species code" which eBird uses, so you can see how these are preserved for species even if the species moves genus. My understanding is that eBird uses these codes to link to, so that data is associated with a node in the tree, regardless of whether the name applied to that node changes (as it will if a species moves genus).

As an example, see afrpic1 which moves from Sasia to Verreauxia (screenshot below).
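A much-simplified sketch of the code-keyed comparison, assuming each checklist is reduced to a mapping from eBird species code to genus. Unlike the tree comparison above, it does not compute minimum edits over the whole tree; it only classifies leaf codes. Apart from afrpic1, Sasia and Verreauxia, the codes and genera are hypothetical placeholders.

```python
def compare_checklists(old: dict, new: dict) -> dict:
    """Compare two checklists given as {species code: genus}.

    A code present in both versions under a different genus counts as
    "moved"; codes are assumed to be stable across versions.
    """
    old_codes, new_codes = set(old), set(new)
    shared = old_codes & new_codes
    return {
        "unchanged": sorted(c for c in shared if old[c] == new[c]),
        "moved": sorted(c for c in shared if old[c] != new[c]),
        "deleted": sorted(old_codes - new_codes),
        "added": sorted(new_codes - old_codes),
    }


# afrpic1 and its two genera come from the example above; the other
# codes and genera are hypothetical.
CHECKLIST_2017 = {"afrpic1": "Sasia", "spcaa1": "Aus", "spcbb1": "Bus"}
CHECKLIST_2018 = {"afrpic1": "Verreauxia", "spcaa1": "Aus", "spccc1": "Cus"}
```

Because data hang off the code rather than the name, afrpic1 shows up as "moved" while anything linked to it stays attached.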

So I guess my questions are whether one of the goals of this discussion is to have a standard that can encode and transmit classifications like the 2017 and 2018 eBird checklists? Are we interested in how we represent the links between nodes in a tree and the taxonomic name to apply to that node? Are we interested in how to describe the changes between classifications? Are we interested in what an identifier attached to a node represents, and considering how those identifiers might change? Or, put another way, are we interested in how anyone wanting to link their information to a node in the tree can do that, and what happens when and if the tree changes?

[Screenshot (2018-09-19): tree comparison showing afrpic1 moving from Sasia to Verreauxia]

baskaufs commented 5 years ago

With respect to extracting TCS into Markdown, that's a great idea, although finding somebody to get around to actually doing it would be the problem. The user guide is in PDF format, so it's pretty accessible directly in a browser. The actual XML schema is visualized in graphical form as a PDF generated from XMLspy. I don't find it particularly easy to follow, but have been able to parse it out - particularly when comparing it with the TaxonConcept and TaxonName bits of the TDWG Ontology. As has been mentioned earlier, the TaxonConcept and TaxonName ontologies were built directly from the TCS XML schema and the terms in those ontologies have a tcom:tcsEquivalence property that states explicitly what part of the XML schema they correspond to. So it's pretty easy to compare the two.

What I consider to be the core of both the TCS XML schema and the TaxonConcept/TaxonName ontologies are diagrammed on the page I mentioned earlier in the thread. I built that diagram by looking directly at the ontologies and TCS XML schema. So I'm not sure there is an easier way to summarize them than the diagram on that page. If it would be helpful, I could generate a Turtle serialization of the ontology RDF/XML. Given how much people seem to hate RDF/XML, that might be an improvement.

baskaufs commented 5 years ago

Turtle serializations of the TaxonName and TaxonConcept ontologies are in this pull request

nfranz commented 5 years ago

@ghwhitbread -- thanks!

@mdoering -- yes, and, not being a native English speaker, I have tried to always use Taxonomic Concept in fact, to further drive home the point that we are representing the products of human taxonomy making in the TCS. So when a taxonomist (say, Jansen 2018) says: "I revised the weevil genus Minyomerus", the TCS would translate that as follows. First off, the TCS would stipulate - an almost silly point, but still - that the taxonomist has thereby not somehow causally altered the course of evolutionary history. So the TCS translates this into "genus-level or generic concept, with the taxonomic concept label Minyomerus sec. Jansen 2018". In other words, I think a good way to go about reviving the TCS is to recognize that out there in the world of taxonomic data generation, language use can be sloppy or inconsistent. But inside the TCS domain, it is helpful to be explicit and consistent that our bread and butter is representing and meaningfully connecting the succeeding versions of 'mimics' = taxonomic concepts, only.

@rdmpage -- re: definitions. I feel that the following is critical but so far only hovering in the background (use cases, linkages, identifiers...). The raison d'être of the TCS should be, very generally, to facilitate integrating biological information (1) as well as possible with taxonomic names and name relationships, and then (2) hopefully beyond that with taxonomic concept labels and relationships (whether RCC-5 or otherwise, your visualization being in the latter set). The second point is still quite radical for TDWG, and I think we would do well to appreciate that. To identify a taxonomic concept and then relate it meaningfully - 'beyond what we can do with homonymy/synonymy' - requires a provisional acceptance of the notion that often enough, taxonomists can express, represent, and indeed reconcile each other's theories of how nature is taxonomically structured with sufficient precision and reliability.

Basically, the whole approach presumes that taxonomists can understand each other frequently. Only if that is granted can we proceed to identify distinct taxonomic concepts = theories, and then assert and represent relationships among these theories. And getting these relationships is really the only pay-off here that could possibly justify the effort. So then, re: definitions -- I think a loose family of definitions for Taxonomic Concept, including those mentioned by several above and in the 2005 TCS, is close and good enough. All capture that human taxonomists author a conceptual space that is intended to have some boundaries. Critically, the definition(s) should encourage experts to seek and identify concept-to-concept relationships. That function, to me, supersedes what I think is a misplaced desire for precision. If we do not open the definition up to strongly push relationship authoring, we will just get stuck articulating different versions of Fear Of Making Identifiers. It's the linkages we're after. This is again somewhat of a departure for TDWG, in the sense that we are designing a standard that is fairly explicitly telling taxonomists to start connecting taxonomic theories more frequently. If the definition of Taxonomic Concept is too precise or too restrictive to allow a taxonomist to infer/identify a taxonomic theory from... somewhere, really... and link it to theirs, then our over-specification will keep concept-to-concept linkages from being "seen" and authored where the narrow definition said they could not, and will therefore interfere with our goal of better integrating biological information.
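For readers less familiar with RCC-5, the five relations can be illustrated by comparing two concepts' extensions, modelled here as sets of hypothetical member identifiers and written in a common shorthand. This is only meant to show what each relation means; in practice articulations are asserted by experts, often from intensional evidence, rather than computed from complete member lists.

```python
def rcc5(a: frozenset, b: frozenset) -> str:
    """Classify two non-empty concept extensions into one RCC-5 relation.

    Returns "==" (congruent), ">" (a properly includes b),
    "<" (a is properly included in b), "><" (overlap) or "!" (disjoint).
    """
    if not a or not b:
        raise ValueError("extensions must be non-empty")
    if a.isdisjoint(b):
        return "!"
    if a == b:
        return "=="
    if a > b:   # proper superset
        return ">"
    if a < b:   # proper subset
        return "<"
    return "><"
```

Exactly one relation holds for any pair, which is what makes RCC-5 a workable vocabulary for concept-to-concept assertions.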

jgerbracht commented 5 years ago

@nfranz I would argue that "facilitate integrating biological information .. with taxonomic concept labels and relationships" is where we should actually be spending our efforts within TDWG; isn't that the entire point of using a Taxon Concept? I understand that this will be difficult, and maybe impossible, in many taxonomic domains, so using taxonomic names and name relationships remains critical. On the other hand, using taxonomic concepts is certainly possible in a number of domains. As such, I understand and agree that concept-to-concept relationships are a key element, though it's also critical, and certainly possible to broadly use the same concept as opposed to ending up with a plethora of concepts linked via an 'equals' relationship. This sounds analogous to utilizing TNUs and TNU relationships. To give an example, there are well over 100 checklists and checklist versions covering birds published since the late 1990s. I would guess that over 70% of the species-level concepts within these 100+ publications covering 20 years are functionally the same (i.e. refer to the same Taxon Concepts). Ideally, I think we want to have 100+ TNUs for each of the 10,000 species and 1 Taxon Concept for each of the 70% species that is re-used within each publication.

nfranz commented 5 years ago

Thanks, @jgerbracht. You write: "it's also critical, and certainly possible to broadly use the same concept as opposed to ending up with a plethora of concepts linked via an 'equals' relationship." Yes. But here is my point, and I feel it is an important one for designing this space. If we expect the definition of Taxonomic Concept to do much of the (clearly necessary) 'work' of regulating good taxonomic concept/relationship authorship and citation practices, such that we actually end up with well-flagged and superior syntactic and semantic content for integrating biological information, then we expect something of that definition that it can never 'unambiguously' provide (just by itself). So let's not overburden that definition by expecting it to tell us exactly, in all conceivable and not-yet-conceived cases, "what are necessary and sufficient conditions for something to be a significantly new and different concept". That expectation towards defining taxonomic concepts may actually keep us from getting off the ground. Instead, I vote for a deliberately open definition; quite frankly the DwC one for Taxon just might do ("considered by taxonomists"). And then leave the work of regulating / promoting good authorship / citation practices to the taxonomic expert community broadly defined. That (good practice) is potentially more an issue of social/scientific/economic work than work of a strictly definitional nature. Some will jump on the opportunity to mostly or exclusively cite 'existing concepts'. Others will need to author 'new ones', but then could be motivated to also provide relationships. What will hopefully emerge is an increasingly widespread culture of what instances merit taxonomic concept authorship versus citation, with increasingly more weight on the latter, and incentive to not do the former without also providing relationships. Doing neither might become scientifically and/or socially unrewarding.

And that's the regulatory force, on the back end of the process. But all that is work that I feel lies outside of defining upfront "what a taxonomic concept is". Hence a maximally productive definition of Taxonomic Concept might, counter-intuitively, be quite vague regarding necessary and sufficient conditions.

deepreef commented 5 years ago

Thanks, @nfranz - this is helpful for me to understand what you are saying. To help ensure I understand you correctly, please confirm whether I have your position correct here:

A "taxon" only exists if taxa are "real"/"natural" entities in nature. Taxonomic Concepts are assertions by humans about circumscriptions of organisms (independently of whether they are asserting that such an entity exists in nature as a "taxon", or simply asserting that human-human communication is enhanced/facilitated by recognizing such a circumscription). Because "taxa" may or may not actually exist (depending on one's philosophical perspective), everything we talk about in informatics space is (essentially by definition) related to Taxonomic Concepts (human assertions about taxa, regardless of their philosophical perspective on the biological "reality" of those circumscriptions).

This is not how I have used and understood those two terms, but if it helps to clarify the discussion going forward regarding TCS and its descendant(s), then I'm certainly happy to adopt that distinction in this context.

Or (very possibly), perhaps I have misunderstood your distinction of these terms?

Also, as for "out there in the world of taxonomic data generation, language use can be sloppy or inconsistent" -- that is the tune I have been trying to sing for years! We are DEFINITELY in full agreement on that sentiment! And, as naturally follows, I am also in full support of your suggestion that "it is helpful to be explicit and consistent that our bread and butter is representing and meaningfully connecting the succeeding versions of ... taxonomic concepts, only." [eliminating the word "mimic", because I think that has more potential to confound than clarify].

Finally, I am in FULL agreement with the latter part of that same post, in terms of defining Taxonomic Concepts in relationship to TCS.

@jgerbracht: "I think we want to have 100+ TNUs for each of the 10,000 species and 1 Taxon Concept for each of the 70% species that is re-used within each publication."

I think that is a key question. What you propose certainly sounds reasonable (i.e., bundling up 100+ TNUs into a single "Taxonomic Concept" represented by a single identifier); but I see a fundamental problem that, even after years of thinking about it, I haven't yet seen a plausible solution to. This is the problem that I think @ghwhitbread alluded to (and @mdoering and others), in that the existing information -- covering the past 250+ years of history of literature and (more recently) databases -- is extremely anemic with respect to the sort of explicit Taxonomic Concept referencing that @nfranz is hoping our efforts will encourage future taxonomists to be more explicit about. In other words, the bundling of those 100+ TNUs under a single TC identifier is necessarily a subjective assertion that is prone to subsequent alteration/revision. When it is later discovered that not all of the TC circumscriptions implied by those 100 TNUs really are congruent (e.g., say 60 are one circumscription, and 40 are another), then how do you determine which set of TNUs keeps the link to the original TC identifier? Or is the TC identifier retired and two new ones minted? Or...?

My approach to this would be to link external data directly to one of those 100 TNUs -- ideally the one that most explicitly represents the original data (e.g., "When we identified this bird we were following the 2008 Edition of the Peterson Field Guide") -- then let the cloud of congruent TNUs (and their respective linked data) tie in (or not) via a layer of tcs:relationshipAssertion instances. The point is to explicitly link the data to the more granular (and explicit) TNU instances, then layer the "Taxonomic Concept" assertions on top of that.
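A minimal Python sketch of that two-layer idea. The TNU identifiers, record IDs, and the single "congruent" relation are hypothetical, and the `assertions` list only stands in for tcs:relationshipAssertion instances; it is not how TCS serializes them. External records stay pinned to one TNU each, and the bundling is recomputed from whatever assertions currently exist:

```python
# Layer 1: each external record points at the single most explicit TNU
# (hypothetical identifiers; a TNU = one Reference's treatment of one name).
records = [("obs-001", "tnu:1"), ("obs-002", "tnu:2"), ("obs-003", "tnu:3")]

# Layer 2: separate, revisable assertions about how TNUs relate
# (a stand-in for tcs:relationshipAssertion instances).
assertions = [("tnu:1", "congruent", "tnu:2")]


def congruent_with(tnu_id):
    """Return tnu_id plus every TNU reachable through 'congruent' assertions."""
    seen, stack = {tnu_id}, [tnu_id]
    while stack:
        current = stack.pop()
        for a, relation, b in assertions:
            if relation != "congruent":
                continue
            # Congruence is symmetric, so follow each assertion in both directions.
            for x, y in ((a, b), (b, a)):
                if x == current and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen


def records_for(tnu_id):
    """Records linked to tnu_id or to any TNU currently asserted congruent with it."""
    targets = congruent_with(tnu_id)
    return sorted(record for record, tnu in records if tnu in targets)


print(records_for("tnu:1"))  # ['obs-001', 'obs-002']
```

Revising or retracting an assertion changes the query results without touching the records themselves, which is the main benefit of keeping the concept layer separate.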

I'm not suggesting that my approach is better -- just trying to articulate why I see costs and benefits to minting identifiers for TCs that transcend individual TNUs, vs. linking data directly to TNUs and allowing a separate layer to subjectively (and dynamically) bundle those TNUs.

nielsklazenga commented 5 years ago

@jgerbracht: I note that you are not watching the repository, which is why you missed the meeting invitation (#4).

nfranz commented 5 years ago

Thanks, @deepreef . Mostly yes. I think we as information managers have a special role to focus on and design for long-term, deep-time biological data integration. And that should be reflected in our notion of Taxonomic Concept. Other sections of biology are more focused on utilizing instant snapshots of the taxonomic data landscape and behave as if these can be taken to represent knowledge of nature, i.e. Taxa (example of that kind of behavior: https://www.jstor.org/stable/3496386). So I'm saying, let's not appropriate Taxon from that kind of important use; let's give those sections of biology the benefit to make confident snapshot knowledge claims, and be the complement to our more integration-focused representations. Even though the lines are not clear-cut, we actually need both manners of speaking. And for the TCS, we need to signal clearly where we information managers stand. Notice that this can be said without getting into more ontological issues (in this sense https://plato.stanford.edu/entries/logic-ontology/#Ont). Allowing Taxon to be what knowledge-claiming biologists need it to be at a given time, should not mean that the TCS has to make ontological claims regarding Taxa.

deepreef commented 5 years ago

Great! Thanks, @nfranz! I COMPLETELY agree with what you say above about the special role and focus. My query was really a semantic issue -- trying to confirm your meanings of the words "Taxon" vs. "Taxonomic Concept". In that regard, I think we should all be consistent in using the term "Taxonomic Concept" to represent an asserted circumscription, and leave the unqualified term "Taxon" out of our conversations and documentation as much as possible (except when mapping to dwc:Taxon).

My only caution is that we can't assume that everyone in our audience will inherently understand the difference between "Taxa/Taxon" and "Taxonomic Concepts" the way you have distinguished them. I suspect that outside of this particular (TCS) context, many/most people (including me) regard them as loosely synonymous with each other. But I fully agree that within this TCS discussion context, we should adopt more precise (though sufficiently flexible) meanings to these terms and use them consistently. The same will apply to all sorts of "Name" related terms (e.g., the difference between a Taxonomic Name, Protonym, text-string/literal, etc.)

jgerbracht commented 5 years ago

@deepreef I agree completely, let's agree on a definition of Taxonomic Concept and try to leave Taxon out of our documentation. We have a hard enough time, I think, clearly communicating the difference between a Taxonomic Concept and a Taxon Name Usage. @nfranz I would, in principle, agree that keeping a definition of Taxonomic Concept as 'general' as possible is likely a good thing, unless it becomes so general that it remains a term used both for concepts and for name usages. I think that's one of the core reasons we're revisiting this, because the Taxonomic Concept in TCS wasn't clear enough and was open to broad interpretation.
I'll go back to an earlier post, can we pen and agree to some working definitions of these two? That will certainly help me think about these things.

Re the example I gave of 100+ TNUs mapped to a single TC id (i.e., modeling with TC ids vs. with TNUs and relationships): @deepreef brings up some real-life issues that we need to tackle, and those issues are EXTREMELY difficult to resolve and require someone intimately familiar with the taxonomies at hand. I THINK the underlying issue of changes is the same whether they occur in the TNU-to-TC mapping I proposed or in the TNU-to-TNU mappings; i.e., both approaches are "a subjective assertion that is prone to subsequent alteration/revision", and fixing the mapping or fixing the relationships is still the same problem. From this perspective, I don't see an advantage of one over the other, but I'm happy to be wrong.

deepreef commented 5 years ago

Thanks @jgerbracht -- I agree completely. The fundamental problem (and the reason we've never really solved this issue before) is that there are some extremely complex and subtle/nuanced relationships between organisms, names, and taxonomic relationships/classifications, and these complex issues have been further confounded by confused and inconsistent terms to describe some fundamental things.

As for Taxonomic Concepts and TNUs, I think the best way to characterize this goes back to Walter Berendsohn's notion of a "Potential Taxon" -- which in our terminology would be a "Potential Taxonomic Concept". A TNU represents the cloud of information and properties for how a particular Reference treated a particular Protonym (=Name-as-object). A reasonably well-defined subset of TNUs represent "Potential Taxonomic Concepts".

One of the key questions we need to figure out, with respect to the second paragraph of your post above, is whether it makes sense to collapse a set of TNUs representing confidently congruent Taxonomic Concept circumscriptions into a single "Taxonomic Concept Instance" with its own identifier and properties. I definitely think it's worth exploring, but it might make sense to first clearly define TNUs and the relationships among them, and then figure out what a secondary layer aggregating congruent TNUs into a single defined object instance would look like. In this sense, it's important that TNUs are defined in such a way that they can be easily aggregated in this fashion, if it ends up making sense to do so.
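One way to picture that secondary layer is as a computed grouping over TNUs, sketched here in Python with union-find. The TNU ids are hypothetical, and so is the rule that a group's identifier is its lexicographically smallest TNU id. That rule is just one naive answer to the earlier question of which set of TNUs "keeps" the identifier when a concept splits:

```python
def aggregate(tnu_ids, congruent_pairs):
    """Group TNUs into candidate concept instances via union-find.

    Hypothetical identifier rule: each group is keyed by its smallest TNU id.
    """
    parent = {t: t for t in tnu_ids}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving keeps trees shallow
            t = parent[t]
        return t

    for a, b in congruent_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root under the smaller, so every root
            # is the smallest id in its group.
            parent[max(ra, rb)] = min(ra, rb)

    groups = {}
    for t in tnu_ids:
        groups.setdefault(find(t), set()).add(t)
    return groups


tnus = ["t1", "t2", "t3", "t4"]
print(aggregate(tnus, [("t1", "t2"), ("t3", "t4"), ("t2", "t3")]))
# Retracting the ("t2", "t3") assertion splits the concept in two:
print(aggregate(tnus, [("t1", "t2"), ("t3", "t4")]))
```

Because the groups are recomputed from assertions, a split produces a new grouping rather than requiring edits to the TNUs. Whether the "t1" group should still count as the same concept instance is a policy decision, not something the code settles.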

rdmpage commented 5 years ago

Reading through these threads I keep trying to figure out what problems we are trying to solve? I confess that I struggle with abstractions that don’t readily translate into something that I could imagine using and/or building. I also find it helpful to have actual examples to focus on.

Looking at eBird as an example (and @jgerbracht can correct me if I’ve misunderstood) there seem to be several problems to tackle:

  1. How do we represent a given classification (e.g., bird classification for August 2017)?
  2. How do we enable users of a taxonomy to refer to a particular taxonomy (e.g. August 2017)?
  3. If users refer to a taxon without reference to a particular taxonomy, how do we resolve that reference?
  4. How do we compute and represent the changes between the August 2017 and August 2018 taxonomies?

It seems to me that 1 is straightforward, we simply define a way to represent a tree. Many biodiversity informatics projects use trees (classifications) to help users navigate through data. Note that the tree could be explicitly defined (e.g., as a tree structure in a file) or implicitly (say, as a checklist in a paper).

2 is also straightforward if we have identifiers for classifications, and optionally some way of locating a node in a tree, again either explicitly in a tree structure or on a page in a published checklist. (I could see an obvious role for GBIF here, in that you could publish a checklist on GBIF and use the resulting DOI to identify that taxonomy.) So I think what would be useful here is a convention for explicitly citing a given taxonomy (formalising “sec”). There is scope for exploring the best way to identify nodes in a tree (e.g., do we simply cite a node name and tree version, or do we have identifiers, like eBird's, that remain unchanged between trees if the node is the “same”?).
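A minimal Python sketch of points 1 and 2, assuming a classification is stored as a child-to-parent map and a citation is simply "node name + classification version" (the first of the two options above). The identifier "checklist-2017" and the taxon names are hypothetical placeholders, not real checklist data:

```python
from dataclasses import dataclass


@dataclass
class Classification:
    """One frozen version of a classification, stored as a child -> parent map."""
    classification_id: str  # hypothetical; e.g. a DOI minted for a published checklist
    parent: dict            # node name -> parent node name (the root maps to None)

    def path(self, node):
        """Return the path from the root down to `node`."""
        chain = []
        while node is not None:
            chain.append(node)
            node = self.parent[node]
        return list(reversed(chain))


@dataclass(frozen=True)
class NodeCitation:
    """A citation of one node 'sec.' one classification version."""
    classification_id: str
    node: str

    def resolve(self, registry):
        """Look up the cited version and return the node's path within it."""
        return registry[self.classification_id].path(self.node)


registry = {
    "checklist-2017": Classification(
        "checklist-2017",
        {"Ausidae": None, "Aus": "Ausidae", "Aus bus": "Aus"},
    ),
}
print(NodeCitation("checklist-2017", "Aus bus").resolve(registry))
# ['Ausidae', 'Aus', 'Aus bus']
```

With stable node identifiers (the second option), `node` would hold an identifier instead of a label, and the same citation could be resolved against later versions.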

3 is either trivial or difficult, depending on how you approach it. Given that the vast majority of references to taxa will be by name, we either accept the ambiguity and treat this as effectively a search (find me every taxonomy with that name), or endeavour to work out which particular classifications a publication at a certain date may apply to (e.g., what versions of bird taxonomy were in use at that time?).

4 is perhaps the most interesting topic, and we have seen at least two ways to think about this: either do pairwise mappings between nodes in the two trees, or compute edit operations between the two trees.

Given that we are having the discussion on GitHub it may come as no surprise that I view 4 as essentially versioning. If the 2017 tree was in GitHub, we could imagine editing it as each new paper on avian taxonomy comes out, then freezing the tree and releasing a new version in 2018. The “diff” between tree 2017 and tree 2018 defines the differences between the two trees.
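A sketch of that "diff", assuming both versions are child-to-parent maps keyed by node label. The delete/insert/move vocabulary is a hypothetical simplification and does not come from any standard. Because nodes are matched by label only, a renamed node appears as a delete plus an insert, which is one reason stable node identifiers matter:

```python
def diff_trees(old, new):
    """Edit operations turning one child -> parent map into another.

    Nodes are identified by label only, so a renamed node shows up as a
    delete plus an insert rather than as a single 'rename' operation.
    """
    ops = []
    for node in sorted(old.keys() - new.keys()):
        ops.append(("delete", node))
    for node in sorted(new.keys() - old.keys()):
        ops.append(("insert", node, new[node]))
    for node in sorted(old.keys() & new.keys()):
        if old[node] != new[node]:
            ops.append(("move", node, new[node]))
    return ops


# Toy versions: species "Aus cus" is transferred to a new genus "Cus".
tree_2017 = {"Ausidae": None, "Aus": "Ausidae", "Aus bus": "Aus", "Aus cus": "Aus"}
tree_2018 = {"Ausidae": None, "Aus": "Ausidae", "Aus bus": "Aus",
             "Cus": "Ausidae", "Cus cus": "Cus"}
for op in diff_trees(tree_2017, tree_2018):
    print(op)
```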

So, I see three “products” that would be useful:

  1. a standard for describing a classification
  2. a standard for citing a classification and/or a location in a classification (I’m using “classification” so we can include trees, networks, and publications)
  3. standards for describing relationships between trees (e.g., mappings and edit operations)

For me a really interesting test case would be to take, say, the August 2017 eBird classification, take all the taxonomic work between 2017 and 2018 (listed on the eBird site), and represent those works in terms of 2 and 4 above, that is, they reference the 2017 classification, and they describe the changes made (e.g., subspecies x is now a full species in a different genus if you think in terms of edit operations, or the equivalent set relationships if you think in terms of mapping). Then see if we can compute the August 2018 tree using just that information. This would mean we could have a way to describe taxonomic information that was computable and could be used to generate new classifications.
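The core of that test case could be sketched as replaying each paper's edit operations against the classification it cites. The tree, the operations, and the delete/insert/move vocabulary below are hypothetical toy data. In practice the per-paper operation lists would be applied in sequence, each paper's output feeding the next:

```python
def apply_edits(tree, ops):
    """Apply one paper's edit operations to a child -> parent classification map.

    This sketch checks that inserted/moved nodes have a known parent, but it
    does not check whether a deleted node leaves children orphaned.
    """
    result = dict(tree)  # never mutate the cited, frozen version
    for op in ops:
        kind, node = op[0], op[1]
        if kind == "delete":
            result.pop(node)
        elif kind in ("insert", "move"):
            parent = op[2]
            if parent is not None and parent not in result:
                raise ValueError(f"{kind} {node!r}: unknown parent {parent!r}")
            result[node] = parent
        else:
            raise ValueError(f"unknown operation {kind!r}")
    return result


tree_2017 = {"Ausidae": None, "Aus": "Ausidae", "Aus bus": "Aus", "Aus bus cus": "Aus bus"}
# Hypothetical paper citing tree_2017: subspecies "Aus bus cus" is raised
# to the full species "Dus cus" in a new genus.
paper_ops = [("insert", "Dus", "Ausidae"),
             ("insert", "Dus cus", "Dus"),
             ("delete", "Aus bus cus")]
tree_2018 = apply_edits(tree_2017, paper_ops)
print(sorted(tree_2018.items(), key=lambda kv: kv[0]))
```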

If taxonomic information was described in that way then it would seem that the goals of aggregators and taxonomists could be aligned: the aggregator’s task is easier because the data is well described in nice, computable, citable chunks, which means the taxonomist’s work gets quickly incorporated into the aggregation in a way that gives them credit and visibility.

nfranz commented 5 years ago

Going to point to this as an example of doing 4: https://doi.org/10.1093/sysbio/syw023.

rdmpage commented 5 years ago

@nfranz Thanks! Maybe we should assemble a set of relevant examples, such as the primate study you linked to, the eBird classifications, etc., and use those as test cases? For example, given the two MSW primate classifications an obvious question is how we can represent MSW2, MSW3, and the relationships between them using a simple vocabulary. Related to that goal, can we then link names and literature to those, so we could imagine giving someone a set of files and saying "here is the history of primate classification linked to all the relevant publications, enjoy!".

baskaufs commented 5 years ago

+1 for assembling use cases

nielsklazenga commented 5 years ago

+2

@deepreef It's probably best not to do this in the issues at all. I have created a folder 'use-cases'. Put them in there in any form you like. We can make them consistent (filetype- and design-wise) later.

deepreef commented 5 years ago

+3 :-)

nfranz commented 5 years ago

I can provide links to these, if or as needed. tcs_use_cases

baskaufs commented 5 years ago

This is a response to @frmichel's comments on the pull request. @frmichel noted problems with the Darwin Core dwciri: terms and with Darwin-SW. Just to clarify about those two things: the DwC RDF Guide (which minted the dwciri: terms) recognized that there were problems with the taxon/taxon concept/TNU in Darwin Core, but did not consider "fixing" them to be within its scope. It simply provided guidance on how to use the existing DwC terms (or their dwciri: analogs) but did not generally suggest how to clarify their meaning or add any new terms that were missing. It assumed that some future group (like this one) would fix that problem.

Darwin-SW was not a TDWG effort, so it has no official standing in TDWG. It suggested a fix for the missing object properties needed to connect the Darwin Core classes, but also basically dodged the issue of clarifying taxon/taxon concept/TNU.

So really, neither of those two efforts should be looked at as a solution. As far as updating the TDWG Ontologies (TaxonConcept and TaxonName) is concerned, I think it would probably be better to just focus our efforts on incorporating the good parts of them into what we build here. Although those two ontologies don't have any official standing within TDWG either, they do reflect one attempt to translate an actual TDWG Standard (TCS 1.0) into the Linked Data/Semantic Web world, and should therefore have some weight in the discussion - particularly since some members of this group have experience trying to implement them. That's really useful information.

jgerbracht commented 5 years ago

@rdmpage Re:

  1. How do we represent a given classification (e.g., bird classification for August 2017)?
  2. How do we enable users of a taxonomy to refer to a particular taxonomy (e.g. August 2017)?
  3. If users refer to a taxon without reference to a particular taxonomy, how do we resolve that reference?
  4. How do we compute and represent the changes between the August 2017 and August 2018 taxonomies?

I would add a fifth: how do we track a particular taxonomic concept through time/taxonomies?
This cannot be done solely by computing changes between the two taxonomies. That approach would accurately cover a number of taxonomic changes between versions, and it is critical to have in our tool set. However, there are also a variety of taxonomic changes that cannot be computed and must be done by the taxonomy experts.
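A sketch of what "tracking a concept" could look like once the expert knowledge is captured. It assumes hypothetical concept ids and taxonomist-asserted mappings between consecutive versions. The code only composes those mappings; supplying them is exactly the expert step described above:

```python
def track(concept, versions, mappings):
    """Follow a concept forward through successive taxonomy versions.

    `mappings[(old, new)]` maps each concept id in `old` to a list of
    (relation, concept id in `new`) pairs asserted by a taxonomist.
    The relation labels are carried along but not interpreted here.
    """
    frontier = {concept}
    history = [(versions[0], sorted(frontier))]
    for old, new in zip(versions, versions[1:]):
        step = mappings[(old, new)]
        frontier = {target for c in frontier for _relation, target in step.get(c, [])}
        history.append((new, sorted(frontier)))
    return history


# Hypothetical expert-asserted mappings: concept c2 is split in 2018.
mappings = {
    ("2017", "2018"): {"c1": [("congruent", "c1")],
                       "c2": [("includes", "c5"), ("includes", "c6")]},
    ("2018", "2019"): {"c1": [("congruent", "c1")],
                       "c5": [("congruent", "c7")],
                       "c6": [("congruent", "c6")]},
}
for version, concepts in track("c2", ["2017", "2018", "2019"], mappings):
    print(version, concepts)
```

A concept with no outgoing mapping simply drops out (an empty list), which is how a removed concept would appear in this sketch.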

rdmpage commented 5 years ago

@jgerbracht

In a sense it seems to be solved for eBird by the use of stable identifiers between classifications (e.g., radshe1), although it's not clear what rules are used to carry those identifiers across trees. But yes, the success of comparing trees to compute changes does depend on how well labelled the trees are.

However, there are also a variety of taxonomic changes that cannot be computed and must be done by the taxonomy experts.

Can you give an example? I'm not sure that there are things which can't be computed, I suspect it's more a question of whether the changes made (and/or the reasons) are represented with enough precision to be easily converted into something a computer can handle. Taxonomy is a pretty simple affair in many ways, we have sets, we have notions of relationships among those sets, and we have collections of labels to be assigned to those sets. I think it's eminently computable.
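To make "sets and relationships among those sets" concrete, here is a sketch of the five RCC-5 relations computed from two circumscriptions. It rests on a large assumption: each circumscription is available as an explicit finite set of members (e.g., specimen or subordinate-taxon identifiers, all hypothetical here), which legacy taxonomies rarely provide:

```python
def rcc5(a, b):
    """Return the RCC-5 relation between two circumscriptions given as sets."""
    a, b = set(a), set(b)
    if a == b:
        return "congruent"       # a == b
    if a > b:
        return "includes"        # a is a proper superset of b
    if a < b:
        return "is_included_in"  # a is a proper subset of b
    if a & b:
        return "overlaps"        # shared members, but neither contains the other
    return "disjoint"            # no shared members


print(rcc5({"s1", "s2", "s3"}, {"s1", "s2"}))  # includes
```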

jgerbracht commented 5 years ago

The tracking of taxonomic changes I'm referring to is the tracking of concepts. In cases where concepts are added or removed, the taxonomist is the one who knows the path from taxonomy 2017 to 2018, and retroactively calculating those paths using only the starting and ending taxonomies is currently problematic at best. I agree completely with your statement that it's "more a question of whether the changes made (and/or the reasons) are represented with enough precision to be easily converted into something a computer can handle", and this is something where we can and should strive to help the taxonomic communities (though that's certainly a very different but interesting topic for another day). I was referring to the status of taxonomies today, which do not provide those necessary details (Clements' comes close).

nfranz commented 5 years ago

Hi @jgerbracht. Yes, this is why - as I suggested here https://github.com/tdwg/tnc/issues/1#issuecomment-419473701 - it will be hard to come to an agreement about the scope of TCS2 without resolving at least these two issues upfront:

  1. To what extent does the TCS2 effort not only aspire to model mainstream systematic practice - "what most systematists tended to do or tend to do now" - but also channel or even syntactically enforce an evolution in systematic practice. In other words, does TCS2 have mostly just representative, or maybe also normative (rule setting) aspirations towards systematics? Even more bluntly, is TDWG prepared to get into systematists' grill? I believe we can live with a "yes" or "no" better than with a "maybe, we'll see".

  2. Even if the answer is "yes", what exactly is the role of a standard in this context? My own sense is that expectations towards a standard are too high among some of us. I believe the right level of expectation is that a standard should be designed to facilitate a fairly wide range of practices, ranging from ideal to very real. The challenge here for DwC is that it cannot actually represent something as Taxonomic Concept/Relationship heavy as this: https://doi.org/10.3233/SW-160220. DwC fails to provide the minimally needed syntactic structure for this kind of multi-taxonomy alignment work, and offering such a structure - where the data reflect it - is one function of TCS2. But it would be asking too much of a standard to go beyond allowing the ideal and also enforce the ideal at all times. I've suggested previously that I believe that is mostly the role of specific implementations and communities.

In summary, I think the way to resolve discussions about scope is to first agree on any normative aspirations of TCS2, i.e., whether we are putting this out partly also to help make future systematics practice better, somewhat regardless of the field's legacy. We have sufficient use cases to indicate that "better" is feasible. But we must acknowledge that it remains rare today. [Having done many hundreds of RCC-5 alignments myself, I believe that this is more limited by current incentive structures than by the nature of the data. But that is not so relevant for us now.] Then we need to decide how much of that "making it better" must be allowed by TCS2, versus how much of that must be enforced by it (as opposed to being enforced by TCS2-utilizing implementations).

rdmpage commented 5 years ago

@nfranz It's not clear to me who TCS2 is for, or at least, there seem to be multiple possible audiences, and I'm not sure taxonomists are likely to be either the biggest or the most important.

Indeed, playing devil's advocate, I'm not entirely convinced there is even a need for TCS2, given that taxonomists, biodiversity informatics projects, and genomics databases (e.g., NCBI) seem pretty happy to pump out taxonomies and lists of names without any vocabularies at all! In other words, it's not clear that people are banging on TDWG's door saying "we can't do our science without TCS2". One can certainly make a case that things could be done better if we had a better way of representing taxonomic information, but what we have at the moment seems to work OK for most purposes.

So I wonder if it would be helpful to have some notion of who the users are, both of TCS2, and of products that use TCS2. At the moment much of the focus seems to be on database builders who:

Now, there is certainly a case that working taxonomists could make their work more accessible to machines by marking up their work, and providing easy means to do that would be a great TCS2 use case, although the vast majority of taxonomic work is not published in journals that support any kind of markup. Likewise, being able to provide TCS2-enabled things that taxonomists would find useful would be great (e.g., for any taxon give a summary of the current and past taxonomies, a complete bibliography linked to digital versions where possible, and a list of relevant specimens, especially types; essentially a "project in a box").

So I think in part any expectation of what a standard can achieve depends on who you think it is for. I don't think taxonomists care at all about 99.9% of what TDWG does; they will care about anything which makes their life easier, and which helps increase the visibility of their work. I think the people who care about TCS2 will mostly be limited to those dealing with large chunks of data, either publishing it, aggregating it, or both.

mdoering commented 5 years ago

thanks @rdmpage, fully agree. And I can give you at least a very concrete request from the CoL+ project which seeks a new standard to share nomenclatural and taxonomic data in CSV files. DwC-A has various issues, TCS XML is actually quite alright but hard to work with, the TDWG ontology is even harder yet.

I would love to see something compatible with datapackages which could replace your custom dwc archives and free us from the "star" restriction

nfranz commented 5 years ago

Thanks, @rdmpage. When Jessie Kennedy led the TCS1 effort, the scope of users was inclusive; see: http://seek.ecoinformatics.org/attach%3Fpage=ScienceTaxon_12_May_2004%252FWhy_do_we_need_a_taxonomic_concept_transfer.ppt (particularly slides 6-8).

And the primary underlying motivation for TCS1 was the systemic inability of name-based systems to be taxonomically precise enough: https://www.napier.ac.uk/~/media/worktribe/output-255552/scientific-names-are-ambiguous-as-identifiers-for-biological-taxa-their-context-and.pdf

Also echoed here: https://www.researchgate.net/publication/6886479_A_Standard_Data_Model_Representation_for_Taxonomic_Information

I vote for preserving that still very much valuable problem diagnosis legacy of the 2005 TDWG-ratified TCS1. The primary purpose was and still is to do name/relationship management as well as possible, and do better where possible with TCS2-facilitated syntax.

In that context, I think the right long-term strategy is to be more engaging towards the systematic expert community. Jessie Kennedy's history with TDWG and TCS1 possibly began with this, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.50.2436&rep=rep1&type=pdf, which is in line with the trajectory of supporting expert systematic workflows.

TCS2 can be viewed as an opportunity to bring TDWG and the systematic research community closer together.

deepreef commented 5 years ago

@nfranz

Even more bluntly, is TDWG prepared to get into systematists' grill? I believe we can live with a "yes" or "no" better than with a "maybe, we'll see".

I vote a resounding "no", in the context of TCS2.

whether we are putting this out partly also to help make future systematics practice better, somewhat regardless of the field's legacy.

It's not too hard to make the standard accommodate the future/ideal (when the information is available), without drowning out an effective basic mechanism for capturing what we can capture from less-than-ideal legacy sources. The enforced components should be kept to a minimum.

@rdmpage

but what we have at the moment seems to work OK for most purposes.

Hmm... what we have at the moment doesn't allow anyone to filter GBIF data on all taxa identified as X or identified as something regarded by authority Y as being a synonym of taxon X (to take an extremely over-simplified example use case that I think "most" users would like to be able to do). I think the role of TCS2 should be to allow us to capture taxonomic metadata associated with biological datasets in a way that enables automated data-enrichment through various online services (e.g., CoL+). In short, what we should be aiming for is allowing a non-taxonomist user-base to get the answers they want/need without having taxonomic expertise themselves. The status quo definitely does NOT allow this (it only makes people think they have it because they are only using text strings to represent scientific names, and are blissfully unaware of how anemic the result sets are because of it).
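The over-simplified use case can be sketched as a synonym-expanded filter. The authorities, names, and records below are hypothetical, and the sketch deliberately still matches on text strings, which is precisely the weakness described above; a TCS2-style version would link each record to a TNU or Protonym rather than a bare string:

```python
# Hypothetical synonymy assertions: (authority, synonym name, accepted name).
synonymy = [
    ("Authority Y", "Aus bus", "Aus cus"),
    ("Authority Y", "Dus cus", "Aus cus"),
    ("Authority Z", "Aus bus", "Eus bus"),
]
# Hypothetical occurrence records, each carrying only an identification string.
records = [
    {"id": "r1", "identified_as": "Aus cus"},
    {"id": "r2", "identified_as": "Aus bus"},
    {"id": "r3", "identified_as": "Dus cus"},
    {"id": "r4", "identified_as": "Eus bus"},
]


def filter_records(records, synonymy, taxon, authority):
    """Records identified as `taxon` or as any name `authority` treats as its synonym."""
    names = {taxon} | {syn for auth, syn, accepted in synonymy
                       if auth == authority and accepted == taxon}
    return [r["id"] for r in records if r["identified_as"] in names]


print(filter_records(records, synonymy, "Aus cus", "Authority Y"))  # ['r1', 'r2', 'r3']
```

The same record ("r2") lands in different result sets depending on which authority is followed, which is why the authority has to be an explicit parameter.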

jgerbracht commented 5 years ago

@deepreef Well said "I think the role of TCS2 should be to allow us to capture taxonomic metadata associated with biological datasets in a way that enables automated data-enrichment through various online services (e.g., CoL+). In short, what we should be aiming for is allowing a non-taxonomist user-base to get the answers they want/need without having taxonomic expertise themselves. The status quo definitely does NOT allow this (it only makes people think they have it because they are only using text strings to represent scientific names, and are blissfully unaware of how anemic the results sets are because of it)."

@nfranz "channel or even syntactically enforce an evolution in systematic practice" I agree that TCS2 should force systematic practice, though if TCS2 is done well, the standards and eventually, the tools will be available to enable the evolution of systematic practice in regards to how one thinks about and manages taxon concepts.

nfranz commented 5 years ago

@deepreef writes: "In short, what we should be aiming for is allowing a non-taxonomist user-base to get the answers they want/need without having taxonomic expertise themselves."

But, is that not very often the happy secondary effect or by-product of this more primary cause? Expert systematists have been enabled (TCS2 design), empowered (decentralization => implementation design), and incentivized (accreditation => implementation design) to transfer our knowledge via TCS2 syntax into aggregating environments. In a world where most scientists operate within a merit-based framework, how can a non-expert user base benefit lastingly if the expert contributor base does not benefit first or foremost?

deepreef commented 5 years ago

Did he die because his brain went hypoxic? Or because his lungs were full of water (causing his brain to go hypoxic)? Or because he went unconscious underwater (causing his lungs to fill with water)? Or because he had a seizure (causing him to go unconscious)? Or because he was breathing too much oxygen under pressure (causing him to have a seizure)? Or because his rebreather provided too much oxygen (causing him to breathe too much oxygen)? Or because he set up the rebreather incorrectly (causing it to provide too much oxygen)? Or because it was a bad rebreather design (making it too easy for him to set it up incorrectly)? Why did he die?

Sorry for that weird/morbid analogy, but it sounds like we're making the same point at slightly different levels. My statement about what we should be aiming for isn't the "secondary effect" (happy or otherwise); it's what I see as the terminal goal (within the scope of TCS2). There are many things that need to happen in order to achieve that terminal goal. Certainly among them are steps that enable, empower, and incentivize scientists to play their role in extracting and synthesizing information from raw data (occurrence records, literature information, etc.) and transforming it in a way (TCS2) that serves a function to non-scientists (or scientists lacking specific expertise). The point has been made many times over many years that if all we achieve with TCS is the goal of allowing taxonomists easier access to data to help them achieve their taxonomic goals, then we have failed. We certainly do need to do that, but in a way that facilitates something useful to a much broader audience.

vsenderov commented 5 years ago

I realize the issue has been closed but I would like to nevertheless answer the questions @deepreef raised on Sep. 8. I apologize for the late reply but other commitments prevented me from writing a detailed response. I am copying Lyubo's new PhD student Maria (mdimitrova095 at gmail.com) as well, as she is slowly transitioning to maintaining the pioneering biodiversity knowledge graph OpenBiodiv.

Many thanks for re-linking this publication, @nfranz! I thought I had clicked on your original link, but evidently not as this is the first I'm seeing the full publication. Although I do have some minor philosophical quibbles (e.g., I still fail to understand how a taxon concept can justifiably be called a "hypothesis", rather than an asserted opinion -- I don't agree with the arguments put forth about falsifiability), once I got past those I found the article to be very useful in framing the problem we're up against with this discussion. It's definitely worth carefully reading by anyone interested in this sort of stuff.

If a taxonomic concept is an unfalsifiable opinion, it must logically follow that taxonomic circumscription does not follow the scientific process. If you want the taxonomic process to describe the real world in a Popperian fashion, then it is necessary that the opinion can be checked against some form of experiment. In the case of taxonomic concepts, a single taxonomic concept can be checked as to whether or not it follows some species concept.

I do have a couple of technical questions that are most likely due to my ignorance of OpenData (SPAR Ontologies, etc.), but I'm going to take a risk and ask them anyway. Perhaps you can help clarify these.

Please feel free to get back to me by email or Skype whenever you wish; I am more than willing to discuss this should this explanation fall short.

The article states that "Taxonomic Article is a subclass of FaBiO’s Journal Article". However, several other subclasses of FaBiO's Expression class (e.g., books, chapters, etc.) also contain taxonomic treatments. Is this a problem for implementation, or are we only interested in treatments that appear in articles, or...?

Neither. While Taxonomic Article is a subclass of Journal Article, a Treatment is a subclass of Discourse Element. From the guide:

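```turtle
# Namespace declarations added for completeness; the default (:) namespace
# is a placeholder, not the actual OpenBiodiv-O namespace.
@prefix :     <http://example.org/openbiodiv-o#> .
@prefix deo:  <http://purl.org/spar/deo/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Treatment a owl:Class ;
  rdfs:subClassOf deo:DiscourseElement ,
                  [ rdf:type owl:Restriction ;
                    owl:onProperty :isContainedBy ;
                    owl:someValuesFrom :TaxonomicArticle ] ;
  rdfs:label "Taxonomic Treatment"@en ;
  rdfs:comment "A rhetorical element of a taxonomic publication, where taxon circumscription takes place."@en ;
  # bg: "A taxonomic treatment, or simply a Treatment, is a rhetorical part of the taxonomic article where the description of a given taxonomic concept takes place."
  rdfs:comment "Таксономично пояснение или само Пояснение е риторична част от таксономичната статия, където се случва описанието на дадена таксономична концепция."@bg .
```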

The above code is OWL, written in Turtle syntax. Without going into too much detail, it is the standard way Peroni and Shotton model discourse elements such as special sections in an article (e.g., Introduction, Methods, Discussion, etc.).

The article states "In OpenBiodiv-O, a taxonomic name usage is the mentioning of a taxonomic name in the text, optionally followed by a taxonomic status." If a name is mentioned several times within a single treatment, does that represent more than one TNU sensu OpenBiodiv-O?

Yes. Each such text area is a separate TNU with a unique identifier. This is modelled after the Mention class of the Extensions module of the base ontology, PROTON.

Or are they collectively contained within a single TNU (e.g., represented by the NomenclatureHeading)?

No.

it seems that the TNU is the raw text string, not the Treatment as a whole, in which case the definition of TNU as asserted in the context of OpenBiodiv-O is a significant departure from how it has been defined elsewhere.

Possibly. However, in the broader Natural Language Processing (NLP) community, this is how "mentions" of particular entities are modeled. E.g., if I have a text about Germany, it will contain a) the concept of Germany (with a URI, say http://dbpedia.org/page/Germany); and b) text areas that mention Germany. Note that the strings of these text areas might be slightly different due to grammatical and semantic considerations. The NLP task is to map these mentions to dbpedia:Germany. In our case we link particular text areas to URIs of taxonomic names. Note that, as names are different from concepts, there is yet another mapping, from a name to a URI. Thus, if I were to adopt yet another layer of indirection for TNUs, I would risk making the model too complicated. Therefore, I have strived for the most parsimonious model and defined Mention as it is defined in the NLP world. Here is the definition of the superclass from PROTON: "An area of a document that can be considered a mention of something."
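A sketch of the mention-as-TNU model in Python: every located text area becomes its own record with a unique id, linked to a name URI. The exact-string matching stands in for real NLP entity recognition, and the `urn:example:` URI and the sample text are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One area of a document that mentions a taxonomic name (one TNU in this model)."""
    mention_id: str
    document: str
    start: int     # character offset where the mention begins
    end: int       # character offset just past the mention
    name_uri: str  # URI of the taxonomic name the mention is linked to


def find_mentions(document_id, text, surface_forms):
    """Naively locate every occurrence of each surface form and link it to a name URI."""
    mentions = []
    for surface, name_uri in surface_forms.items():
        start = text.find(surface)
        while start != -1:
            end = start + len(surface)
            mentions.append(Mention(f"{document_id}#{start}-{end}", document_id,
                                    start, end, name_uri))
            start = text.find(surface, end)
    return sorted(mentions, key=lambda m: m.start)


text = "Aus bus sp. n. Aus bus differs from A. bus."
# Different surface strings can map to the same name URI.
forms = {"Aus bus": "urn:example:name:aus-bus", "A. bus": "urn:example:name:aus-bus"}
for m in find_mentions("doc1", text, forms):
    print(m.mention_id, text[m.start:m.end], m.name_uri)
```

Three mentions of one name yield three TNUs here, matching the "Yes" answer above.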

An important aspect of TNUs is that there is generally a 1:1 correspondence between a Treatment and the TNU representing the NomenclatureHeading for the Treatment.

In a system where there is a bijective mapping between Treatment and TNU, one of these two classes is redundant. This is not the case in OpenBiodiv-O, as it tries to provide only one way to express any given statement.

However, as implied by Figure 1 of the article, a treatment often contains other TNUs (e.g. within the NomenclatureCitationList). Thus, while every Treatment has exactly one corresponding TNU, not all TNUs are treatments.

True. Treatments are specialized discourse elements. Treatments are expressions of the more abstract class, concept. Think of it this way: a treatment is the "writing down" of the idea that the concept represents. To fully appreciate this, please refer to page 6 of the FRBR model.
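The cardinalities in this exchange can be sketched as follows, assuming a treatment is reduced to one heading TNU plus a list of further TNUs (for example, from its nomenclature citation list). The `Treatment` class and the identifiers are hypothetical; the point is only that heading TNUs map one-to-one onto treatments while the full set of TNUs is larger.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a treatment has exactly one TNU in its nomenclature
# heading and zero or more further TNUs (e.g. in its citation list).

@dataclass
class Treatment:
    treatment_id: str
    heading_tnu: str                                # exactly one per treatment
    other_tnus: list = field(default_factory=list)  # e.g. citation-list TNUs

    def all_tnus(self) -> list:
        return [self.heading_tnu] + self.other_tnus

treatments = [
    Treatment("treatment-1", "tnu-1", ["tnu-2", "tnu-3"]),
    Treatment("treatment-2", "tnu-4"),
]

heading_tnus = {t.heading_tnu for t in treatments}
all_tnus = {tnu for t in treatments for tnu in t.all_tnus()}

# Heading TNUs correspond one-to-one with treatments ...
assert len(heading_tnus) == len(treatments)
# ... but not every TNU is a heading TNU.
print(sorted(all_tnus - heading_tnus))
```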

I very-much like the way that "TaxonomicConceptLabel" (TCL) is defined.

Thanks. This is @taxonbytes' idea.

However, I'm not entirely sure I understand the need for establishing OperationalTaxonomicUnit as a superclass of TaxonomicConcept. In my mind, Taxonomic Concepts represent a circumscription of organisms, regardless of whether that circumscription happens to include a specimen (or more than one specimen, when heterotypic synonymy is involved) designated as a name-bearing type for a Linnean-style taxonomic name (i.e., regardless of whether the concept has a formal scientific name to label it with). Can you provide examples of instances of OperationalTaxonomicUnit that would not be regarded as instances of TaxonomicConcept? I.e., what other subclasses of OperationalTaxonomicUnit are there, and what function do they serve?

This is a point of modeling, and different approaches are possible without sacrificing expressivity. My idea, however, was to make taxonomic concepts the biodiversity-grouping concepts that are formed by taxonomists and that can be identified with taxonomic concept labels (Aus bus sec. X). Clearly, one may form a biodiversity-grouping concept in a non-traditional way: a BOLD BIN would be an example of that. Such a "taxonomic concept" will not have, at least initially, a taxonomic concept label. However, the BOLD BIN is clearly a falsifiable hypothesis about a unit of biodiversity. As a different example, let me bring up my current work on an entirely new system of grouping organisms on the basis of integrative information and Deep Neural Networks. The biodiversity operational units that BOLD or my system form will be biodiversity-grouping concepts as well. In order to distinguish such circumscriptions from the more traditional Linnean one, I have restricted taxonomic concept to denote the set of biodiversity-grouping concepts that can be formed by traditional means, and relaxed operational taxonomic unit to denote the set of all concepts about units of biodiversity. Note that I could have used the clunky term biodiversity-grouping concept that I am using in this paragraph, but I decided to defer to Sokal and use the established term OTU, which has already been used for numerical circumscriptions and will not suffer from this extension.
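A minimal sketch of the subclass relationship described above, assuming each unit is represented extensionally by a set of member identifiers. `OperationalTaxonomicUnit` and `TaxonomicConcept` mirror the class names in the discussion, but the fields, the `label` property and the BIN identifier are hypothetical illustrations, not the OpenBiodiv-O definitions.

```python
from dataclasses import dataclass

# Hypothetical sketch: every biodiversity-grouping hypothesis is an OTU;
# TaxonomicConcept is the subclass for traditionally formed concepts that
# carry a taxonomic concept label ("Aus bus sec. X").

@dataclass
class OperationalTaxonomicUnit:
    otu_id: str
    members: frozenset   # e.g. specimen or sequence identifiers

@dataclass
class TaxonomicConcept(OperationalTaxonomicUnit):
    name: str            # the taxonomic name used in the label
    according_to: str    # the source ("sec.") of the concept

    @property
    def label(self) -> str:
        return f"{self.name} sec. {self.according_to}"

# A BIN-like unit: an OTU without a taxonomic concept label (hypothetical ID).
bin_otu = OperationalTaxonomicUnit("BOLD:HYPOTHETICAL-1", frozenset({"s1", "s2"}))
concept = TaxonomicConcept("tc-1", frozenset({"s1", "s2", "s3"}), "Aus bus", "X")

for unit in (bin_otu, concept):
    print(type(unit).__name__, isinstance(unit, TaxonomicConcept))
```

Because `TaxonomicConcept` inherits from `OperationalTaxonomicUnit`, any query over OTUs also covers taxonomic concepts, while the BIN-like unit stays outside the narrower class.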

Regarding the two patterns, replacement name and related name, is the former a subset of the latter?

Replacement name and related name are properties, i.e. binary relations:

:relatedName rdf:type owl:ObjectProperty, owl:SymmetricProperty,
    owl:TransitiveProperty, owl:ReflexiveProperty ;
  rdfs:label "has related name"@en ;
  rdfs:domain :TaxonomicName ;
  rdfs:range :TaxonomicName ;
  rdfs:comment "'has related name' is an object property that we
    use in order to indicate that two taxonomic names are related somehow. This
    relationship is purposely vague so as to encompass all situations where two
    taxonomic names co-occur in a text. It is symmetric, transitive and reflexive."@en.
:replacementName rdf:type owl:ObjectProperty ,
                          owl:TransitiveProperty ;
  rdfs:label "has replacement name"@en ;
  rdfs:domain :LatinName ;
  rdfs:range :LatinName ;
  rdfs:comment "This is a uni-directional property. Its meaning
    is that if one Linnaean name links to a different Linnaean name via
    this property, then the object name is more accurate and should be
    preferred given the information that the system currently holds. This property is only
    defined for Linnaean names."@en.

It is a little hard for me to parse "replacement name is a subset of related name." Neither of these two objects is a set: they are binary relations. What is true, though, is the following:

a) related name is symmetric: if related_name(A,B) holds, so does related_name(B,A).

b) replacement name is not. The idea of replacement name is to follow the chain of replacement names to the currently valid name. In my dissertation (Section 3.4.2, Competency question answering), I show how one can run these types of "validation queries" in the pioneering biodiversity knowledge graph, OpenBiodiv.

c) I have written additional rules, not part of the ontology but part of the dissertation (Section 3.5.4, Post-processing), which state that if name A replaces name B, then related_name(A,B) holds, and therefore, by symmetry, related_name(B,A) holds as well. Thus, for any chain of names $A_1$, $A_2$, $A_3$, ..., such that replacement_name($A_1$, $A_2$), replacement_name($A_2$, $A_3$), and so on, a related_name relation holds between any two names of the chain. The converse is not necessarily true.
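The two kinds of reasoning described in a) to c) can be sketched over plain Python data instead of an RDF store. This is a hypothetical illustration, not the queries or post-processing rules from the dissertation: `current_name` follows replacement links to the end of a chain, and `related_pairs` derives related_name as the symmetric and transitive closure of the replacement links (reflexive pairs (A, A) are omitted for brevity).

```python
from itertools import combinations

# Hypothetical sketch; replacement maps a replaced name to its replacement,
# i.e. replacement[A] == B encodes replacement_name(A, B).

def current_name(name, replacement):
    """Follow replacement links until no further replacement exists."""
    seen = {name}
    while name in replacement:
        name = replacement[name]
        if name in seen:  # guard against accidental cycles in the data
            raise ValueError(f"replacement cycle at {name}")
        seen.add(name)
    return name

def related_pairs(replacement):
    """Symmetric, transitive closure of the replacement links."""
    adjacency = {}
    for a, b in replacement.items():
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)  # symmetry
    pairs, visited = set(), set()
    for start in adjacency:
        if start in visited:
            continue
        # Collect every name connected to `start` through replacement links.
        group, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in group:
                group.add(n)
                stack.extend(adjacency[n] - group)
        visited |= group
        # Transitivity: every two names in the same group are related.
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, b))
            pairs.add((b, a))
    return pairs

replacement = {"A1": "A2", "A2": "A3"}  # hypothetical chain A1 -> A2 -> A3
print(current_name("A1", replacement))
print(sorted(related_pairs(replacement)))
```

The converse from c) is visible here: ("A3", "A1") is a related pair, yet A3 is not replaced by anything.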

Or are these mutually exclusive?

No. Replacement name implies related name (not in the ontology itself, but through the extension rules), but not the converse.

It seems that replacement name implies congruence of concept/circumscription, whereas related name could apply to all five RCC-5 relations (or only the other four, excluding congruence), or...?

Both of these relations are weak and underdetermined, as they describe relationships between names, and names are unsuitable proxies for taxonomic concepts. They may imply something about taxonomic concept alignments, but mostly they only imply nomenclatural statements. @taxonbytes has developed a logic (Franz, Nico M., Chao Zhang, and Joohyung Lee. "A logic approach to modelling nomenclatural change." Cladistics 34.3 (2018): 336-357.) to model how one can be deduced from the other.
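To make the RCC-5 question concrete, here is a hypothetical sketch that assumes, purely for illustration, that two concepts can be modelled extensionally as finite sets of specimen identifiers. It is not the logic approach of Franz, Zhang and Lee; it only shows that two concepts labelled with the same name can stand in any of the five relations, which is why name-to-name links say little about concept alignment.

```python
# Hypothetical sketch: concepts as finite sets of specimen identifiers.

def rcc5(a: set, b: set) -> str:
    """Classify the RCC-5 relation between two extensionally given concepts."""
    if a == b:
        return "congruence"
    if a < b:
        return "proper inclusion"          # a lies entirely within b
    if a > b:
        return "inverse proper inclusion"  # b lies entirely within a
    if a & b:
        return "overlap"                   # some shared, some unshared members
    return "exclusion"                     # no members in common

# Hypothetical circumscriptions of "Aus bus" in two different sources.
aus_bus_sec_1900 = {"s1", "s2", "s3"}
aus_bus_sec_2000 = {"s2", "s3", "s4"}

print(rcc5(aus_bus_sec_1900, aus_bus_sec_2000))
```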

Sorry for the long post -- just trying to make sure I understand the contents of and assertions in the paper correctly.

Sorry as well for the long reply. This stuff is very hard to describe formally, but there is no way around it if you want to build a computer system that reasons about it.

deepreef commented 5 years ago

Thanks, @vsenderov! I will reply via email to the CC list. If anyone following the GitHub thread is interested in this discussion, please let me know and I'll forward my reply to you.

valtermedeiros commented 3 months ago

This aged well 😉✅