tdwg / tnc

Taxonomic Names and Concepts Interest Group

Should taxonomicName be represented as a Subclass of taxonomicNameUsage #57

Closed deepreef closed 3 years ago

deepreef commented 4 years ago

In writing up my comments on the draft terms Google Doc describing my experiences exporting content from the GNUB database (4,835 taxonomic names contained within Linnaeus 1758 Systema Naturae; file uploaded as a TSV), I realized that the apparent conflict between treating taxonomicName as a distinct Class from taxonomicNameUsage in TNC, and the need (at least from GNUB) to re-use unique identifiers to identify instances of both taxonomicNameUsage and instances of taxonomicName (i.e., Protonyms in the context of GNUB) might be solved if the class taxonomicName is cast as a Subclass of taxonomicNameUsage.

I don't understand the ontological/implementation consequences of framing it this way, but based on my (limited) understanding of this stuff, that might be the best path to reconciling the problems I have.

jliljeblad commented 4 years ago

Technically, I believe they are different classes since the same name can be used differently by different authors. In practice, however, it seems easier to treat them together. If you end up duplicating the name because it has multiple usages (which is how you usually find them in checklists), it is easier to state (with a name relation) that the two [name] instances are identical. You still need a way to keep track of each set of congruent usages with a stable identifier, but that is another issue.

Not sure if this really relates to what you are saying since it is a bit abstract. Could you present your case with an example, maybe?

deepreef commented 4 years ago

Thanks, @jliljeblad -- but I'm not sure I understand the logic.

the same name can be used differently by different authors

I'm not sure what you mean by "used differently" here. For example:

TNU1: Aus bus L. sec. Smith
TNU2: Aus bus L. sec. Jones

Both TNUs reference the "same" Aus bus L., so I'm not sure how this particular relationship (i.e., TaxonomicNameUsage-->TaxonomicName) is more correct when they are different Classes, instead of different Subclasses of the same parent Class. But maybe (likely) I don't fully grasp the implications of relationships between instances of Subclasses and instances within their parent Class (or vice-versa).

Certainly, the exact orthography may be different, in the sense of:

TNU1: Aus bus L. sec. Smith
TNU2: Aus buus L. sec. Jones [misspelling]

But we already have verbatimNameString within the TNU Class. But if we do go this route, then perhaps some of the properties of the TN Class might need to be "promoted" ("demoted"?) to the TNU Class.

You still need a way to keep track of each set of congruent usages with a stable identifier.

My understanding is that this is what the TaxonRelationshipAssertion Class (and the identifiers of instances therein) is for.

I gave some thin hypothetical examples above, but if useful, I can provide more robust real-world examples.

jliljeblad commented 4 years ago

It's probably my understanding of Classes vs. Subclasses that is the problem here. As far as I can tell, we agree on the usages. I wasn't referring to orthographic variants.

Think I'll just go read up on the definition on Classes vs. Subclasses and the implications of that.

deepreef commented 4 years ago

Think I'll just go read up on the definition on Classes vs. Subclasses and the implications of that.

I need to do the same! :-)

cboelling commented 4 years ago

I'm still catching up with the material provided in this group and related efforts, so I don't dare to offer an opinion on the specific problem (yet). Regarding the subclass relation my general understanding is this:

A is a subclass of B if and only if it follows from c is an instance of A that c is an instance of B.

So, are any and all instances of TaxonomicName (sensu TNC) also instances of TaxonomicNameUsage (sensu TNC)? It sounds doubtful to me, but then again I think I have not yet fully ingested all the current thinking that stands behind the terms "TaxonomicName" and "TaxonomicNameUsage".
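The subclass definition above can be read as set inclusion over instances. Here is a minimal Python sketch, assuming each class is modelled simply as a set of instance identifiers. The identifiers (`tn:1`, `tnu:1`, and so on) and both "readings" are hypothetical placeholders for illustration, not records or definitions from TNC:

```python
def is_subclass(a: set[str], b: set[str]) -> bool:
    """A is a subclass of B iff every instance of A is also an instance of B."""
    return a <= b  # set inclusion


# Hypothetical instance identifiers, for illustration only.
taxonomic_name_usages = {"tnu:1", "tnu:2", "tnu:3"}

# Reading 1: names are separate objects that usages merely point to.
names_as_separate_objects = {"tn:1"}

# Reading 2: a "name" is identified with the original usage that established it.
names_as_original_usages = {"tnu:1"}

print(is_subclass(names_as_separate_objects, taxonomic_name_usages))  # False
print(is_subclass(names_as_original_usages, taxonomic_name_usages))   # True
```

Whether the subclass relation holds therefore depends entirely on which things are counted as instances of TaxonomicName, which is the question the rest of the thread argues about.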

nielsklazenga commented 4 years ago

Thanks @cboelling , that certainly agrees with my thinking.

jliljeblad commented 4 years ago

Oh, thanks. This makes it much clearer to me. Actually, you could argue that with every TaxonomicName comes a usage, however vague or indirect. Every instance of a TaxonomicName comes in a context, hence an implied TaxonomicNameUsage even though the actual definition of the usage would range from explicit and clear to only that which one can deduce from the context.

With that, I would argue that every instance of TaxonomicName is an instance of TaxonomicNameUsage. But I might have missed something.

nielsklazenga commented 4 years ago

@jliljeblad, you could definitely argue that and it is a valid argument. I see TaxonName more as a subclass of skosxl:Label (#25, tdwg/tag#22).

cboelling commented 4 years ago

It might be useful to distinguish - at least for the sake of wrapping your head around this - between the string of characters "Aus bus L." and its use as a name for a taxon (whatever that is). Making this distinction, the view articulated by @jliljeblad

that with every TaxonomicName comes a usage, however vague or indirect

makes sense to me: any instance of referring to a taxon by a given string of characters constitutes a usage of that string of characters as a taxon name (where the different parts of that string of characters acquire the semantics that comes with binomial nomenclature). On this view, rather than being a subclass of TaxonomicNameUsage, TaxonomicName is more like a role of a given string of characters in a given TaxonomicNameUsage.

I am uncertain, though, (due to my limited current understanding) if that captures the actual intent of these concepts in the TNC framework.

jar398 commented 4 years ago

This doesn't make any sense. Just because every A has a B doesn't mean that every A is a B. Every use of the word "house" is a use in some context. That doesn't mean that every house is a use of a house.

Or, you could try to say that every personal name, e.g. "Bob", gets used, therefore every personal name "Bob" is a use of a name. No reasonable person would talk this way, it just isn't natural.

Or look at it this way. Let B be the class of uses of names, and let A be the class of names. Let x = "George Washington", which is in class A, and let y = the use of "George Washington" to name the first US president. x and y are clearly different entities since y is specific to a particular person while x is not. (Things with different properties are different things.) In fact, this is true for all members of A and B - every entity (usage) in B has to do with something specific to that entity, while nothing in A is specific to any entity (except by accident, and transiently, if the community is only using the name in one way at the present time). So A and B are disjoint.

jgerbracht commented 4 years ago

I think I agree with Jonathan on this. I don't see one of these as a subclass of the other. In its simplest sense, I think of a taxonomic name as a label along with metadata around who/when, etc. first coined that label. I think of a taxonomic name usage as the application of that label to a set of organisms according to some authority. The initial 'coining' of a label (TN) includes some description of the set of organisms the label is initially applied to, so that when a new TN is coined, a new TNU is also implied. However, I see these two things as being different and not having a class/subclass relationship. Also, on a different note, I have a feeling that if we treat the TN as a subclass of TNU, that will make the distinction between the two even harder to clearly convey, knowing that we sometimes struggle with that ourselves.

That said, I'd like to hear more details on Richard's practical experience and the issues he's found which could be resolved by subclassing.
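The label-versus-application distinction described in this comment corresponds to composition rather than inheritance. A rough Python sketch of that reading follows. The field names `authorship`, `year`, and `according_to` are hypothetical and not taken from the TNC terms; only `verbatim_name_string` echoes a property mentioned earlier in the thread:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonomicName:
    """A label plus metadata about who coined it and when."""
    name_string: str
    authorship: str
    year: int


@dataclass(frozen=True)
class TaxonomicNameUsage:
    """The application of a name to a set of organisms according to some authority."""
    name: TaxonomicName          # composition: a usage *has* a name
    according_to: str
    verbatim_name_string: str


aus_bus = TaxonomicName("Aus bus", "L.", 1758)

# Coining the name implies a first usage; later authors add further usages.
usages = [
    TaxonomicNameUsage(aus_bus, "L. 1758", "Aus bus"),
    TaxonomicNameUsage(aus_bus, "Smith", "Aus bus"),
    TaxonomicNameUsage(aus_bus, "Jones", "Aus buus"),  # misspelling kept verbatim
]

# All three usages share one name object, but the name is not itself a usage.
print(all(u.name is aus_bus for u in usages))       # True
print(isinstance(aus_bus, TaxonomicNameUsage))      # False
```

Under the subclass proposal, the first usage in the list would instead double as the name, and the other usages would point to it.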

mdoering commented 4 years ago

I agree subclassing feels wrong. If I understand @deepreef from previous conversations it is hard though to put a nail on the thing called name. It seems to only ever exist together with a usage. If we have several usages linked to the same name what is it they all share? They can even have orthographic variations. You can argue it is simply the original usage. And in that case we might as well drop TN and only deal with usages...

ghwhitbread commented 4 years ago

@deepreef

We had this same issue with TCS101. Because we had modeled our system around events resulting in the occurrence of taxonomic names in the literature (s.l.), casting data to names or concepts as appropriate, all of our TaxonName objects were also instances of the TaxonConcepts class. Beyond duplication, this was not such a serious issue for us when turning the data out as TCS because we were not using UUIDs. Name/54321 and Concept/54321 do not raise as much suspicion as name/UUID-000 and taxon/UUID-000, especially given aggregators' inclination to reuse them. Even so, we did normalize that name-author-date string thing.

Though we might treat the taxonomicName object as an instance of the taxonomicNameUsage (TNU) class, or implement sub-classes of TNU derived from the potential for null property values in the class: nomenclatural novelty (taxonomicNames); subsequent treatment (concepts); relationship (synonyms, variants,…, and misapplications), assertion - I do think that the standard needs to be about the interchange of data objects in a way that satisfies the majority of use cases rather than the adoption of a particular logical model. Given one such use case is a simple list of names, better off with that, and a linked generic TNU object with extensible type vocabulary.

Having made the compromise to cast data into TCS2 my primary concern will be … can I get it out again.

deepreef commented 4 years ago

Thanks everyone. I knew this was a non-trivial issue, and fully expected it to draw some commentary, so thank you (everyone) for engaging.

As @ghwhitbread notes, this issue goes back to TCS101 (and earlier). And, I think he also makes the most compelling rationale for maintaining TaxonomicNameUsage and taxonomicName as distinct Classes (i.e., not the latter as a Subclass of the former). More specifically: because this is an exchange standard, not a logical data model, it needs to actually work in the context of the majority of existing data. I've already alluded to this in my comments on Issue #53. (See my fingernail-chalkboard comments).

So, here is the fundamental tension: I believe I can make a very compelling case that "taxonomicName as a Subclass of TaxonomicNameUsage" much more correctly reflects how taxonomic information actually exists in literature/taxonomic practice. @jliljeblad already touched on this with:

you could argue that with every TaxonomicName comes a usage

I have been making this exact point going back well before the original TCS discussions began, and have vivid recollections of Jessie Kennedy and @nfranz and I and others discussing this for hours at various TDWG meetings and other venues. The basic argument is that taxonomic names do not exist under rocks or on coral reefs or in trees or floating around in the atmosphere or anywhere else outside of TaxonomicNameUsage instances (and in databases and in the minds of taxonomists and others).

However, just because TaxonomicName instances are born in and exist almost exclusively within instances of TaxonomicNameUsage instances, doesn't mean that they are TNUs. And even if they are, that doesn't mean this is reflected in actual digitized data content extant in databases around the world.

I will follow this post with two more, one focusing on the conceptual aspects of this issue, and one focused on the practical aspects.

deepreef commented 4 years ago

OK, let me see if I can capture the conceptual framework for this issue as concisely as possible. @mdoering touched on it here:

it is hard though to put a nail on the thing called name. It seems to only ever exist together with a usage. If we have several usages linked to the same name what is it they all share? They can even have orthographic variations. You can argue it is simply the original usage. And in that case we might as well drop TN and only deal with usages...

I think we all more or less have converged on the notion of what a TaxonomicNameUsage is, and what instances of that Class represent (at least I hope so).

I think we also all understand that the simple text string of characters that forms the actual name itself is a literal and what we mean by TaxonomicName is a conceptual object with myriad properties, only one (or several) of which are text-string literals representing the full name itself (or various parsed components of the full text-string name).

So what we're trying to conceptualize is some sort of data "Object" represented by the Class TaxonomicName, and that this Object has properties beyond just the literal text-string label. But as @mdoering alludes to, we don't really have a clear idea of what one of these TaxonomicName Objects is, and when two different instances represent the "same" object, or slightly different objects. For example:

Is it possible that the same instance of TaxonomicName has more than one literal text string associated with it? Is "A. bus L. 1758" the same TN instance as "Aus bus L. 1758"? Or are they different instances? Or does it depend on context? What about "Aus bus L. 1758" and "Aus buus L. 1758"? What about "Aus bus L. 1758" and "Aus bus Linneaus 1758"? What about "Aus bus subsp. xus" and "Aus bus xus"? What about "Aus bus" and "Aus (Xus) bus"? There are many, many, many examples of edge-cases where reasonable people might come to different conclusions about whether two different records represent the "same" instance of a TN (i.e., linking to the same unique identifier), or different instances.

A similar problem exists with the question, is it possible that two different instances of TaxonomicName can be represented by the same literal text string (inclusive of authorship)?

These kinds of questions and discussions sound a LOT like the discussion we recently had regarding whether a TNU "is" (or, more correctly, can serve as a functional proxy for) a Taxon Concept/Implied Taxon Circumscription?

This is almost the exact same question, framed as "Can a TNU serve as a functional Proxy for a TaxonomicName?" If the answer is "Yes", then we might want to think seriously about collapsing TaxonomicName into a Subclass of TaxonomicNameUsage -- in the same way that we talked about the idea of "TaxonomicConcept" being effectively represented by a subset (Subclass?) of TNUs that serve as functional proxies for Taxonomic Concepts/Circumscriptions.

But if the answer is "No", then we really need to have a much more robust and objective definition for what a TaxonomicName is, and how it is distinct and separate from a TaxonomicNameUsage. We can't fall back on the Pornography Principle that we know it when we see it, otherwise we've failed to make meaningful progress.

I have more to say on this, but need to go to a meeting now. I'll leave the "Conceptual" part of this there, and later today I'll post again with some more details on the practical part of this.

camwebb commented 4 years ago

Fascinating conversation!

I think we need to keep the conceptualizations and definitions of TaxonomicName and TaxonomicNameUsage clearly separate. I.e., to the question:

“Can a TNU serve as a functional Proxy for a TaxonomicName?”

I’d say “no”.

I think we have three options:

One. The most radical is to discard TaxonomicName as an object and use names only as literals. This would mean that the name (literal) “Aus bus L. 1758” is not the same as “A. bus L. 1758”. And we would lose the ability to say things about the name itself, losing all the properties of TaxonomicName in our Terms doc (unless we allow Literals to be RDF subjects). This would however simplify greatly the process of encoding biodiversity data. At the cost of nomenclatural information capability.

Two. We could reify the literal name string. Two TaxonomicName objects with exactly the same literal name strings would be owl:sameAs each other. They would exist outside any history of usage of the name or even taxonomic context. However, some of our TaxonomicName properties might still be valid - those for which the result could be deduced from the name string itself. E.g., maybe

:TN1 a tnc:TaxonomicName ;
  litre:hasLiteralValue "Aus bus L. 1758" ;
  tnc:specificEpithet [
    a tnc:TaxonomicName ;
    litre:hasLiteralValue "bus" 
  ] .

(litre: see here for a proposal to formalize the reification of literals.) Other TaxonomicName properties that relate to the history of human taxonomic nomenclature (basionym, nomenclaturalCode, ...) would not be meaningful.

Three. We try hard to define exactly what a TaxonomicName is when shorn of its usages. I think we can do this. The definition should be anchored in the abstract process of combination of elements of a name: genus, specific epithet, author of the name (but not any particular spelling of the author’s name), date, etc., but without invoking any particular usage by the original author or other users. We do all (I guess) have this concept in our heads - it’s just a matter of carefully defining it. We didn’t have this definition problem previously, e.g., when the TaxonomicName properties were drafted in the Terms doc, but it’s really good to think about now (acknowledging that many of you have been over this ground many times already!).

The question of whether TaxonomicName “A. bus L. 1758” is the same as “Aus bus L. 1758” and “Aus buus L. 1758” is a tricky one. I think we might define TaxonomicName such that variation due to “unambiguous, collapsible elements” implies an owl:sameAs relationship, but “hard variation” requires an owl:differentFrom relationship. I.e.:

:TNU1 tnc:verbatimNameString "Aus bus L. 1758" ;
  tnc:taxonomicName :TN1 .
:TNU2 tnc:verbatimNameString "A. bus L. 1758" ;
  tnc:taxonomicName :TN2 .
:TNU3 tnc:verbatimNameString "Aus buus L. 1758" ;
  tnc:taxonomicName :TN3 .
:TN1 owl:sameAs :TN2 .
:TN1 owl:differentFrom :TN3 .

I.e., :TN3 is a “real” TaxonomicName, but just not the same as the name minted by Linnaeus. One reason we need a TaxonomicName class is to be able to make statements about the likelihood that two different usages which vary in spelling are actually referring to the same name instance.
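To make the consequence of the owl:sameAs and owl:differentFrom statements above concrete, here is a small Python sketch that groups name identifiers into equivalence classes (sameAs is symmetric and transitive) and then asks whether two usages refer to the same name. The helper names are hypothetical, and the sketch treats "not known to be the same" as "not the same", which is a closed-world simplification that OWL itself does not make:

```python
def same_as_groups(ids, same_as_pairs):
    """Map each identifier to a representative of its owl:sameAs group (union-find)."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        parent[find(a)] = find(b)
    return {i: find(i) for i in ids}


# Statements taken from the Turtle example above.
usage_to_name = {"TNU1": "TN1", "TNU2": "TN2", "TNU3": "TN3"}
same_as = [("TN1", "TN2")]
different_from = [("TN1", "TN3")]

group = same_as_groups(["TN1", "TN2", "TN3"], same_as)


def same_name(usage_a, usage_b):
    """True if both usages point to names in the same sameAs group."""
    return group[usage_to_name[usage_a]] == group[usage_to_name[usage_b]]


# A differentFrom pair that ends up in one sameAs group would be a contradiction.
conflicts = [(a, b) for a, b in different_from if group[a] == group[b]]

print(same_name("TNU1", "TNU2"))  # True
print(same_name("TNU1", "TNU3"))  # False
print(conflicts)                  # []
```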

However, just because TaxonomicName instances are born in and exist almost exclusively within instances of TaxonomicNameUsage instances, doesn't mean that they are TNUs.

This just may be an insoluble problem! But as most have been saying, just because there may be no way to talk about (including providing a GUID for) a TaxonomicName without creating an associated TaxonomicNameUsage, this does not mean that TaxonomicName and TaxonomicNameUsage are the same.

nielsklazenga commented 4 years ago

I think we should seriously reconsider the name for the term TaxonomicNameUsage (and also the use of 'taxonomic' rather than 'taxon'), as I think therein lies the problem. People seem to be too focused on the semantics of the label, rather than what the object should represent, namely some sort of treatment of a taxon (or taxonomic group if you like). It is discussions like these that make me want TaxonConcept back.

mdoering commented 4 years ago

Thanks @camwebb. A few remarks, since I have been fighting the "name identity" question for a long time.

We are primarily designing an exchange standard. It might well be that the definition of the uniqueness of a TN instance is outside the standard and up to its users. On the other hand, different ideas of what a unique TN is pose serious problems for data integration and reuse. For example, IPNI recognizes the same literal name by the same author in the same year several times because it was published in different journals. They get different IPNI identifiers, for example Pedicularis inconspicua. I still hope we can have at least clear recommendations on how to deal with unique TN instances.

Btw, removing TaxonomicName would not necessarily mean we make it a literal string of a usage. We could as well merge all/most properties of TaxonomicName with TaxonomicNameUsage.
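The integration problem described above (one literal name string carrying several identifiers) is easy to detect mechanically, even if deciding what to do about it is not. A minimal Python sketch follows, using made-up records. The identifiers, name strings, and journal names are hypothetical placeholders and are not IPNI data:

```python
from collections import defaultdict

# Hypothetical records: same literal name, author and year, different publications.
records = [
    {"id": "name:001", "name_string": "Aus bus Smith 1900", "published_in": "Journal A"},
    {"id": "name:002", "name_string": "Aus bus Smith 1900", "published_in": "Journal B"},
    {"id": "name:003", "name_string": "Aus cus Smith 1900", "published_in": "Journal A"},
]


def ids_by_name_string(records):
    """Group record identifiers by their literal name string."""
    groups = defaultdict(list)
    for record in records:
        groups[record["name_string"]].append(record["id"])
    return dict(groups)


def ambiguous_name_strings(records):
    """Return only those name strings that carry more than one identifier."""
    return {s: ids for s, ids in ids_by_name_string(records).items() if len(ids) > 1}


print(ambiguous_name_strings(records))
# {'Aus bus Smith 1900': ['name:001', 'name:002']}
```

Whether the two identifiers in such a group denote one TN instance or two is exactly the question a recommendation in the standard would need to settle.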

nielsklazenga commented 4 years ago

IPNI is a good example that shows that taxon names do not always come with usages.

cboelling commented 4 years ago

Just because every A has a B doesn't mean that every A is a B.

I am arguing the exact same position as you do, @jar398, and (very cursorily) sketched one possibility of how instances of TaxonomicName could be modeled with an object-like, ontologically anchored identity distinct from the literal as which they are usually observed, and how the relation between instances of TaxonomicNameUsage and of TaxonomicName could be modeled correspondingly.

Probably my phrasing was inadequate, I edited it, hopefully it is clearer now.

deepreef commented 4 years ago

Ok, lots to comment on. I didn't have time yesterday to post on the practical side of TaxonomicName as a Subclass of TaxonomicNameUsage (both pro and con), and many of the things I'd planned to mention touch on posts presented here. I'll still post something to that effect, but this post will focus on replies to the above posts.

First: @nielsklazenga : PLEASE do not change the name of TaxonomicNameUsage! If you need to change something, change TaxonomicName. TaxonomicNameUsage is a terrible name, but it is by far the least terrible of other plausible options. TaxonomicConcept is much, much more terrible. So is Treatment. Both of those could legitimately represent subclasses of TNU, but are way too narrow for the Superclass of Object we need to mobilize taxonomic information.

@camwebb : “Can a TNU serve as a functional Proxy for a TaxonomicName?” already has an answer, and it is an unambiguous and resounding "YES". We have confirmed the utility of doing so for nearly twenty years now internally, but the failure has been that the only public Window on this has been ZooBank, which doesn't showcase how powerful it is (informatically) to encapsulate nomenclatural information within TNUs. However, ZooBank -- a purely nomenclatural system -- could not do what it does if it did not capture taxonomic name data within instances of TNUs. The frustrating thing for me is that it can do SO SO SO much more, but we simply haven't had the resources (time or funding) to showcase it. I gave a presentation on this at an iDigBio workshop years ago, available in PDF form here, and there used to be a video of this presentation online somewhere, but I wasn't able to find the link easily.

So this is not the question we need to answer. I think the more important questions are:

“Can we define a meaningful Class of object that fulfills our informatic need for TaxonomicName that is not a subclass of TNU?” [My answer: maybe, but haven't seen it yet.]

and

"Is it smart to frame TaxonomicName as a Subclass of TNU for the purposes of an exchange standard?" [My answer: maybe, but maybe not -- see my next post]

Just to round out the replies to the post from @camwebb: in my mind, option one is a non-starter (a class consisting of only one property -- as a literal string -- is not especially helpful). Option two: maybe. I'd need to understand this one better. What terrifies me about option three is this sentence:

We try hard to define exactly what a TaxonomicName is when shorn of its usages.

Given that TaxonomicName instances are born in usages, and are only meaningfully represented within usages, they could only exist as purely abstract notions outside of usages. I can certainly understand a philosophical basis for this, but I think it would have very limited practical value. It's not the names themselves we're interested in as much as it is the mapping of names to sets of biological organisms -- which very clearly is something that happens (almost?) exclusively within usages. Even when we care about Names for the sake of Names (independent of the implied sets of organisms), the real action is all about usages. ZooBank is a perfect case in point. We don't track "Names" per se, we track "NomenclaturalActs", and according to the Code, those "Acts" happen exclusively within publications (i.e., usage instances).

Now... having said all of that, I wholeheartedly agree with @camwebb on his final paragraph:

This just may be an insoluble problem! But as most have been saying, just because there may be no way to talk about (including providing a GUID for) a TaxonomicName without creating an associated TaxonomicNameUsage, this does not mean that TaxonomicName and TaxonomicNameUsage are the same.

And that is exactly what I wanted to address in the post that I didn't have time to write yesterday (and will write in a moment).

But just a couple more replies:

@mdoering : I FULLY agree with all of your points above!

@nielsklazenga :

IPNI is a good example that shows that taxon names do not always come with usages.

Can you provide specific examples? I'm not sure exactly what you mean by this.

@cboelling , @jar398 : I have to confess that my sense and understanding of ontological approaches to these kinds of issues are woefully inadequate, and I sincerely apologize if half (most?) of what I write ends up sounding like gibberish.

OK... the usual apologies for the way-too-long posts (also applied prospectively for my next post)...

camwebb commented 4 years ago

@mdoering Thanks for the link to the CoL Names discussion. Lots of important stuff there.

@deepreef Thanks for taking my comments seriously. Looking forward to your upcoming post on the practical side of TaxonomicName as a Subclass of TaxonomicNameUsage.

nielsklazenga commented 4 years ago

@deepreef:

IPNI is a good example that shows that taxon names do not always come with usages.

Can you provide specific examples? I'm not sure exactly what you mean by this.

IPNI only has names, no usages.

(I am supposed to write a straw man charter for a task group, otherwise I'd have more)

ghwhitbread commented 4 years ago

? Speaking as a member of the technical team responsible for IPNI 1.0 (1999), I can say that IPNI uses (or did) what we now call the “taxonomicNameUsage” pattern. Pedicularis inconspicua Tsoong looks like a usage citation to me.

ghwhitbread commented 4 years ago

If it isn’t then we are much further away from our objective than I thought.

nielsklazenga commented 4 years ago

@ghwhitbread You are right, but it is only the inclusion of the family according to the original publication that makes it a TNU. There is no requirement to give the family when publishing a new name, so many names in IPNI will not (or should not) have them. In any case, the names in IPNI can very well stand on their own. Also, I would interpret the Acta Phytotaxonomica Sinica 3(3): 292, 323 (1955) as namePublishedIn rather than nameAccordingTo (to use the Darwin Core terms).

@deepreef

First: @nielsklazenga : PLEASE do not change the name of TaxonomicNameUsage! If you need to change something, change TaxonomicName.

A reminder that the name currently is TaxonConcept in the standard. TaxonomicNameUsage is a working name at best and, if we want to change TaxonConcept to TaxonomicNameUsage, we need to have good arguments for it. As I see it, it only leads to confusion. First (actually, not first) we had this whole discussion about whether we needed an extra class for the "real" TaxonConcept or not and now the problem shifts to the other end and TaxonName becomes a problem. So I'd say, let's keep it as it is.

To me, TaxonomicNameUsage has something circular to it (or at least the wrong way around). When I am doing a revision, or a checklist, or an identification (or even enter a record (of any kind) in a database) I am thinking about taxa, not names. You need to have the thing first, before you can put a name on it. So the name should indicate that it is something of a taxon (if not the taxon itself) rather than the label that we stick on it. I am very comfortable with the TCS TaxonConcept. I am also comfortable with the Usage bit of TaxonomicNameUsage, as Usage implies a Concept to me, but not so much with the Name and the 'omic' bits.

“Can we define a meaningful Class of object that fulfills our informatic need for TaxonomicName that is not a subclass of TNU?” [My answer: maybe, but haven't seen it yet.]

We do not have to define it, as TaxonName is already in TCS1. Also, it does not need to be meaningful (this is not to say that I do not think it is), we just need to have a use case for it. I do not understand how you can have a meaningful definition of TaxonomicNameUsage without a meaningful definition of the thing it is a Usage of.

By the way, I completely get your issue with a class for the name and actually proposed to merge the two classes into one (#34). This was summarily dismissed at the next meeting with the argument that, if the same name string is used more than once, we need to have a class for it.

And again, nobody says you have to use the TaxonName class. If you do not have identifiers for name strings in your system, you can just create UUIDs that are based on them, or perhaps even better, use the ones the Global Names web service (okay @ghwhitbread, maybe IPNI was not such a good example, what about this one?) provides.
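As an illustration of "UUIDs that are based on" name strings, the following Python sketch derives a deterministic (version 5) UUID from a name string using the standard library. The namespace built from `example.org` is a hypothetical placeholder. This is not a description of how the Global Names service or any other system computes its identifiers:

```python
import uuid

# Hypothetical namespace, for illustration only; a real project would pick
# and publish its own so that everyone derives the same identifiers.
NAME_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


def name_string_uuid(name_string: str) -> uuid.UUID:
    """Derive a deterministic UUID (version 5) from a literal name string."""
    return uuid.uuid5(NAME_NAMESPACE, name_string)


a = name_string_uuid("Aus bus L. 1758")
b = name_string_uuid("Aus bus L. 1758")
c = name_string_uuid("A. bus L. 1758")

print(a == b)  # True: the same string always yields the same identifier
print(a == c)  # False: any difference in the string yields a different one
```

The second result is the catch. An identifier derived from the literal string inherits every orthographic variation discussed earlier in this thread, so it identifies name strings rather than names.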

Given that TaxonomicName instances are born in usages, and are only meaningfully represented within usages, ...

I reject this premise and what follows from it. The meaning of a TaxonName is as a label of the thing that the TaxonomicNameUsage is supposed to represent, a taxon. Names are born in a publication at the same time as their first usage, so with a usage, not in a usage.

deepreef commented 4 years ago

OK, after looking at the presentation that I gave at the 2014 iDigBio workshop on Biological Digitization in the Pacific (referenced in an earlier post on this issue), I realized that it actually has a lot of relevance not only to this discussion, but to the entire TNU/TNC discussion. The original presentation is available here, but it's clunky and I could only get it to work by downloading and installing Adobe Connect. If you want to check it out, my presentation starts at about 0:57:00 in the video, but the earlier talks in the same session (especially Greg Riccardi's which comes right before mine) are also relevant and interesting.

Anyway, I went ahead and captured the audio from the original presentation and overlaid it on a timed version of the original PowerPoint file, and generated a clean video of the presentation that you can watch on YouTube. It's 22 minutes long, but at the risk of being overly self-promotional, I would recommend taking the time to watch it. It's over 6 years old now, but almost everything I say is still relevant today (although the definition of Protonym has been refined since then to not be explicitly connected to Code-compliance), and very relevant to this particular set of conversations on TNC (including this particular issue). It's not that it's particularly good or ground-breaking, but it explains where I'm coming from on these issues MUCH more effectively than I've been doing with my epic-long posts here, in part because it includes a lot of visual aids to illustrate what I'm talking about. (Note: the version I uploaded was 4K, but as of when I posted this, YouTube had only processed the low-res version; HD version still processing.)

The main point that this presentation explained, and that I've been using as a fundamental premise for why we're making a TNC data exchange standard, is to cross-link widely disparate data scattered all over the place through the shared common denominator: scientific names. More on this in the next post.

I'm splitting this off as a separate post, and will next post on the practical aspects of TaxonomicName as Subclass of TaxonomicNameUsage (or not).

The reason this post (and the one that follows) are delayed is that I wanted to make the video and allow time for it to upload and process within YouTube.

deepreef commented 4 years ago

Ok, so I want to start with the practical aspects of TaxonomicName as a subclass of TaxonomicNameUsage (or not) with a quote from an earlier post by @ghwhitbread :

I do think that the standard needs to be about the interchange of data objects in a way that satisfies the majority of use cases rather than the adoption of a particular logical model.

I think this perfectly captures the dilemma in that we need to develop this standard to be optimized for data exchange, and not as an idealized data model. I completely agree with this, which is why I'm still unclear on whether this subclass approach is the best one in this context (there is zero doubt in my mind that it is the best approach to developing a clean logical model, but that's not what we're trying to accomplish here).

So first, I want to focus on a key part of the quote above: "in a way that satisfies the majority of use cases". I'm interpreting "use cases" in this context as use cases for a data exchange standard; not use cases for taxonomic data in general. My rationale for this approach to "use cases" is the same as the preceding paragraph: we're talking about a data exchange standard, not a taxonomic data model. Thus, I'm going with the premise that all of our specific use cases involve sharing data from one purpose-built implementation to another purpose-built implementation.

In this context, a good first approximation of "the majority of use cases" is understanding where "the majority of data" exist. My very unscientific (but probably reasonably accurate) assessment of the major sources of taxonomic data related to names and concepts (i.e., TNC) are the following (starting with the most data):

  1. Non-digitized publications and unpublished documents
  2. Digitized publications and unpublished documents (e.g., BHL, modern born-digital literature, etc.)
  3. Natural History Museum specimen data (digitized and undigitized)
  4. Other non-vouchered specimen data (e.g., GenBank, iBOL, eDNA, etc.)
  5. Big data aggregators (GBIF, iDigBio, ALA, etc.)
  6. Primary sources of big data aggregators (e.g., GSDs; listed separately because they usually contain a lot more data than what makes it into the aggregators)
  7. Nomenclators (some overlap with the previous)
  8. A bunch of other databases (which collectively would be higher on this list, but individually taper off down this list in a long tail)

I'm sure I missed some and/or got some of them out of correct sequence, but the main point is that I think the "majority" of taxonomic data (especially involving scientific names) exists in the first two (published and unpublished documents, digitized or non-digitized). In other words, explicitly within TNUs.

The next two on the list (vouchered and unvouchered specimens and their derivatives) are also a huge source of relevant data (1.4+ billion records in GBIF alone). However, I would argue that the VAST majority of these are not derived from instances of what we would call either tnc:TaxonomicName or tnc:TaxonomicNameUsage instances. Rather, most of them are represented as direct properties of dwc:Occurrence instances. More sophisticated implementations would represent them within dwc:Taxon instances, but as we all know, that Class is somewhat ambiguous with respect to whether or not those instances represent something closer to tnc:TaxonomicName or tnc:TaxonomicNameUsage. For the sake of argument, let's suppose that the majority of the dwc:Taxon instances more closely approximate what we're talking about as tnc:TaxonomicName instances.

I would guess that items 5, 6 and 7 have records that are probably closer to tnc:TaxonomicName than to tnc:TaxonomicNameUsage. However, there are two caveats here:

  1. I bet many of them would define what they mean by an instance of TaxonomicName differently from each other. Indeed, I'd bet very few of them have converged on the same general meaning of a TaxonomicName instance when creating their databases; and
  2. I bet many/most of them actually have all the bits they need to represent these instances in something much closer to tnc:TaxonomicNameUsage, even if they don't realize it.

For example, on the latter point, certainly most of the nomenclators have some representation of the publication in which the name was first established (hence, Protonym TNUs). Many/most of the more taxonomic sources likely have some representation of the publication in which the current usage of the name follows (hence, "Concept" TNUs).

My overarching point here is that it's not clear to me whether "the majority of use cases" (≈ the majority of existing data) would be better served with a distinct TaxonomicName class, or by framing TaxonomicName as a Subclass of TaxonomicNameUsage.

If it's true that the bulk of existing data are either born within the context of TNUs, or could be expressed as TNUs, it's still not clear whether it makes sense in an exchange standard to frame TaxonomicName instances as though they were a special case (Subclass) of TNUs, or distinct entities unto their own. The obvious problem is the one I posed, and @camwebb addressed, which is that we haven't yet managed to converge on a clean definition of what a TaxonomicName really is (or should be). And this despite literally decades of trying. This is an almost identical situation to the one we just hashed through on "Taxon Concept". In both cases, we know it when we see it, but we can't quite pinpoint a clean definition. By "clean" I mean one that can easily be explained to non-taxonomist content providers who will need to shape their data in a way that conforms to this standard. For now, though, let's assume we are able to come up with such a clean definition that we're all satisfied with (I hesitate to use the word "happy", because that's likely a bridge too far).

Once we have this clean definition, the next question is: what do we gain from keeping it as a separate Class from TaxonomicNameUsage (as opposed to a Subclass)? As best as I understand it, a fundamental characteristic of the Class->Subclass association is that the latter inherits properties of the former. In that context, I went through the various properties in our draft terms definitions document to see which properties apply best to which Class. I've also added comments to the document itself.

Here are the terms representing properties of TaxonomicName:

taxonomicNameString: This is the lesser of two key fields in question regarding how to define the TaxonomicName Class. We already know there will be collisions (i.e., two or more legitimately different instances of TN can share precisely the same literal text string), so we already know this cannot be unique. Treating TN as a Subclass of TNU pushes this term up to the TNU class, so we don't have any issues other than lots of repeated values in cases with many TNUs sharing the same literal.

taxonomicNameStringWithAuthor: This is perhaps the greater of two key fields in question regarding how to define the TaxonomicName Class. If it's possible that two legitimately different instances of TN can share precisely the same literal text string, then there needs to be clear explanations for what situations allow for this, and how to indicate them. We already know of examples where the same author established two homonyms in the same year, so we know there are at least some collisions here. I know of at least one case (maybe 2) where the same author created two homonyms on the same page of the same publication, so it cannot be strictly unique. Treating TN as a Subclass of TNU pushes this term up to the TNU class, so we don't have any issues other than lots of repeated values in cases with many TNUs sharing the same literal.

uninomial | genus | infragenericEpithet | specificEpithet | infraspecificEpithet | cultivarNameGroup: These are all pretty straightforward, and as far as I can tell would not pose any real issues for either TN or TNU (if TN is a Subclass of TNU), except that if TN is a Subclass of TNU and these are pushed to the parent TNU, then there would again be a lot of repetition in cases with many TNUs sharing the same literal.

taxonomicNameAuthorship | combinationAuthorship | basionymAuthorship | combinationExAuthorship | basionymExAuthorship: These terms open up a complex question about whether there is such a thing as the author of a name, or if the author is really the author of the treatment of the Protologue (i.e., author as a property of a Reference Object, rather than a TN or TNU Object). It can certainly be reflected in either TN or TNU instances, with the caveat of (again) lots of repeated values if treated within the TNU instance.

namePublishedIn: Whenever we have a name(string) + Reference (and namePublishedIn = a Reference object), we have a TNU. In this case, the implied TNU would be a Protonym.

microReference: This term/property fits very naturally within a TNU as well (in that a "Reference" is implied, and again name properties + Reference properties are a clear sign that a TNU is in play). So, I assume this implies the minor reference parts/page number for the original publication (referenced by namePublishedIn). Where do we store the microReference information for the TNU itself (i.e., the non-Protonym that links to this name via the taxonomicName term/property of TaxonomicNameUsage)?

publicationYear: Again, this is a property of a Reference, and hence a Protonym/Protologue, and hence a TNU.

rank: This one is subtle, and could conceivably be represented as a term/property of our hypothetical taxonomicName class, rather than the TNU class. But I guess this means that "Aus bus xus" and "Aus bus var. xus" are different TaxonomicName instances -- which I'm sure is a natural fit for botanical names; but less so for zoological names, as the ICZN Code essentially treats them as equivalents (depending on the year).

verbatimTaxonRank: with this included in the TaxonomicName class, I assume this means that a different TaxonomicName instance is generated for every single variation of a represented rank? Otherwise, how else would you capture this on a TNU-by-TNU basis? This term/property seems like it really fits much more naturally within the TNU class.

nomenclaturalCode | language | nomenclaturalStatus: These terms all genuinely are properties of the "name" and are properly assigned to the TaxonomicName class, rather than all TNUs that use this name. In other words, this property truly is 1:1 for each name/Protonym/Protologue. Indeed, these are all properties that GNUB relegates to what is effectively the Protonym Subclass of TNU.

basionym: This is a property of TaxonomicName for botanical names, but more naturally a property of TNUs for zoological names. Effectively this is accomplished via the Protologue TNU instance for either Code.

replacedName: This is definitely a property of a nomenclatural object, but in most cases, it needs an accordingTo (i.e., who says this name is the replacement for that other name?). And thus, this one also could easily be a property of a TNU.

basedOn | conservedAgainst | protectedAgainst: These terms are mostly botanical, but I think they could easily be represented by TNUs (because externally referenced names must have existed somewhere, i.e., within TNUs).

sanctionedBy: This one is similar to the previous, but much more explicitly reference-based (and, hence, TNU).
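To make the collision problem noted above for taxonomicNameString and taxonomicNameStringWithAuthor a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the `NameRecord` layout, the `record_id` values, and the made-up homonyms are not from the draft terms or from GNUB. It only shows that grouping by the literal string (with or without authorship) can lump distinct records together, so some other identifier has to carry identity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    record_id: str                # hypothetical opaque identifier
    name_string: str              # stands in for taxonomicNameString
    name_string_with_author: str  # stands in for taxonomicNameStringWithAuthor


# Made-up data: the same author establishes two homonyms in the same year.
records = [
    NameRecord("id-1", "Aus bus", "Aus bus Smith, 1900"),
    NameRecord("id-2", "Aus bus", "Aus bus Smith, 1900"),
    NameRecord("id-3", "Aus cus", "Aus cus Jones, 1950"),
]


def collisions(items, key):
    """Return {key value: [record ids]} for key values shared by 2+ records."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item.record_id)
    return {value: ids for value, ids in groups.items() if len(ids) > 1}


by_string = collisions(records, lambda r: r.name_string)
by_string_with_author = collisions(records, lambda r: r.name_string_with_author)
print(by_string_with_author)
```

Neither literal distinguishes id-1 from id-2 here, which is the situation described for same-author, same-year homonyms.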

There are some missing properties from the draft that genuinely are properties of the "name", rather than specific to any particular usage of a name, and hence either legitimately belong in a different TaxonomicName Class, or would be properties of the TaxonomicName Subclass (if we went that way). The most obvious of these is what the ICZN calls "Correct Original Spelling". This is an example of certain rules in the ICZN Code that "just are" (i.e., do not need to occur through a published nomenclatural act within a TNU): for example, stripping of diacritical marks from epithets, and automatic conversion of things like "4-maculatus" to "quadrimaculatus". No TNU is needed to assert this, so these sorts of things would be legitimate properties of a separate TaxonomicName object (or limited to the TaxonomicName subclass of TNUs).
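As a rough illustration of why these spelling rules can be applied mechanically, without any TNU asserting them, here is a small Python sketch. It is a simplification under stated assumptions, not an implementation of the ICZN Code: the function names are hypothetical, the numeral table contains only the one example given above, and any Code-specific special cases are ignored.

```python
import re
import unicodedata

# Assumption: only the single numeral example mentioned above is mapped;
# a real table would need more entries.
NUMERAL_PREFIXES = {"4": "quadri"}


def strip_diacritics(epithet: str) -> str:
    """Drop combining marks after canonical decomposition (NFD).

    Characters with no canonical decomposition (e.g. "ø") pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFD", epithet)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_numeral_prefix(epithet: str) -> str:
    """Rewrite a leading "<digits>-" prefix using NUMERAL_PREFIXES, if known."""
    match = re.fullmatch(r"(\d+)-(.+)", epithet)
    if match and match.group(1) in NUMERAL_PREFIXES:
        return NUMERAL_PREFIXES[match.group(1)] + match.group(2)
    return epithet


def normalized_spelling(epithet: str) -> str:
    """Apply both mechanical adjustments to a verbatim epithet."""
    return expand_numeral_prefix(strip_diacritics(epithet))


print(normalized_spelling("4-maculatus"))
```

The point of the sketch is only that the result is a pure function of the verbatim spelling, so it could live as a property of the name rather than of any particular usage.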

OK, now time to look at the converse. If TaxonomicName is framed as a Subclass of a TNU, then it needs to be able to legitimately inherit all of the properties of its associated TNU instance (i.e., the Protonym/Protologue TNU instance):

taxonomicName | accordingTo: These are both packaged within the Protonym/Protologue link, so they would be appropriate in this context for that particular Subclass of TNU.

taxonomicNameUsageLabel: Just as easy to format this for a Protonym/Protologue (e.g., Aus bus L. sec. L.) as it is for any other TNU (e.g., Aus bus L. sec. Smith).

verbatimNameString: Obviously a Protonym/Protologue has this property just like any other TNU.

taxonomicStatus | acceptedNameUsage | parentNameUsage: Again, the vast majority of Protonyms/Protologues treated their associated name as Accepted, but these properties all still apply to Protonyms/Protologues. Also, there are some names in zoology that are deemed unavailable because they are first introduced in synonymy, so this actually has direct nomenclatural relevance. The parentNameUsage also gives us a direct/structured indication of the original combination.

vernacularNameUsage | scientificNameUsage: I'm a little fuzzy on these, so not sure exactly how to address them.
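The "name + Reference = TNU" pattern and the label examples above (Aus bus L. sec. L. versus Aus bus L. sec. Smith) can be sketched in a few lines of Python. This is only a sketch under assumptions: the `Usage` record, its field names, and the identifiers are hypothetical, and treating a usage whose `taxonomic_name` link points at itself as the Protonym is one possible convention, not something the draft prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    usage_id: str          # hypothetical identifier for this TNU
    name_with_author: str  # e.g. "Aus bus L."
    according_to: str      # short citation of the Reference
    taxonomic_name: str    # usage_id of the Protonym usage this name traces to

    @property
    def is_protonym(self) -> bool:
        # Assumed convention: the Protonym usage points at itself.
        return self.taxonomic_name == self.usage_id

    @property
    def label(self) -> str:
        # Same formatting for a Protonym as for any other usage.
        return f"{self.name_with_author} sec. {self.according_to}"


protonym = Usage("tnu-1", "Aus bus L.", "L.", taxonomic_name="tnu-1")
later = Usage("tnu-2", "Aus bus L.", "Smith", taxonomic_name="tnu-1")

for usage in (protonym, later):
    print(usage.label, usage.is_protonym)
```

Nothing outside the usage records (apart from the Reference) is needed to say which name a later usage refers to; it just points at another usage.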

My take-home from all of the above is that most of the terms currently assigned to the TaxonomicName Class could easily apply likewise to the TNU class, with the only cost being that some values might get repeated a lot. Those values that would be repeated with 100% frequency could easily be the properties assigned to the TaxonomicName Subclass, so I still don't think it's a problem. And furthermore, it seems that all of the properties of a TNU fit naturally on to TaxonomicName instances if we regard them as a Subclass (with Protonyms/Protologues as the TNU instances associated with them).

In other words, I don't think we break anything, or prevent anything from working properly if we go the "Class-Subclass" route, instead of the "two distinct classes" route.
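For anyone who finds it easier to see the "Class-Subclass" route as code, here is a minimal Python sketch. It is not a proposed schema: the attribute names are loosely borrowed from the draft terms, the split of properties between parent and subclass is just my reading of the walk-through above, and the identifiers and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaxonomicNameUsage:
    """Properties that can vary from usage to usage."""
    usage_id: str
    taxonomic_name_string: str
    according_to: str
    rank: Optional[str] = None
    verbatim_taxon_rank: Optional[str] = None


@dataclass
class TaxonomicName(TaxonomicNameUsage):
    """The Protonym/Protologue usage: inherits everything above and adds
    properties that are 1:1 with the name itself."""
    nomenclatural_code: Optional[str] = None
    nomenclatural_status: Optional[str] = None


original = TaxonomicName(
    "tnu-1", "Aus bus", "Linnaeus 1758",
    rank="species", nomenclatural_code="ICZN",
)
later = TaxonomicNameUsage("tnu-2", "Aus bus", "Smith", rank="species")

usages = [original, later]
# Every TaxonomicName is a usage; only some usages are TaxonomicNames.
names = [u for u in usages if isinstance(u, TaxonomicName)]
print(len(usages), len(names))
```

The cost mentioned above shows up directly: `taxonomic_name_string` is repeated on every usage, while the name-only properties appear once, on the subclass instance.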

BUT!!! Just because it can be done, doesn't mean it should be done.

I'm tired of writing, and if you've gotten this far, then you are no doubt tired of reading, so I'll stop here. I would like to see an explanation of what advantages we get by treating TaxonomicName as a full and distinct Class, rather than as a Subclass of TNUs (i.e., the subset of TNUs that represent Protonyms/Protologues).

deepreef commented 4 years ago

@nielsklazenga :

A reminder that the name currently is TaxonConcept in the standard. TaxonomicNameUsage is a working name at best and, if we want to change TaxonConcept to TaxonomicNameUsage, we need to have good arguments for it. As I see it, it only leads to confusion.

Yes, I know, but part of the reason I'm excited about this process is that TaxonConcept was a mistake in TCS, and has led to far more confusion than TaxonomicNameUsage. This feels like one of those genuine steps forward with what we're working on. Of course TaxonomicNameUsage is bad, but TaxonConcept (and every other term I've heard suggested or have considered) is far worse. We should not let the perfect be the enemy of the good (or, in this case, we should not let the good be the enemy of the less-bad).

First (actually, not first) we had this whole discussion about whether we needed an extra class for the "real" TaxonConcept or not and now the problem shifts to the other end and TaxonName becomes a problem. So I'd say, let's keep it as it is.

Perhaps. But I can also easily say that a big part of the reason that TCS was never widely implemented is that it failed to deliver what we actually need. So if we are going to keep it as is, then what are we trying to change?

This is just an extension of the conversation we had on this recently. There are three sets of properties we care about in Taxonomy (at least in terms of the informatics bits within the scope of TNC):

All three of these are almost entirely captured within TNUs.

One approach is to treat TNUs as the "least common denominator" such that they are used to satisfy all three of these informatic needs. Another approach is to break them into two or more distinct and purpose-focused Classes of things. That's the approach we've taken before, and the arguments we're having now are the same that have been ongoing for years. If we're going to spend all this effort coming up with a revised TCS standard, perhaps it's time to try something different from what we've tried in the past (which, for the most part, has not afforded much progress).

We've failed (repeatedly, over decades) to come up with a functional and broad/widely adopted definition of a "TaxonConcept". We've failed (repeatedly, over decades) to come up with a functional and broad/widely adopted definition of a "TaxonomicName". The lack of widespread adoption of the existing TCS underscores this. Yet, through this series of conversations, we've come tantalizingly close to coming up with a functional definition of TaxonomicNameUsage, and we have decades of experience in Australia, and within GNA/GNUB, and Index Fungorum, (and IPNI, even if they and their users don't realize it) that strongly suggests how valuable such a Class can be. Please do not abandon what seems to me to be the most tangible progress we've made in all these recent discussions.

Yes, of course we as taxonomists are interested in Taxa, not names. But from an informatics perspective, there is really no such thing as a "taxon". We cannot digitize a taxon and share it via UTF-8 encoding. And, we cannot embed a taxon as ink on paper. But we do have a 250+ year legacy of representing taxa through text-string names, built on top of a system that, for all its warts and shortcomings, remains one of the most universally-adopted and long-standing standards in ALL OF SCIENCE -- with no signs of stopping anytime soon (PhyloCode notwithstanding). Taxonomic names are the link between abstract taxa (which exist as organisms in nature and thought patterns in human brains) and a structured way of indexing, organizing, and synthesizing information about taxa via digital informatics tools.

Names are crude informatic proxies for taxa. TNUs are much less-crude informatic proxies for taxa (and classifications, and most Code-governed nomenclature).

I do not understand how you can have a meaningful definition of TaxonomicNameUsage without a meaningful definition of the thing it is a Usage of.

You can because TNUs can map this stuff using other TNUs. So if you can define the TNU, there is no need to define any other classes (except "Reference"). We still have some definitions to hammer out among properties/terms for TNUs, but those are much more tractable than defining entire Classes of things.

And again, nobody says you have to use the TaxonName class.

How, then, will I share information on rank, parsed name components, original publications, etc., etc.? These are all properties of TN in the draft, but they are also properties of TNUs, so there's no way to capture them for a TNU without creating an instance of TN to hold this information, as referenced from the TNU (via the taxonomicName property of a TNU instance).

Names are born in a publication at the same time as their first usage, so with a usage, not in a usage.

That's certainly a fair way to look at it, which is why I'm not yet convinced that TN should be considered as a Subclass of TNU. But my gut tells me that it's a dead-end approach for an informatics solution, because it relies too much on ephemeral things that exist only in human minds, and in this realm, it's rare that two different human minds interpret key aspects of this stuff in the same way.

mdoering commented 4 years ago

Lacking time to comment on the entire discussion now, but a short remark on

A reminder that the name currently is TaxonConcept in the standard. TaxonomicNameUsage is a working name at best and, if we want to change TaxonConcept to TaxonomicNameUsage, we need to have good arguments for it. As I see it, it only leads to confusion.

We also have a hybrid world of terms in Darwin Core, where there is a dwc:Taxon class with a taxonID, but accepted/parent/originalNameUsageID for important relations. I would appreciate harmonizing with DwC as much as possible to avoid inter TDWG confusion. I forgot why we did not go for TaxonNameUsage, but even NameUsage might be good enough?

nielsklazenga commented 4 years ago

@mdoering I fully agree. What the community expects us to deliver is a dwc:Taxon, or at least something that is equivalent and, if we do not borrow dwc:Taxon, we need to have a very good story why we need two (or even three) terms for the same thing.

deepreef commented 4 years ago

@mdoering , @nielsklazenga : I also agree (strongly!)

To be perfectly honest, dwc:Taxon comes very close to meeting our needs all by itself, and honestly I think we should think about using it as a starting-point for the new TCS, incorporating some of the original TCS ideas as needed to extend the functionality.

As to the choice of terms, "Taxon" as a class name goes back to the earliest iterations of DwC (before we even thought of them as "classes", I think -- i.e., they were more like "logical groupings of terms"). I forget what year it was (I can look it up), but all the "usage" terms were added when all of DwC was being overhauled and modernized (I want to say this effort was then called "DwC 2.0", but I'm not sure). I know this, because I am the one who proposed all those "usage" terms to John Wieczorek, who was organizing that version of DwC.

I don't believe at the time that we ever discussed changing the name of the class to "TaxonNameUsage", but I do recall suggesting we change the term taxonID to taxonNameUsageID. I can go dig up the email exchange, but as I recall, John said he thought it was better to leave it deliberately ambiguous as to whether people would provide records representing taxon names, taxon concepts or TNUs. I felt uneasy about this, but at the time the top priority was to make it accessible and usable, which meant lowering the bar for generating content (i.e., relaxing definitions to be more accommodating).

In my mind, TCS was more about exploding out a more robust "taxon" data exchange standard, both to add more terms, and also to represent it more as a data model through an XML schema (rather than a "flat" list of standard terms with definitions). I think TCS failed to achieve critical mass for more or less the same reasons that LSIDs failed -- to a small degree because of technical problems, but to a larger degree that it was too difficult for most data providers to implement properly.

After this recent round of discussions, I'm wondering whether we can achieve almost everything we need through modifications/extensions of dwc:Taxon plus strong guidance on how to utilize dwc:ResourceRelationship to accommodate the other relationships not already embedded within dwc:Taxon.
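To give a feel for what "dwc:Taxon plus dwc:ResourceRelationship" might look like in practice, here is a rough Python sketch using flat, tab-delimited rows. The column names are existing Darwin Core terms, but everything else is an assumption for illustration: the identifiers, the example rows, the blank cells, and especially the `relationshipOfResource` value, which is a made-up placeholder rather than a recommended vocabulary.

```python
import csv
import io

# One row per usage, reading each dwc:Taxon record as a TNU (an assumption).
FIELDS = ["taxonID", "scientificName", "nameAccordingTo",
          "originalNameUsageID", "acceptedNameUsageID", "parentNameUsageID"]

rows = [
    {"taxonID": "u1", "scientificName": "Aus bus L.", "nameAccordingTo": "L.",
     "originalNameUsageID": "u1", "acceptedNameUsageID": "u1", "parentNameUsageID": ""},
    {"taxonID": "u2", "scientificName": "Aus bus L.", "nameAccordingTo": "Smith",
     "originalNameUsageID": "u1", "acceptedNameUsageID": "u3", "parentNameUsageID": ""},
    {"taxonID": "u3", "scientificName": "Aus cus Jones", "nameAccordingTo": "Smith",
     "originalNameUsageID": "", "acceptedNameUsageID": "u3", "parentNameUsageID": ""},
]

# Relationships not covered by the *NameUsageID columns go in a second table.
relationships = [
    {"resourceID": "u2", "relatedResourceID": "u1",
     "relationshipOfResource": "example-relationship"},  # placeholder value
]

# Round-trip through a tab-delimited text buffer, as a flat exchange file would.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
buffer.seek(0)
by_id = {row["taxonID"]: row for row in csv.DictReader(buffer, delimiter="\t")}


def accepted_name(usage_id: str) -> str:
    """Follow acceptedNameUsageID one step and return that usage's name."""
    return by_id[by_id[usage_id]["acceptedNameUsageID"]]["scientificName"]


def related_to(usage_id: str):
    """List (relationship, related usage id) pairs for a usage."""
    return [(r["relationshipOfResource"], r["relatedResourceID"])
            for r in relationships if r["resourceID"] == usage_id]


print(accepted_name("u2"), related_to("u2"))
```

Here Smith's usage of Aus bus (u2) points back to the original usage (u1) and to the name Smith accepts instead (u3), all within one flat table.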

ghwhitbread commented 4 years ago

@ dwc:taxon

I kind of agree. For various practical reasons [1], and for want of a TCS-lite, the NSL is being delivered using modified dwc:taxon,  ± based on GBIF checklist format. With some fudging, and a few extra terms, it can be managed. As a stop-gap effort, it has proved useful but it is a long way from where we need to be in terms of re-useable nomenclatural and taxonomic content. It was this need for a TCS-lite that provided the fillip to re-convene the TNC IG at the Dunedin meeting.

But not dwc:taxon, please!

Darwin Core is defined as a data-set standard. It does not work, for names and concepts, when we need to deliver the individual objects for DWC to reference.

TCS101 was a ratified standard (2005) before many of the borrowed (though sometimes renamed or redefined) terms appeared in Darwin Core where they are variously defined as references to external services or as references to records within the same DWC dataset.

dwc:taxon has a primary use case based on the identification in an occurrence. There is a recommendation in the Darwin Core RDF Guide, “part of” the Darwin Core standard, to keep it there and to shift the names and concept stuff back to a more RDF friendly TCS like standard.

I think it would be fair to say that one of the biggest issues for biodiversity informatics today is the legacy of Frankenstein concepts and taxonomies born in occurrence determination data and delivered via dwc:taxon. Rather than talking about bundling TCS into DWC, we should be recommending that the TCS terms in DWC be deprecated in favour of referencing external TCS 2.0 objects.

The terms for Names, Concepts, and their Relationships naturally belong in a separate standard. Where the terms, their definitions, and type vocabularies can evolve and be managed in accordance with the Codes and the requirements of nomenclatural and taxonomic systems, and their clients - including Darwin Core.

But the idea that we start with a DWC like TCS-lite is a good idea.
Ironically, the starting point for this idea is that everything becomes a TNU!

[1] The refusal of aggregators to accept or promote the TCS standard and the evolving TDWG RDF ontology.

deepreef commented 4 years ago

@ghwhitbread : I agree with everything you say here. I wasn't really serious about tweaking dwc:Taxon [+ dwc:ResourceRelationship] to meet our needs; I was more thinking that the terms are basically all there. We need to flesh out a number of things like you say, and make it more structural (and precisely defined). But it almost seems like a better starting point than TCS 1.

The elephant in the room in terms of missing support for data exchange is a literature exchange standard that works well in the context of taxonomy (i.e., in terms of granularity of objects and robustness of dating). I'm still way-negligent on that and probably will be for the next couple of months.

nielsklazenga commented 4 years ago

The elephant in the room in terms of missing support for data exchange is a literature exchange standard that works well in the context of taxonomy (i.e., in terms of granularity of objects and robustness of dating).

:+1: We'll talk a bit more about that in the coming meetings as well.

I'm still way-negligent on that and probably will be for the next couple of months.

:boot:

jliljeblad commented 4 years ago

[1] The refusal of aggregators to accept or promote the TCS standard and the evolving TDWG RDF ontology.

I'm curious about what we can learn from this? Was the refusal due to opposition to the actual standard or was it rather bad timing, the aggregators not being ready to put in the work needed or technology being too novel?

mdoering commented 4 years ago

I'm curious about what we can learn from this? Was the refusal due to opposition to the actual standard or was it rather bad timing, the aggregators not being ready to put in the work needed or technology being too novel?

From my experience I can say that simple CSV files were way simpler to generate and consume than the TCS XML and even more so the RDF. DwC as XML was also more of a barrier. But the DwC simple XML was at least entirely flat with even no xml attributes, so that again was very simple to generate and consume, e.g. in the DiGIR days. DwC as RDF I have not seen being used much at all. In my experience reading/writing RDF with the right libraries isn't that hard at all. But the strength of RDF that allows you to just mesh triples also comes with the downside that it often feels too flexible and people can easily mix schemas, namespace and ontologies. For exchange I clearly prefer well defined standards that can be strongly validated (obviously you can do that also with rdf). The TDWG ontologies also never reached a final state, so it wasn't very attractive to develop against a moving target. And with GBIF, EOL and others not embracing TCS or RDF it was even less attractive. A big blocker for TCS I still believe were the blank literature and to a lesser degree the specimen slots.

nfranz commented 4 years ago

Great question, @jliljeblad which I would personally answer as follows. First, we haven't learned all or most there is to learn. A mix of minimally these factors are involved. Likely the greatest, but not the most deeply explored, is that it takes considerable widespread political will to push the adoption of TCS syntax and semantics. Including political will manifested in the pertinent funding agencies. The science community has so far largely failed to generate that political will. Hence the most successful implementations of TCS like data services have tended to be by projects that are not especially dependent upon inter-/national agency-level funding and support. The Flora of the Mid-Atlantic, Alaska, Avibase, eBird, iNaturalist. Etc. Some of these projects are literally politically steered by 1-2 people who are already somehow with it, or cater so directly to user communities demanding this that in some sense they have not much choice but to develop along (or lose their client). Implicit in that answer: technical issues are secondary; scalability issues are secondary; issues of suitable data availability are secondary. The notion that the community has many other pressing issues also on its plate is secondary. To me those are all rather intellectually embarrassing arguments possibly designed to steer away from what I wrote above. Then also: fitness for use. There just is a lot of good science, and tolerable ambiguity, that many sections of the biodiversity data research community can tolerate, or individually and offline compensate for with extra effort (that does not flow back into the system often), while using DwC-aggregated biodiversity data. Switching to a TCS level only becomes important if these projects make an effort to look beyond their focal funded scope, for which there are not enough incentives. Again, that is a start from my viewpoint, no more.

jliljeblad commented 4 years ago

Thanks @mdoering and @nfranz for valuable input. This came with good timing since we're trying to document problems with the current TCS at the ongoing Hackathon with the closing meeting in 20 minutes from now. Now, back to preparations.

ghwhitbread commented 4 years ago

TCS issues

mdoering commented 4 years ago

@nfranz lacking political will is overstretching it. There simply needs to be advantages in using a new format. GBIF would happily digest TCS XML documents just as we do ABCD documents. The problem GBIF wanted to get over with DwC-A is not XML, it is the federated query systems we built with the DiGIR and BioCASe protocols that were just dead slow and not scaling. It is silly to think technical or scalability issues are irrelevant - even if I agree that scalable used to be one of these hype terms you find all over for no good reason.

Greg has a very good point in the linked document that TCS is document oriented and built for sharing datasets, but does not work at all with (RESTful) webservices when we want to work with individual records.

ghwhitbread commented 4 years ago

Back to “Should taxonomicName be represented as a Subclass of taxonomicNameUsage”. Being a semantic relationship I think I prefer to declare taxonomic name as a subtype of Taxonomic Name Usage (TNU). TNU are a convenience, an aggregation class for name-reference-context events, and a simple means for storing, citing, relating and sharing these events. One might further argue that TNU is a subtype of Reference ... TNU subtype r

deepreef commented 4 years ago

TNU are a convenience, an aggregation class for name-reference-context events

Thank you, @ghwhitbread ! I've never seen it put this way, but now that you've put it that way, it elegantly captures it in a way that I've never been able to articulate before.

I'll need to spend some time absorbing the diagram (e.g., what is an "assertion" in this context? Is it TaxonRelationshipAssertion?)

ghwhitbread commented 4 years ago

Of the diagram: It is still a sketch, and obviously incomplete; still arguing with myself about some of the relationships. The intention is to provide a model for an hierarchical type vocabulary that can be used with a flattened TNU class and term list to facilitate communication and interchange. Open arrows indicate sub-types, closed arrows object relationships, broken links are implied (but could be used).

Yes, assertion includes taxonRelationshipAssertion.

Still trying to decide if a node is a TNU relationship type or just a parent property on TNU.

ghwhitbread commented 4 years ago

I have changed the diagram to make Node a child of the Relationship subtype. It allows for the usage of Edge TNU’s (and their properties) and supports Kevin’s use case for “blank” nodes within a tree.