tdwg / dwc

Darwin Core standard for sharing of information about biological diversity.
https://dwc.tdwg.org
Creative Commons Attribution 4.0 International

New term - typifiedName #28

Open tucotuco opened 9 years ago

tucotuco commented 9 years ago

New Term

Submitter: Markus Döring

Justification: Clear separation of the type status and the typified scientific name that is typified by a type specimen, the subject. Looking at how dwc:typeStatus has been used in all of GBIFs specimen data one can see there is the need to express this, but it should better be handled with a term on its own and leave typeStatus for the status of the type only. The term name itself is also used by ABCD: http://wiki.tdwg.org/twiki/bin/view/ABCD/AbcdConcept0603

Organized in Class (e.g., Occurrence, Event, Location, Taxon): Identification

Definition: Scientific name of which Organism is a nomenclatural type.

Comment: It is recommended to also indicate the typeStatus of the Organism.

Refines: None

Replaces: None

ABCD 2.06: DataSets/DataSet/Units/Unit/SpecimenUnit/NomenclaturalTypeDesignations/NomenclaturalTypeDesignation/TypifiedName

Original comment:

Was https://code.google.com/p/darwincore/issues/detail?id=197

==New Term Recommendation==

Submitter: Markus Döring

Justification: Clear separation of the type status and the typified scientific name that is typified by a type specimen, the subject. Looking at how dwc:typeStatus has been used in all of GBIFs specimen data one can see there is the need to express this, but it should better be handled with a term on its own and leave typeStatus for the status of the type only. The term name itself is also used by ABCD: http://wiki.tdwg.org/twiki/bin/view/ABCD/AbcdConcept0603

Definition: The scientific name that is based on the type specimen.

Comment: It is recommended to also indicate the typeStatus of the specimen.

Refines:

Has Domain:

Has Range:

Replaces:

ABCD 2.06: DataSets/DataSet/Units/Unit/SpecimenUnit/NomenclaturalTypeDesignations/NomenclaturalTypeDesignation/TypifiedName

A typical example how typeStatus is used currently is:

ISOTYPE of Polysiphonia amphibolis Womersley

which we could express much better with 2 terms:

dwc:typeStatus=ISOTYPE
dwc:typifiedName=Polysiphonia amphibolis Womersley
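As a rough illustration of that split (not part of the original proposal), the sketch below separates an overloaded typeStatus value into the two terms. It assumes the value follows the "<STATUS> of <name>" shape of the example above; the function name `split_type_status` and the regular expression are hypothetical, and real data will contain many other shapes.

```python
import re

# Assumed shape of an overloaded value: "<STATUS> of <scientific name>".
# This pattern is an illustration only, not a rule from Darwin Core.
_OVERLOADED = re.compile(
    r"^\s*(?P<status>\S+)\s+of\s+(?P<name>.+?)\s*$", re.IGNORECASE
)


def split_type_status(value):
    """Return (typeStatus, typifiedName) for an overloaded typeStatus string.

    When the string does not match the assumed shape, the whole value is
    returned as the status and the typified name is None.
    """
    match = _OVERLOADED.match(value)
    if match is None:
        return value.strip(), None
    return match.group("status"), match.group("name")


record = {"dwc:typeStatus": "ISOTYPE of Polysiphonia amphibolis Womersley"}
status, name = split_type_status(record["dwc:typeStatus"])

# dwc:typifiedName is the term proposed in this issue.
flat = {"dwc:typeStatus": status, "dwc:typifiedName": name}
print(flat)
```

Values that already hold only the kind of type (for example "HOLOTYPE") pass through unchanged, with no typified name.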

peterdesmet commented 4 years ago

This proposal needs more evidence for demand (see the Vocabulary Maintenance Specification - Section 3.1). Anybody who is interested in the adoption/change of this term, should comment with their use case below. If demand is not demonstrated by the next annual review of open proposals (late 2020), this proposal will be dismissed.

peterdesmet commented 4 years ago

Ping @mdoering

qgroom commented 4 years ago

There is certainly a need for this, and nomenclatural information like this is certainly under-worked. Why not add typifiedName to the TypesAndSpecimen extension? Currently it includes scientificName, which is not the same thing and is easily confused with dwc:scientificName. https://tools.gbif.org/dwca-validator/extension.do?id=gbif:TypesAndSpecimen

matdillen commented 3 years ago

How does this addition work when there are multiple typified names for a single specimen? Currently this would be concatenated into dwc:typeStatus, e.g. https://www.gbif.org/occurrence/1839378016
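To make the question concrete, here is a minimal sketch of what unpacking a concatenated value could look like. It assumes the designations are joined with " | " (the separator Darwin Core recommends for lists of values in one field) and that each entry has the form "<STATUS> of <name>"; the linked record may not follow either convention. The names "Aus bus" and "Aus cus" are placeholders, and `split_designations` is a hypothetical helper.

```python
def split_designations(type_status, separator=" | "):
    """Split a concatenated typeStatus value into one dict per designation.

    Each dict has a 'typeStatus' and a 'typifiedName' key; the name is None
    when an entry carries only the kind of type.
    """
    designations = []
    for part in type_status.split(separator):
        part = part.strip()
        if not part:
            continue
        # Assumed entry shape: "<STATUS> of <scientific name>".
        status, _, name = part.partition(" of ")
        designations.append(
            {"typeStatus": status.strip(), "typifiedName": name.strip() or None}
        )
    return designations


# Placeholder names, not taken from any real record.
value = "SYNTYPE of Aus bus Author1 | HOLOTYPE of Aus cus Author2"
for row in split_designations(value):
    print(row)
```

Each resulting dict would need its own row, which a single flat Occurrence record cannot hold without repeating the concatenation in both terms.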

nielsklazenga commented 3 years ago

I strongly support this and can provide a use case if you need it.

@matdillen , having more than one type specimen is, at least in botany, a very rare occurrence and then the specimen is a syntype or paratype (not really types) of one name and the holotype of a more recent name, so you choose the latter name. My experience though is that when you see more than one typified name for a specimen that is almost always an error.

@qgroom , many people see nomenclatural type designations as Identifications, so in that sense, scientificName in the Types and Specimens Extension seems appropriate. I cannot get a clear picture in my mind whether, if you include both types and selected specimens examined, this might become ambiguous or not, so you may be completely right.

RRabeler commented 3 years ago

Just was directed to this thread - I strongly support the need as well!! Being able to do this in ABCD (like this https://www.gbif.org/occurrence/1638363416) but not DwC makes it difficult to effectively cluster collections. If a specimen is the type of name X but is only listed in GBIF by its current determination Y, clustering by looking for name X would miss that specimen.
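A small sketch of that clustering problem, using made-up records: searching only scientificName for "Name X" misses the type specimen filed under "Name Y", while also searching the proposed typifiedName finds it. The record contents and the helper `find_by_name` are hypothetical.

```python
# Made-up flat records; typifiedName is the term proposed in this issue.
records = [
    {"occurrenceID": "A", "scientificName": "Name Y",
     "typeStatus": "holotype", "typifiedName": "Name X"},
    {"occurrenceID": "B", "scientificName": "Name X"},
    {"occurrenceID": "C", "scientificName": "Name Y"},
]


def find_by_name(records, name, fields):
    """Return occurrenceIDs of records where any of the given fields equals name."""
    return [
        record["occurrenceID"]
        for record in records
        if any(record.get(field) == name for field in fields)
    ]


# Searching the current determination only misses the type specimen "A".
only_current = find_by_name(records, "Name X", ["scientificName"])

# Searching the typified name as well picks it up.
with_typified = find_by_name(records, "Name X", ["scientificName", "typifiedName"])

print(only_current)
print(with_typified)
```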

mjy commented 3 years ago

We have semantics in TaxonWorks that would require this one-to-many relationship between collection object and taxonomic name: https://rdoc.taxonworks.org/TypeMaterial.html.

debpaul commented 3 years ago

@peterdesmet I note we missed the 2020 review period, but clearly there's interest in moving this conversation forward. I've asked @RRabeler to get other colleagues to weigh in too.

deepreef commented 3 years ago

I guess I'm a little confused. Throughout most of DwC information is captured in one place, not multiple places. We're talking about a relationship between a specimen and a scientificName, which already exists within the Identification class. At the moment, the typeStatus term is (correctly) grouped with that class. Presumably, each instance of Identification joins one instance of a specimen (technically MaterialSample, but I suppose many people would directly link it to an instance of Occurrence, as instances of that class are often used as proxies for the associated MaterialSample). Thus, the expression "specimen X typifies name Q" is easily (and appropriately) captured within an instance of Identification.

We have a habit in biodiversity informatics of defining good terms and clustering them into good classes, then not using them broadly (I'm looking at you, MaterialSample, ResourceRelationship, MeasurementOrFact). Obviously, some people use these classes and associated terms (especially the last), but they are, in my opinion, grossly underutilized. I think the Identification class and its terms is another example that should be built in to our information exchange systems -- but that's a rant for another issue/thread.

So, the issue with a specimen and a name as a type is not an inherent property of either the specimen or the name -- it's a property of an assertion joining the specimen and the name, which is why it's correctly clustered within the Identification class. But even if you ignore the data model normalization thing, and flatten a record into a DwCA dataset, isn't the information for typifiedName already represented via all the terms from the Taxon class also included with that row?

Suppose I have a type specimen in my collection, and I expose it via a DwCA dataset. One property of that record is materialSampleID, or even occurrenceID, or at least the DwC triplet of ICode+CCode+CatNumber -- so we know what specimen we're talking about. Another property is typeStatus, so we can capture ISOTYPE. And another set of properties are all the Taxon class terms -- including scientificName. So don't we already have typifiedName represented in the form of scientificName for the same record (representing a PreservedSpecimen) that includes a value for typeStatus?

The problem, I assume, is that datasets will represent a MaterialSample record (camouflaged as an Occurrence record) where the associated scientificName is represented as the "current accepted" taxon, rather than the name for which the specimen plays its role as typeStatus (and then include the typified name within the typeStatus field). Thus, the addition of typifiedName would allow the value of scientificName (and other terms of the Taxon class) to reflect the current taxonomic identification of the specimen, while also indicating that the specimen also serves as a type of a different taxonomic name. I get that -- but isn't the better solution to educate content providers that they should be using acceptedNameUsage for this purpose? Or better yet -- start actually using the Identification class for what it was intended (i.e., allowing a single specimen to be represented with multiple taxon identifications, with typeStatus applied only for the one scientificName that it actually typifies).

I do understand that we live in the real world, where people provide content in highly flattened/denormalized form, and this leads to crude efforts to overcome the inherent limitations of doing so (such as adding information about the typified name within the typeStatus property). In this context, I guess it makes sense to add this new term -- but from my perspective, the term would exist only to further enable us to avoid leveraging the capabilities already built into the DwC standard, and perhaps represents a step backwards from fully realizing those existing capabilities within our information exchange systems.

mjy commented 3 years ago

I agree with @deepreef that the "bits" of data can all be shunted into a format that is sharable as is. I suspect though, that 'typifiedName' represents a need for teasing out semantics?

Perhaps our model in TaxonWorks will help draw this to a conclusion (or further confuse things). Capitalized words are Classes.

TaxonDetermination

TypeMaterial

Instances of these data can be fed to DwC as is, I'd have to look at specifics to dig into where we put the bits.

mdoering commented 3 years ago

@deepreef I understand those worries, but DwC (and other standards like ABCD or EML too) has lots of terms only existing to allow flat views. acceptedNameUsage really is the scientificName of the Taxon record linked via acceptedNameUsageID. kingdom, phylum and the other flat ranks are similar convenience terms. The same is true for other location and temporal terms that should sit on an Event or even Location instance. By far the vast majority of DwC use is flat. It's what was (once?) called Simple Darwin Core. No doubt we should be moving to a more relational world, but I think the proposed term makes a lot of sense for the current use of DwC.

nielsklazenga commented 3 years ago

I think the issue is that, currently, the usage of typeStatus is different when used in the Occurrence Core than when used in the Identification History or the Types and Specimen Extensions. While in the extensions typeStatus is just the kind of type, in the Occurrence Core it also includes the typified name and other information, so an entire Identification if you see it that way.

Despite being in the Identification class, the way typeStatus is defined in Darwin Core makes it one of those terms that only exist to allow flat views (free after @mdoering just above) and only suitable for use in the Occurrence Core. So, people who want to deliver typification in the Identification History extension should support this proposal to split off the typified name from the type status (or kind of type).

I do not really want to go into why nomenclatural type designations are not Identifications, as that is not what matters here. What matters is that, if we treat them as Identifications, they are not (necessarily) the same Identification as the current Identification, which we deliver in the Occurrence Core. So we basically want to have two Identifications in the Occurrence Core, the current Identification and the nomenclatural type designation Identification. In order to allow consistent use of (a redefined) typeStatus, we also need to have a way to distinguish between the scientificName from the current Identification and that from the nomenclatural type designation Identification in the Occurrence Core record, which is where the proposed typifiedName comes in.
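A minimal sketch of that flattening, under the assumption that the current determination and the nomenclatural type designation are held as separate records and only merged for the flat Occurrence Core view. The class names `Determination` and `TypeDesignation`, the `flatten` function, and the sample values are hypothetical; typifiedName is the proposed term.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Determination:
    """An identification of the specimen to a taxon."""
    scientific_name: str
    is_current: bool = False


@dataclass
class TypeDesignation:
    """A statement that the specimen is a type of a given name."""
    type_status: str
    typified_name: str


def flatten(occurrence_id: str,
            determinations: List[Determination],
            designation: Optional[TypeDesignation]) -> dict:
    """Build one flat Occurrence Core style row from the separate records."""
    current = next((d for d in determinations if d.is_current), None)
    return {
        "occurrenceID": occurrence_id,
        # The current determination fills scientificName.
        "scientificName": current.scientific_name if current else None,
        # The type designation fills typeStatus and the proposed typifiedName.
        "typeStatus": designation.type_status if designation else None,
        "typifiedName": designation.typified_name if designation else None,
    }


row = flatten(
    "occ-1",
    [Determination("Name X"), Determination("Name Y", is_current=True)],
    TypeDesignation("isotype", "Name X"),
)
print(row)
```

With both values in the same row, typeStatus can hold only the kind of type, and scientificName stays free for the current determination.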

deepreef commented 3 years ago

@mdoering : OK, fair points! I just wanted to make sure I understood. So am I correct that the problem is that people represent type specimens with names that are different from what the specimens typify, and that for whatever reason they're not using the acceptedNameUsage term to capture the current name, and scientificName for the original typified name? (There is no requirement that acceptedNameUsageID must be populated in order to provide a value for acceptedNameUsage.) Also, most of the "flattened" terms aren't redundant to other terms/structures already in DwC, and/or weren't established with the explicit intention of accommodating flattened representations of the data. But if you feel there is a need for this term as an additional flat-friendly way of capturing information that people are presenting in typeStatus or some other incorrect way, then I wouldn't push back against it. I just wanted to make sure I understood the need. Also... what class would the term belong to? Would it be best to include within the Identification class, or the MaterialSample class? (Please, Please not Occurrence!!)

@mjy :

Model contains no rules other than exact duplication is prevented (no point in asserting the same thing twice)

If you mean "exact" as in same determiner, same date, same taxon; then I agree. But we allow multiple determiners to assert the same taxon on the same specimen; and also the same determiner to assert the same taxon on the same specimen on different dates (why throw away information). But I agree that same determiner, same specimen, same taxon, same date is redundant.

TypeMaterial Links a Specimen to a TaxonName (nomenclatural concept) Objective

That's how I used to model it, and that's how a lot of nomenclators model it; but there is enough grey area in this space that I finally had to acknowledge that "specimen is type of taxon" is not as "objective" as we all wish it were, and really requires an "accordingTo" reference, just like any other assertion.

@nielsklazenga :

in the Occurrence Core it also includes the typified name and other information, so an entire Identification if you see it that way.

OK, I didn't realize this was a "thing". We represent (or at least intend to represent) our type specimens using scientificName for the typified name. If we want to represent the "current" interpretation of the name, we use acceptedNameUsage. But I agree if people are mashing additional information (like the typified name) into typeStatus, and they are unable (or unwilling) to represent it using more appropriate terms, then maybe typifiedName could be useful. But if this term is added, will people actually use it?

Despite being in the Identification class, the way typeStatus is defined in Darwin Core makes it one of those terms that only exist to allow flat views (free after @mdoering just above) and only suitable for use in the Occurrence Core.

That certainly is not what that term was originally intended for. I was unaware that the "Examples" in the DwC reference were updated to what they are now. It used to be for terms like "Holotype", "Paratype", "Lectotype", etc. But now that I see the Examples as given in the quick reference guide, I understand why it's a problem. I must have missed the discussion that updated those Examples, because I would have strongly objected to that. But if that's what people are doing, and that's what the community really thinks is now appropriate for this term, then I agree that adding something like typifiedName is the lesser of evils. It feels like a step backwards, but I guess we can't always move forward.

I do not really want to go into why nomenclatural type designations are not Identifications

Technically not Identifications, but that's by far the closest Class in DwC to which type designations belong. They're not properties of specimens (MaterialSample) or of Taxon, because they are asserted statements, not inherent facts.

Anyway, now that I see how the "Examples" for dwc:typeStatus have been updated to say, I understand why this problem exists. And if adding the term typifiedName can solve problems in the near term, I would support it.

mdoering commented 3 years ago

We represent (or at least intend to represent) our type specimens using scientificName for the typified name. If we want to represent the "current" interpretation of the name, we use acceptedNameUsage.

I wonder how common that is. I always assumed scientificName should be the current determination. @timrobertson100 I believe GBIF expects that too. It probably does not make much difference as long as the accepted name and typified name are both under the same taxon in GBIF.

mjy commented 3 years ago

that "specimen is type of taxon" is not as "objective" as we all wish it were, and really requires an "accordingTo" reference, just like any other assertion.

But this is precisely what we are not doing. Specimen is Type of TaxonName, not Taxon (== OTU) concept. Where does this fail or become a gray area? If it does fail then the code of nomenclature doesn't work AFAIK.

mjy commented 3 years ago

I do not really want to go into why nomenclatural type designations are not Identifications

I do! They are not, and we need special treatment of these facts lest they be confused for something they are not. Objective != subjective.

deepreef commented 3 years ago

@mdoering :

I wonder how common that is.

I had always assumed that every Museum worked this way; but maybe not? Also, despite what I write below, the closest thing there is in our world to an "objective" determination is the one that links a scientificName to a name-bearing type. From that perspective, it seems to me that scientificName most correctly should be represented as the "typifiedName", whenever a name-bearing type is in play (not so much for Paratypes).

On the other hand...

@mjy

If it does fail then the code of nomenclature doesn't work AFAIK.

Yup... sad but true. Many (most?) names don't even have types (at least not in zoology). The ICZN Code has rules for retroactively designating types, but this is generally only done when there is a specific need to do so. Indeed the Code expressly prohibits designating neotypes unless there is a specific taxonomic ambiguity that needs to be resolved. The typical example is that a series of specimens known to be available to and examined by the author of the name are regarded as a syntype series. In some cases, one of those syntypes is elevated to a lectotype. And in very specific cases, neotypes are designated. Even in modern original descriptions, the author "fixes" the type through a nomenclatural action (this was only explicitly required by the ICZN Code after 2000). So we have all kinds of situations where at one point in time a taxonomist retroactively recognizes a syntype series for an old name. Then later someone elevates one of those specimens to the status of lectotype. Then maybe someone else who is unaware of the lectotype designation picks another one of the syntype series and declares it to be the lectotype. Or, sometimes someone designates a neotype, then an original specimen (holotype or syntype series) is discovered.

Thankfully, these situations aren't common -- but they're not so rare that we can just sweep them into the dustbin of "edge case" either. As much as the Code likes to make this stuff as objective as possible, it turns out that a non-trivial number of cases involve some level of subjective interpretation (e.g., "Did the author have access to this specimen prior to establishing the name, in which case it can be considered part of the syntype series?")

That's why I had to (reluctantly) abandon my hopes and dreams to treat the relationship between a name and its type as an objective fact, as opposed to an assertion with an accordingTo.

The meaning of dwc:typeStatus, to me at least, is not so much a statement "this specimen is the type specimen of that name"; but rather something more like "the label for this specimen includes the word 'Holotype' on it, in association with this name". That's probably the most compelling evidence we have that a particular specimen is, in fact, a type specimen for a name. But I've encountered more than a few cases where the label was wrong. And not just for very old names, either.

And, as I said, Identification is not exactly the same thing as type fixation, but it's the closest thing DwC has to it.

mjy commented 3 years ago

@deepreef

I think you've identified many cases where it's hard (or impossible) to make an assertion of a specific type. This in my mind is not the same thing as saying we shouldn't make certain assertions when they are possible, or treating our assertions specifically to mean one thing. For the record we can stack as many Citations on either class of facts I referenced (and any class of fact) in TaxonWorks, so we can precisely reify our data based on your view, but, more importantly, we can enforce the rules if we need to; the same cannot be said if strong assertions are not made.

Frankly it feels like you've made a strong argument for abandoning nomenclature altogether, which in my mind is not necessarily a bad thing ;).

nielsklazenga commented 3 years ago

@deepreef, we seem to be mostly in agreement.

We represent (or at least intend to represent) our type specimens using scientificName for the typified name. If we want to represent the "current" interpretation of the name, we use acceptedNameUsage.

In the past, I have (sort of) proposed the opposite, using originalNameUsage for the typified name, which is marginally less inappropriate, but of which I am still not in favour now (and was not really then). acceptedNameUsage and originalNameUsage are taxa, which do not have types. The "current" determination of a specimen and the "current" interpretation of a name are different things. The problem here is that scientificName is such a crappy name for a property, so its use in the Occurrence Core can be ambiguous (do not read this as a suggestion that this should be changed).

@mjy

I do! They are not, and we need special treatment of these facts lest they be confused for something they are not. Objective != subjective.

I agree. I think part of the problem is that typification is often confounded with annotations on specimens that the specimen is some kind of type for a scientific name (which putting it in the Identification class encourages). As you have already pointed out above, typification is to names, not taxa (like identifications are); also they are done in publications, not as annotations on specimens. Nomenclatural type designations are facts. That does not mean everybody gets their facts straight or even agrees what the facts are. However, the assertion will be in the annotation and will be in whether the specimen that is being annotated is the same as the specimen cited in the publication, or in the application of the rules of the relevant code. This is entirely different from the assertion that a specimen belongs to a taxonomic group, which is what an identification is. Putting typification-related terms in the Identification class is confounding the vehicle (annotation) with what it transports (identification or typification).

I do not think we really need a Nomenclatural Type Designation class in Darwin Core. A DwCA extension would be nice though, as I would have real problems with delivering typeStatus in the Identification History extension. I am perfectly happy to keep delivering them in the Occurrence Core, although there are relatively rare occasions where a specimen may be a syntype of one name and a holotype or isotype of a more recent name (I just deliver the latter in the Occurrence Core).

I also do not think we should abandon nomenclature altogether, although it might be best if some people would attach somewhat less importance to it. Rather, people should stop confuddling taxa and their labels and realise that nomenclature only applies to (a certain type of) the latter.

All of this has little bearing on this proposal, as regardless of how you want to treat nomenclatural type designations, we still need to separate the typified name from the type of type.

qgroom commented 3 years ago

This is a bit of an aside, but all this discussion makes me pity the person just trying to publish their specimen data. So much of what has been written here is undocumented.

Take for example these terms...

acceptedNameUsage: The full name, with authorship and date information if known, of the currently valid (zoological) or accepted (botanical) taxon.

scientificName: The full scientific name, with authorship and date information if known. When forming part of an Identification, this should be the name in lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in the IdentificationQualifier term.

Is there actually any substantive difference in the definition of these terms? All I can see is that you can put invalid/unaccepted names into scientificName and you can put identification qualifiers into acceptedNameUsage. Clearly, that was not the intention, but it is undocumented. Accepted names are only useful when you know who accepted them, and in this case there is no link to a publication, so the term is moot; it just means accepted within the context of this dataset.

originalNameUsage: The taxon name, with authorship and date information if known, as it originally appeared when first established under the rules of the associated nomenclaturalCode. The basionym (botany) or basonym (bacteriology) of the scientificName or the senior/earlier homonym for replaced names.

Apparently originalNameUsage actually has nothing to do with occurrence data at all. How often, and why, would anyone go to the trouble of finding out the basionym of the scientificName for an occurrence, when it might not even be the accepted name?

Whereas...

I was struck recently by the clean and clear documentation of Schema.org, with rich descriptions and examples of real life data. If there really is an alternative to typifiedName then it would need to be properly documented within the standard along with examples. Using extensions is always a burden in maintenance and for users. Therefore, it has to be easy to implement or only the minority of people will use it and it becomes redundant.

Typification data are some of the worst kept in our domain and I am really keen to see an improvement.

deepreef commented 3 years ago

OK, we seem to be discussing two things:

  1. How to model this stuff in an ideal way (i.e., whether or not typeStatus/typification are facts about names/specimens, or assertions about the relationship between names and specimens)
  2. How to solve a practical issue related to parsing overloaded content in typeStatus, resulting from the (heinous) "Examples" given for the DwC term typeStatus.

My sense is that @mdoering raised this issue in the context of # 2; but several of us seem more interested in discussing # 1. For the record, as already stated, I support the proposal by @mdoering for # 2. But I see it as a "band aid" solution to a problem of content misrepresentation resulting from peculiar "Examples" for the DwC term typeStatus. But we will eventually need reconciliation of # 1 -- either via the TNC-TCS group (if typification is in scope), or somewhere else in DwC/TDWG-land.

@mjy

Frankly it feels like you've made a strong argument for abandoning nomenclature altogether, which in my mind is not necessarily a bad thing ;).

Yeah, sometimes I feel that way too. But I think there is value in tracking nomenclatural acts as governed by major Codes, and as manifest through TNUs. I also think there is value in tracking treatments of taxonomic concepts/circumscriptions as treatments, also manifest through TNUs. And, I think there is value in tracking organisms, as manifest through both MaterialSample instances (i.e., specimens) and in-situ observations. The relationships between names and concepts/circumscriptions can be effectively captured through TNUs directly (TNC-TCS group working on this now). The relationships between names/concepts/circumscriptions and corresponding Organism/MaterialSample instances can be captured via Identification instances.

As I've already said, I think it's a mistake to frame the status of a specimen as a nomenclatural "type" as a direct property of either the name, or of the specimen -- the typeStatus represents a relationship between an instance of a name and an instance of a specimen. Thus, without creating a new class specifically to track instances of "typification", it's a very natural fit to track typifications/typeStatus via instances of Identification -- because instances of Identification and instances of typification both represent the relationship between names and specimens. This is why I say that the Identification class is not the perfect way to represent typeStatus/typification information.

I'd like to explore this more, but I fear we'd be drifting too far from the issue at hand. Perhaps this is worth spawning a new issue?

@nielsklazenga : Yes, I think we're pretty close to agreement on the existing DwC terms in the Taxon class. Those came about in a context where the "basis of record" for a Taxon instance was intentionally vague and open, because the community had not yet settled on how to sort out taxon names and concepts. Of course, we still have not sorted that out, but it seems like we're making progress in the TNC-TCS space. My hope is that the product of that effort will wholesale supersede the current Taxon terms in DwC.

As you have already pointed out above, typification is to names, not taxa

Agreed! This is why placing typeStatus within the Identification class is not perfect (but better than the existing alternatives).

also they are done in publications, not as annotations on specimens.

Well... sort of. Under the nomenclatural Codes, typifications are events/acts that occur within publications (which is why they are best framed as assertions). However, in practice -- and even to some extent in the sense of the Codes -- the specimen label annotation over-rides what appears in publications. I can provide specific examples of this.

I would have real problems with delivering typeStatus in the Identification History extension.

Can you explain why you would have problems with this? Included among the Identification History are the instances where the publication that fixed the type also provided an Identification of the type specimen. Those are the Identification instances (i.e., the ones where type fixation occurs) where typeStatus should be populated. Obviously, that's not how the vast majority of DwCA content is created, which is why I support the need for a band-aid typifiedName term. But if you're talking about optimizing the data model for how typification actually happens, it's a pretty damn good fit (much better than, say "occurrence-as-specimen", which is also pretty rampant among DwCA content).

@qgroom : I definitely agree with the need for better documentation. I can comment a little on why there was (and, I think, still is) a need for three terms:

scientificName - intended to capture the name as labelled for a specimen or occurrence. This is implied to be whatever the latest "Identification" instance represents the specimen to be labelled as.

acceptedNameUsage - intended in cases where a content provider is aware that a given specimen is labelled with a name that is not consistent with the taxonomic perspective of the content provider. The reality is that there are many discrepancies within collections between what the label says for the name (usually the most recent identification from an expert who examined the specimen, which might have been decades ago), and the name that the content managers believe is the correct scientific context to apply in the modern context. This was to pave the way for collection managers to stop the highly undesirable practice of updating the taxon representation of a specimen based on a taxonomic change that did not involve anyone actually examining the specimen. We want to be able to represent the specimen both "as identified", and "as we would interpret the correct taxon name today".

originalNameUsage -- essentially intended to capture the basionym.
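To make the distinction concrete, here is a minimal sketch of how the three terms might sit side by side in one flat record. The catalogue number and the idea that the specimen label still reads "Felis leo" are hypothetical; the lion names are used only because that basionym and current combination are well known. The helper function name is also an assumption, not part of Darwin Core.

```python
# Hypothetical flat Darwin Core record illustrating the three name terms
# as described above. Values are illustrative, not from a real dataset.
record = {
    "catalogNumber": "EX-0001",                            # hypothetical
    "scientificName": "Felis leo Linnaeus, 1758",          # name as labelled / last identified
    "acceptedNameUsage": "Panthera leo (Linnaeus, 1758)",  # provider's current interpretation
    "originalNameUsage": "Felis leo Linnaeus, 1758",       # basionym
}


def label_differs_from_accepted(rec: dict) -> bool:
    """Return True when the provider supplies an accepted name that
    differs from the name the specimen is labelled with."""
    accepted = rec.get("acceptedNameUsage")
    return bool(accepted) and accepted != rec.get("scientificName")


print(label_differs_from_accepted(record))
```

Keeping both values lets a consumer see the specimen "as identified" without losing the provider's current interpretation.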

The intention of the "Usage" suffix on these and other terms in the Taxon class was to shift from the highly problematic issues associated with equating names & concepts (as alluded to by @nielsklazenga), and instead paving the way to a TNU-based way of modelling taxonomic information. TCS1 was intended to provide a mechanism to enable that; but it never "took". Perhaps we can do better with TCS2.

Typification data are some of the worst kept in our domain and I am really keen to see an improvement.

I think this is something we can ALL agree with!!!

mdoering commented 3 years ago

This discussion highlights that typifiedName should sit together with typeStatus on the dwc:Identification class. Something not defined in the original proposal.

nielsklazenga commented 3 years ago

Just circling back to the definition. Could we make it something like:

Scientific name of which the specimen is a nomenclatural type

?

I do not think Darwin Core needs to explain what nomenclatural types are and 'based on' definitely does not cover it.

qgroom commented 3 years ago

@deepreef Couldn't we put your explanations of scientificName, acceptedNameUsage and originalNameUsage into the comments of the Darwin Core terms? We need to capture this for the average user. @tucotuco should we create a separate issue?

@mdoering and all, the dwc:Identification class doesn't contain a name that the identification refers to. Indeed, the identification is to a Taxon, rather than a name. The documentation sort of hints that it refers to scientificName. However, if you add typifiedName to the dwc:Identification class then some "identifications" are going to refer to names outside the dwc:Identification class and some to names within the dwc:Identification class.

For me typeStatus and typifiedName are properties of the specimen not the Taxon

deepreef commented 3 years ago

Well... those are the definitions that are in my head now -- not necessarily the definitions that were in my head when I provided those terms to @tucotuco back when they were first added. I'm happy to capture this in whatever form is appropriate to include in online resources, provided others on this thread agree that they make sense. I wouldn't want to mess things up in the same way that the comments for typeStatus effectively changed the original purpose of the field. But if hardly anyone uses these terms anyway, maybe it's not so important. I'd also like to hear from @nielsklazenga and @mjy and @mdoering and others who spend a lot of time dealing with taxonomic data to see if these rough definitions seem OK.

the dwc:Identification class doesn't contain a name that the identification refers to. Indeed, the identification is to a Taxon, rather than a name

My understanding is that originally, instances of the Taxon class were intentionally defined broadly. They could be interpreted as concept-like things, or they could be interpreted as name-like things. Because almost no two taxonomic databases share perfect parity on their respective instances, it made more sense for the definition of a Taxon instance to be vaguely correct, instead of precisely wrong. My hope is that the entire DwC Taxon class and associated terms will be wholesale replaced by whatever comes out of the TCS2 exercise, so I wouldn't spend too much time invested in tweaking those existing terms right now.

My understanding is that occurrenceID (originally) / materialSampleID (now) and taxonID were not repeated within the Identification class because they are effectively "foreign keys" to those other respective classes. Similar to how locationID is not included within the Event class. They're implied to be in there whenever content is provided in some sort of DwC format.

In contrast to @qgroom, I still support the inclusion of both typeStatus and typifiedName within the Identification class -- not so much because they belong there, but because they belong in other existing classes even less. But I would certainly agree that having them as terms within the MaterialSample class makes MUCH more sense than within the Taxon class. Nevertheless, I still see them within the Identification class as being the least of evils.

nielsklazenga commented 3 years ago

I am happy to put up with keeping typeStatus in the Identification class, although I think it would be more appropriately placed in PreservedSpecimen and/or FossilSpecimen, which I think are equivalent to Occurrence (so in Occurrence). I have always thought that MaterialSample is for things that have been derived from specimens, like molecular isolates, rather than the specimen itself.

tucotuco commented 3 years ago

@deepreef Couldn't we put your explanations of scientificName, acceptedNameUsage and originalNameUsage into the comments of the Darwin Core terms? We need to capture this for the average user. @tucotuco should we create a separate issue?

Yes, please. One issue for each term change recommendation. Choose the Term Change template when creating the issues.

mdoering commented 3 years ago

the dwc:Identification class doesn't contain a name that the identification refers to. Indeed, the identification is to a Taxon, rather than a name. The documentation sort of hints that it refers to scientificName. However, if you add typifiedName to the dwc:Identification class then some "identifications" are going to refer to names outside the dwc:Identification class and some to names within the dwc:Identification class.

For me typeStatus and typifiedName are properties of the specimen not the Taxon

@qgroom dwc:typeStatus currently is defined as an Identification term. Would it not be awkward to place typifiedName somewhere else? And I agree with @deepreef that Identification is the closest we got to a NomenclaturalEvent type.

Note that the identification extension (has to) flattens ID and Taxon and therefore contains scientificName: https://rs.gbif.org/extension/dwc/identification.xml

qgroom commented 3 years ago

@mdoering @deepreef I'm not saying typifiedName should not go in dwc:Identification, but once you add a taxonomic name to this class people are not going to know where the identified name is, except in the case of type specimens. So it will either need to be much better documented, or an identifiedName will have to be added to dwc:Identification.

deepreef commented 3 years ago

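@qgroom : Ah! Sorry for my misunderstanding, and I see your point. People will no doubt confuse the value shown in typifiedName as being the name of the Identified organism. In this sense, I see the (larger) problem: typifiedName is in some ways less a property of Identification than typeStatus is, and perhaps is better suited for being a property of MaterialSample. But then it becomes decoupled from the corresponding typeStatus. Five options come to mind:

1) Create a new Typification Class, with properties something like:

    • typificationID [unique identifier for the Typification instance]
    • typifiedNameID [points to an instance of Taxon]
    • typifiedName [textual representation of the typified name, presumably the same value as either scientificName or originalNameUsage of the corresponding Taxon instance]
    • typeSpecimenID [points to an instance of MaterialSample]
    • typeSpecimen [textual representation of the type specimen -- not sure what this would be, as the DWC triplet is connected to the Occurrence Class]
    • typeNameID [points to an instance of the Taxon, representing the type species of a genus, or type genus of a family]
    • typeName [textual representation of the type name, presumably the same value as either scientificName or originalNameUsage of the corresponding Taxon instance]
    • typificationRemarks [yadda yadda yadda]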

2) Punt on the problem until TCS2 can replace all of the Taxon-Class terms in DWC and assume that it will include a Typification class something like the above.

3) Create the temporary band-aid solution proposed in this issue, and simply add typifiedName to the Identification class alongside typeStatus, and hope people don't confuse its purpose.

4) Create the temporary band-aid solution proposed in this issue, and simply add typifiedName to the MaterialSample class and also move typeStatus to this class, and hope people don't get confused.

5) Remove the problematic "Examples" from the documentation for typeStatus, and recommend instead a controlled vocabulary for values like "Holotype", "Paratype", "Lectotype", "Isotype", etc., and don't bother with this new term.

There are other variants of these as well, but none of them is great. Personally, I think # 5 makes the most sense as the temporary solution until # 2 comes to fruition. Otherwise, I think # 3 is probably the least of evils.

Why can't modelling taxonomic information be easy?
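For anyone who wants to see option 1 in a more tangible form, the sketch below turns the suggested Typification properties (typificationID, typifiedNameID, typifiedName, typeSpecimenID, typeSpecimen, typeNameID, typeName, typificationRemarks) into a Python dataclass. No such class exists in Darwin Core; the two helper methods, the identifiers and the "Aus bus" placeholder name are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Typification:
    """Hypothetical Typification class using the property names floated
    in option 1. This is not an existing Darwin Core class."""
    typificationID: str
    typifiedNameID: Optional[str] = None       # points to a Taxon instance
    typifiedName: Optional[str] = None         # literal form of the typified name
    typeSpecimenID: Optional[str] = None       # points to a MaterialSample instance
    typeSpecimen: Optional[str] = None         # literal form of the type specimen
    typeNameID: Optional[str] = None           # points to a Taxon (type species/genus)
    typeName: Optional[str] = None             # literal form of the type name
    typificationRemarks: Optional[str] = None

    def is_specimen_type(self) -> bool:
        """The type is a specimen (e.g. a holotype)."""
        return self.typeSpecimenID is not None or self.typeSpecimen is not None

    def is_name_type(self) -> bool:
        """The type is itself a name (type species of a genus, etc.)."""
        return self.typeNameID is not None or self.typeName is not None


# Placeholder values only; "Aus bus" is a conventional dummy name.
holotype = Typification(
    typificationID="typ-1",
    typifiedName="Aus bus Smith, 1900",
    typeSpecimenID="ms-42",
)
print(holotype.is_specimen_type(), holotype.is_name_type())
```

Note that the property list in option 1 does not carry the kind of type (holotype, lectotype, and so on), so the sketch leaves that out as well.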

tucotuco commented 3 years ago

Option #5 is by far the easiest if the definition does not change. Examples are not normative and require only Darwin Core Maintenance Group vetting rather than a public review.

On Sun, Mar 28, 2021 at 6:16 PM Richard L. Pyle wrote:

@qgroom : Ah! Sorry for my misunderstanding, and I see your point. People will no doubt confuse the value shown in typifiedName as being the name of the Identified organism. In this sense, I see the (larger) problem: typifiedName is in some ways less a property of Identification than typeStatus is, and perhaps is better suited for being a property of MaterialSample. But then it becomes decoupled from the corresponding typeStatus. Five options come to mind:

  1. Create a new Typification Class, with properties something like:

    • typificationID [unique identifier for the Typification instance]
    • typifiedNameID [points to an instance of Taxon]
    • typifiedName [textual representation of the typified name, presumably the same value as either scientificName or originalNameUsage of the corresponding Taxon instance]
    • typeSpecimenID [points to an instance of MaterialSample]
    • typeSpecimen [textual representation of the type specimen -- not sure what this would be, as the DWC triplet is connected to the Occurrence Class]
    • typeNameID [points to an instance of the Taxon, representing the type species of a genus, or type genus of a family]
    • typeName [textual representation of the type name, presumably the same value as either scientificName or originalNameUsage of the corresponding Taxon instance]
    • typificationRemarks [yadda yadda yadda]
  2. Punt on the problem until TCS2 can replace all of the Taxon-Class terms in DWC and assume that it will include a Typification class something like the above.

  3. Create the temporary band-aid solution proposed in this issue, and simply add typifiedName to the Identification class alongside typeStatus, and hope people don't confuse its purpose.

  4. Create the temporary band-aid solution proposed in this issue, and simply add typifiedName to the MaterialSample class and also move typeStatus to this class, and hope people don't get confused.

  5. Remove the problematic "Examples" from the documentation for typeStatus, and recommend instead a controlled vocabulary for values like "Holotype", "Paratype", "Lectotype", "Isotype", etc., and don't bother with this new term.

There are other variants of these as well, but none of them is great. Personally, I think # 5 makes the most sense as the temporary solution until # 2 comes to fruition. Otherwise, I think # 3 is probably the least of evils.

Why can't modelling taxonomic information be easy?

nielsklazenga commented 3 years ago

I think terms in Darwin Core – TDWG standards in general – should be defined independently of how they can be modelled. typifiedName means something and can be easily defined. If people want to model nomenclatural type designations as Identifications and use scientificName instead, that is their choice, but this is not the only model Darwin Core should support as a lot of people who use Darwin Core do not. I think it is pretty obvious to everyone that a typifiedName is a scientific name and that can be made clear in the definition, as it is in the definition I suggested above.

Fact is we need to be able to use typeStatus and typifiedName in the Occurrence Core (or in a completely flat record). We have got all these images of type specimens in the Australian Virtual Herbarium (AVH) – which is a data hub in the Atlas of Living Australia (ALA) – but people cannot search for them by the name for which they are a type, as we cannot deliver the data, as there is no Darwin Core term for it. The Atlas currently does not index Identification History, so if we would deliver the information in Identification History, people would not even be able to search for just types. Also, out of 25 or so AVH providers, only two deliver Identification History, while probably all of them deliver type status information. We come from an ABCD (and BioCASe Provider) background, which has TypifiedName and ALA indexes that, but now that we are using the IPT we cannot deliver that anymore.

I think this is hardly a unique use case and is similar to where @mdoering came from when making the proposal all these years ago.
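As a rough illustration of the use case, the sketch below shows what searching a completely flat record set by typified name could look like if a typifiedName column were available. The column is the term proposed in this issue, not an existing Darwin Core term, and the occurrence IDs, the "Aus bus" and "Aus cus" placeholder names and the `types_of` helper are all hypothetical.

```python
import csv
import io

# Hypothetical flat occurrence data. typifiedName is the proposed term;
# all identifiers and names are placeholders.
FLAT_OCCURRENCES = """occurrenceID,scientificName,typeStatus,typifiedName
occ-1,"Aus cus Jones, 1950","holotype of Aus bus Smith, 1900","Aus bus Smith, 1900"
occ-2,"Aus cus Jones, 1950",,
occ-3,"Aus bus Smith, 1900","isotype of Aus bus Smith, 1900","Aus bus Smith, 1900"
"""


def types_of(name: str, rows: list) -> list:
    """Return the occurrenceIDs of records that are types of `name`."""
    return [row["occurrenceID"] for row in rows if row["typifiedName"] == name]


rows = list(csv.DictReader(io.StringIO(FLAT_OCCURRENCES)))
print(types_of("Aus bus Smith, 1900", rows))
```

In this sample, occ-1 is currently identified under a different name than the one it typifies, which is exactly the case a search on scientificName alone would miss.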

nielsklazenga commented 3 years ago

@tucotuco The examples are in line with the definition, which literally has 'typified scientific name' in it. typeStatus is basically a free text property and does not take a vocabulary.

I actually think this definition is fine (and so are the examples). I just brought it up, because it does not strictly allow the usage @deepreef advocates and claims obviates the need for a typifiedName property.

Regardless of all this, there is a very strong use case for adding a typifiedName property to Darwin Core. I do not support a change to the definition of typeStatus. It might be nice to have a typeOfType (or something like that) property as well, but the type of type is a way less important part of the type status than the typified name and can be easily parsed from the current typeStatus, so the need for that is not that great.

So I would like to propose an option 6:

6) Just add the typifiedName property and leave all other properties for other proposals.

deepreef commented 3 years ago

@nielsklazenga : OK, we can defer the "meat" of this discussion to the TCS group. However:

Just add the typifiedName property and leave all other properties for other proposals.

That was my # 3 option, which I identified as the least of evils for a quick fix.

nielsklazenga commented 3 years ago

@deepreef , thanks, I overlooked that. Scrap option # 6.

deepreef commented 3 years ago

@tucotuco :

Option # 5 is by far the easiest if the definition does not change. Examples are not normative and require only Darwin Core Maintenance Group vetting rather than a public review.

I seem to be the only one who thinks that the change in "Examples" for typeStatus to its current value (conflating typeStatus and typifiedName) was a bad thing -- so maybe it's not right to change even the non-normative documentation in this case. Of course, if # 3 is the option taken, and typifiedName is added as a separate term, then perhaps we can revert typeStatus to its original intended definition. We do not need a typeOfType term -- that was the original purpose for typeStatus. I'm assuming that nobody wants both the addition of a typifiedName term and maintaining the current definition of typeStatus, which includes "type status, typified scientific name, publication". Once typifiedName is adopted, then typeStatus can revert to being simply "type status"... does everyone agree?

nielsklazenga commented 3 years ago

If you want to deliver/publish/do-anything-else-with just the kind of type, you are going to need typeOfType, as changing the definition of typeStatus will break Darwin Core for people who currently use the term correctly and the current definition is correct for 'type status'.

I have been delivering dwc:typeStatus since 2011 at which time the definition was the same as it is now. I can also not find any evidence that the definition has changed in the Normative Term List, which supposedly has the complete history. Also, both TCS and the TDWG Taxon Name LSID Vocabulary have 'typeOfType' and ABCD (from memory) has 'TypeStatusName'. HISPID, I think, may have had 'typestat' for the category of type, but that was a very long time ago.

The misconception about the meaning of typeStatus is not only @deepreef's, but is very widespread. For example, there is a dwciri:typeStatus term as well, which I think only makes sense in the more restrictive meaning of the category of type. Also, ALA, for their "processed" typeStatus, cuts off everything except the first word from my carefully constructed typeStatus strings. Having typeOfType alongside typeStatus will go a long way to resolve the misappropriation of typeStatus [I think it is actually not always a misconception, but rather a need to have the type category, with Darwin Core lacking a term for that and typeStatus being the closest thing].

For my own self-indulgence, I prefer to deliver typeStatus, as I can put in there everything that I think a typification should have. It is just that I found that other providers have a hard time delivering it and consumers cannot really do anything with it other than displaying it or passing it on. So I think there is value in having terms for typeStatus, as well as its minimum components, typifiedName and typeOfType and I would support (even write if there is more interest) a proposal to add typeOfType to Darwin Core as well.
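To show what pulling the minimum components out of a free-text typeStatus might involve, here is a minimal sketch. It assumes the string follows a "<kind of type> of <everything else>" pattern; the regular expression, the function name and the "Aus bus" placeholder values are assumptions, and the sketch deliberately does not try to separate the typified name from any publication details that follow it.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: one leading word for the kind of type, optionally
# followed by " of " and the rest of the free text.
_TYPE_STATUS = re.compile(r"^\s*(?P<kind>[A-Za-z]+)(?:\s+of\s+(?P<rest>.+?))?\s*$")


def split_type_status(value: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a free-text typeStatus into (kind of type, remainder).

    Returns (None, None) when the string does not match the assumed pattern.
    """
    match = _TYPE_STATUS.match(value)
    if match is None:
        return None, None
    return match.group("kind").lower(), match.group("rest")


print(split_type_status("holotype of Aus bus Smith, 1900"))
print(split_type_status("Isotype"))
```

The first element is roughly what a typeOfType term would hold. The remainder still mixes the typified name with whatever else the provider wrote, which is why a separate typifiedName term is easier for providers and consumers than parsing.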

We are going to have this in TCS as well, but, while we aim to have a strawman ready by TDWG 2021 in September, we have seen (also in this proposal) that writing good definitions (and getting agreement on something) can be really hard, so we cannot really say how long it will take after that before TCS 2 will be ready for use. Also, in TCS these properties will be object (or URI) properties, while in Darwin Core they are literals. Having the literals in Darwin Core saves us from having to add them to TCS and will make TCS a much cleaner standard. We can add typifiedName (the URI property) to TCS to support Darwin Core RDF.

deepreef commented 3 years ago

@nielsklazenga : OK, I didn't realize it's been that long since I looked at the definition of typeStatus. Evidently it changed to its current (overloaded, in my opinion) definition sometime before 2011 -- which means I've missed this for at least ten years. I'll concede that a decade is enough time that the new definition is now the definition adopted by a majority of content providers, making me and my ilk the outliers. I think this is extremely unfortunate (not because I'm an outlier, but because I think it was a mistake to overload the term); but I confess that it is what it is.

In any case, we do seem to agree that the current DwC terms are collectively inadequate. I tend to agree with @nielsklazenga that a band-aid/short-term fix now will probably be wise, rather than wait for the robust solution through TCS2.

Did we agree that the DwC term for typificationStatus would be included in the Identification class?

nielsklazenga commented 3 years ago

@deepreef:

Did we agree that the DwC term for typificationStatus would be included in the Identification class?

We (you and I at least) agree that typeStatus should be left in the Identification class where it is now.

deepreef commented 3 years ago

Oops! I meant typifiedName. I had assumed that typeStatus would remain within Identification. And yes, I knew you and I had agreed -- and I think @mdoering as well -- but not sure if everyone else agreed.

nielsklazenga commented 3 years ago

I think typifiedName should be where typeStatus is.

tucotuco commented 3 years ago

Sorry Rich, you are 14 years out of date. The current semantics went into place in 2007 (http://rs.tdwg.org/dwc/curatorial/version/TypeStatus-2007-04-17.htm). The semantics you remember were last used in the 2003 version of Darwin Core.

nielsklazenga commented 3 years ago

So, @tucotuco , was the reasoning at the time that Darwin Core did not need a more atomised typification like TCS and ABCD have, which are from around the same time or slightly before?

tucotuco commented 3 years ago

@nielsklazenga No, I don't think that exact reason came into play. Rather, what was important at the time was exemplified in the typeStatus term as defined then, and as accepted under ratification of the standard two years later. A typifiedName term didn't get created because the community did not make a request for it before or during the public review and one of the principles of Darwin Core is to not add anything that isn't demanded. And now that is embodied in section 3.1 Justifications for change in the Vocabulary Maintenance Specification (https://github.com/tdwg/vocab/blob/master/vms/maintenance-specification.md).

deepreef commented 3 years ago

@tucotuco Damn... time does fly! I guess it's one of those changes that I somehow missed when it happened, then never had occasion to notice until it came up (as it did just now). I would have fought like hell against its current definition had I been paying attention, and would have strongly supported the creation of typifiedName and typificationPublication as separate terms back then in 2007, to avoid overloading typeStatus. The only reason I'm reluctant now is that we've managed to survive this far (14 years!!! Egads!) -- maybe we can just hang in there with the status quo for a couple more years until we get it "right" with TCS. Of course, back then I probably would have been saying "but everyone will be using TCS1 anyway, so why bother". sigh

In any case, thanks to both of you for setting me straight. And sorry for my rants -- which I now realize are 14 years behind the times.

But....

A typifiedName term didn't get created because the community did not make a request for it before or during the public review

Evidently there was some sort of demand to expand the meaning of the term typeStatus to include typified name and publication information. I mean, why expand the definition to include that content if nobody asked for it? And if they asked for it, I wonder why no one suggested adding new terms instead? All moot now, I guess.

nielsklazenga commented 3 years ago

@deepreef, you are expecting too much from TCS 2. Darwin Core and TCS have different scopes and purposes and TCS can never completely take over all the taxonomy- and nomenclature-related terms from Darwin Core. I expect most of the nomenclatural type data that is exchanged will always come from collections databases rather than taxonomic or nomenclatural systems, just like it does now, so will be done with Darwin Core. Also, in Darwin Core, like in collections databases, the terms will be literals, while in TCS, as in taxonomic systems I would hope, they will be objects (or URI). Having the terms in Darwin Core will save us from having to define the literal version of the terms in TCS. On the other hand, if we think there is a need for a literal term in TCS, we can borrow it from Darwin Core, if that already has it. We just have to make sure that the definitions we write and the names we choose for the Darwin Core terms are the same as the definitions we want in TCS.

On the other hand, for terms in the Taxon class, like acceptedNameUsage and originalNameUsage, which @qgroom brought up earlier, definitely wait for TCS 2, as TCS 2 will do that much better and I hope it will be recommended that TCS 2 is used instead of the Taxon Core (and we could propose to have those terms removed). The same goes for anything in the dwciri namespace, as TCS 2 can take that over.

I was just thinking we need a typificationPublishedIn property in TCS. TCS 1 has 'LectotypePublication', but I think we should have something more general (if only so that we do not have to add neotypePublishedIn and conservedTypePublishedIn).

deepreef commented 3 years ago

@nielsklazenga :

TCS can never completely take over all the taxonomy- and nomenclature-related terms from Darwin Core

All the important DwC Taxon terms are already incorporated (and then some). Sure, we probably won't bother with the higher taxonomy-as-separate terms bit, but those aren't important in the long run. The only part I was dubious about including within TCS 2 was the typification stuff, but then you indicated above that it would be included in TCS 2. Maybe you're the one expecting too much from TCS 2? In any case, that's a discussion for another place/time; not here.

I was just thinking we need a typificationPublishedIn property in TCS. TCS 1 has 'LectotypePublication', but I think we should have something more general (if only so that we do not have to add neotypePublishedIn and conservedTypePublishedIn).

If you're ambitious enough to deal with typification properties in TCS 2, then I fully agree. I'm just not sure we'll get that far. I guess it depends on how much of our apparent disagreements on typification are actually agreements disguised in miscommunication. If it's anything like the majority of taxonomic data modelling discussions, I suspect that most of it will be that.

nielsklazenga commented 3 years ago

You are missing my point. tcs:typifiedName and dwc:typifiedName are (would be) different properties, even if their written definitions are the same, as the former will have a TaxonName object (or its URI) as its target and the latter a string. You cannot use tcs:typifiedName in a Darwin Core Archive, which is currently – and will be for a long time to come – how most nomenclatural type data is exchanged.

This sort of thing is why Darwin Core has all these extra ...ID properties in the Taxon class and why it now has the dwciri namespace for (mostly vocabulary) properties that take a string in the dwc namespace, but need to take a URI in RDF, as well as some new properties. TCS is like dwciri (in the limited context of nomenclatural types). If we were to have all nomenclatural type stuff dealt with in TCS, we would need to define terms like tcs:typeOfTypeLiteral and tcs:typeSpecimenLiteral (we probably need to define that anyway as that cannot be in Darwin Core). I would rather just borrow the dwc terms. If a term we are going to use in TCS is exactly the same as one that is already in Darwin Core, we should use the Darwin Core term in any case.
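The literal-versus-object distinction can be sketched in a few lines. Neither property exists yet: dwc:typifiedName is the term proposed in this issue and tcs:typifiedName is only a suggestion for TCS 2, so the class names, the example.org URI and the `to_literal` helper below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonName:
    """Hypothetical stand-in for a TCS TaxonName object."""
    uri: str
    name_string: str


@dataclass
class DwcStyleRecord:
    """Darwin Core style: the property value is a plain string."""
    typifiedName: str


@dataclass
class TcsStyleTypeDesignation:
    """TCS style (as suggested above): the value is an object or its URI."""
    typifiedName: TaxonName


def to_literal(designation: TcsStyleTypeDesignation) -> DwcStyleRecord:
    """Flatten the object-valued property to the literal a
    Darwin Core Archive row could carry."""
    return DwcStyleRecord(typifiedName=designation.typifiedName.name_string)


name = TaxonName(uri="https://example.org/name/1", name_string="Aus bus Smith, 1900")
flat = to_literal(TcsStyleTypeDesignation(typifiedName=name))
print(flat.typifiedName)
```

Going from the object to the literal is trivial, while the reverse requires name matching, which is one reason a flat archive format needs the string-valued term in its own right.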

RRabeler commented 3 years ago

Thanks all for the spirited discussion! I didn't think my query would cause such a stir, but I am glad to see that it did. I'm way behind in following the points that emerged - will try to catch up.

deepreef commented 3 years ago

@nielsklazenga : actually no -- I understood your proposal for the difference between tcs:typifiedName and dwc:typifiedName (URI vs. literal). But that is just your proposal. We haven't had that discussion yet, so that decision has not yet been made. But as I said, this is not the place or time to have that discussion.