tdwg / tnc

Taxonomic Names and Concepts Interest Group

Vernacular names #47

Closed nielsklazenga closed 4 years ago

nielsklazenga commented 4 years ago

We discussed vernacular names in our teleconference of 24 September 2019. At the time we introduced a vernacularName property on the TaxonomicNameUsage object and a VernacularName class that would be its object. The VernacularName class would contain taxonomicNameUsage, language and preferredName properties. However, somehow, in the diagram I presented at Biodiversity_Next, the VernacularName object had a taxonomicName instead of a taxonomicNameUsage property.

I can't remember whether I made a mistake in the minutes of the meeting or in the diagram, but I now think it would be best to not have a VernacularName class at all and have a TaxonomicNameUsage as the object for the vernacularName property. We can add the preferredName flag to the TaxonomicNameUsage class (maybe rename it to isPreferredVernacularName); I had already put dcterms:language in the TaxonomicName class.
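As a rough illustration of that proposal (not the group's agreed model), the sketch below has no VernacularName class: the vernacularName property simply holds other TaxonomicNameUsage objects, the preferred flag sits on the TNU, and language sits on the name. All Python names, and the example plant names, are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaxonomicName:
    # Hypothetical minimal name record; `language` stands in for dcterms:language.
    name_string: str
    language: Optional[str] = None


@dataclass
class TaxonomicNameUsage:
    # Hypothetical minimal TNU; field names are illustrative, not the standard's.
    taxonomic_name: TaxonomicName
    is_preferred_vernacular_name: bool = False
    # vernacularName points at other TNUs rather than at a VernacularName object.
    vernacular_name: List["TaxonomicNameUsage"] = field(default_factory=list)


def preferred_vernaculars(tnu: TaxonomicNameUsage, language: str) -> List[str]:
    """Return the name strings of preferred vernacular usages in one language."""
    return [
        v.taxonomic_name.name_string
        for v in tnu.vernacular_name
        if v.is_preferred_vernacular_name and v.taxonomic_name.language == language
    ]


# Illustrative data only.
scientific = TaxonomicNameUsage(TaxonomicName("Eucalyptus regnans"))
scientific.vernacular_name.append(
    TaxonomicNameUsage(TaxonomicName("Mountain Ash", "en"), is_preferred_vernacular_name=True)
)
scientific.vernacular_name.append(TaxonomicNameUsage(TaxonomicName("Swamp Gum", "en")))

print(preferred_vernaculars(scientific, "en"))
```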

I think other people might already have been at this point at the meeting, but what are people's thoughts?

I think this keeps us very close to TCS, where Vernacular Name was a Taxon Relationship Type. In fact, one can deal with what users of "scientific names" would call "vernacular names" by way of Taxon Relationship Assertions, which might be more appropriate in, for example, ethnobiological studies, or for people who don't want to use scientific names themselves, but still want to refer to vernacular name usages by others.

We could also include a scientificName property in the TaxonomicNameUsage class for usages where the taxonomicName is a vernacular name (like in field guides).

deepreef commented 4 years ago

One approach several of us discussed recently at the CoL Global Team meeting in Canberra is the idea of treating Vernacular Names just like we do with any other TaxonNameUsage. There are some complexities doing it this way (e.g., no Protonym anchoring), but it means we don't need any vernacularName properties at all. We just flag them accordingly in some way (multiple options for doing this). That way, as you note, relationships between vernacular names and scientific names could be managed the same way as other Taxon Relationship Assertions.

I need to spend some time reminding myself of where we are with TNC, but based on discussions we had in Canberra recently, I think there are some subtle but important changes we can make. More on this a bit later...

nielsklazenga commented 4 years ago

I'll take this as agreement that we shouldn't introduce a VernacularName object.

What you describe is probably the best approach for applications that aggregate data from multiple disparate sources. The (Australian) National Species List (NSL) has used it from the beginning (and the Australian Plant Name Index (APNI) probably for much longer). There is, however, a difference between a standard and an application – just like within an application like the Catalogue of Life or NSL there will be a difference between how the data is stored internally and how it is disseminated (or ingested). Treating vernacular names as "any other TaxonomicNameUsage" requires a subtype on the TaxonomicNameUsage and a vocabulary for that, which I am very keen to keep out of the standard, because otherwise we'll never finish it. I think this is more something for an application profile.

Also, one of the things we did last year is restrict the types of relationships that can be used in TaxonRelationshipAssertions, so we now only have the RCC-5 relationships left. That's why we have kept calling it Taxon Relationship Assertions and not TNU Relationship Assertions. For example, 'Synonym' was removed as a relationship type and replaced with acceptedNameUsage (adopted from Darwin Core) on the TNU. We have done something analogous with vernacular names (you could say we also took that from Darwin Core). Again, applications can do what works best for them, as long as they can ingest and exchange standard data.
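A minimal sketch of that split, under stated assumptions: the assertion class accepts only the five RCC-5 relationships, while synonymy is expressed through an acceptedNameUsage property on the TNU. The relationship labels follow TCS-style wording and are assumptions here, as are the class layout, the placeholder names "Aus bus" and "Aus cus", and the reference strings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Rcc5Relationship(Enum):
    # Assumed labels for the five RCC-5 relationships.
    IS_CONGRUENT_TO = "isCongruentTo"
    INCLUDES = "includes"
    IS_INCLUDED_IN = "isIncludedIn"
    OVERLAPS = "overlaps"
    EXCLUDES = "excludes"


@dataclass
class TaxonomicNameUsage:
    # Hypothetical minimal TNU.
    id: str
    name_string: str
    # Synonymy lives here instead of in a 'Synonym' relationship type.
    accepted_name_usage: Optional["TaxonomicNameUsage"] = None


@dataclass
class TaxonRelationshipAssertion:
    subject_usage: TaxonomicNameUsage
    relationship: Rcc5Relationship
    object_usage: TaxonomicNameUsage
    according_to: str

    def __post_init__(self) -> None:
        # Reject anything that is not one of the RCC-5 relationships.
        if not isinstance(self.relationship, Rcc5Relationship):
            raise ValueError("only RCC-5 relationships are allowed")


# Placeholder data only.
accepted = TaxonomicNameUsage("tnu-1", "Aus bus")
synonym = TaxonomicNameUsage("tnu-2", "Aus cus", accepted_name_usage=accepted)
other = TaxonomicNameUsage("tnu-3", "Aus bus")
assertion = TaxonRelationshipAssertion(
    accepted, Rcc5Relationship.IS_CONGRUENT_TO, other, according_to="Hypothetical 2020"
)
```

An application is free to store synonymy as a relationship internally; the sketch only shows what the exchanged data would distinguish.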

I am pretty keen to keep the vernacularName property (or have vernacularNameUsage and scientificNameUsage properties).

deepreef commented 4 years ago

Yes, I think a VernacularName object would be overkill.

My feeling was the same as yours until I really thought about it. Fundamentally, we're talking about text strings used as labels to represent implied taxon concepts. Scientific names may have additional properties (e.g., prescribed by Codes, Protonym anchoring, typification, etc.), but conceptually they have the same fundamental implications and implied relationships, etc. I was under the impression that we already had different types of TaxonNameUsages, and it would be easy to delineate vernaculars using a defined value for one of the existing properties, so I don't see the need for any special "Subtype" in this case. If anything, I think it simplifies the standard considerably, and would require fewer properties, not more.

But I'm on the fence, and I've decided that the pros and cons just about balance each other out, so I'll defer to others on this. As long as we acknowledge and accommodate the fact that one scientificName TNU can map to multiple vernaculars (and vice versa), I'm sure people will make it work either way.

I fully agree on the RCC-5 relationships approach, and that would apply equally well to taxa labelled with scientificNames as for taxa labelled with vernaculars.

In any case, I'm OK with either approach -- but agree that we don't need to define a whole object for vernaculars.

nielsklazenga commented 4 years ago

Hi @deepreef, I think we don't disagree at all. I think vernacular name usages should be treated the same as any other Taxonomic Name Usage and that the distinction between a Vernacular Name Usage and a Scientific Name Usage is not so much in how the name string is formatted, but in how they are being used (or in their relationships with other TNUs).

Also, I think that, no matter how we define them, all properties that represent relationships between TNUs (acceptedNameUsage, parentNameUsage etc.) should be able to be used as Relationship Types and vice versa – so you could have isCongruentTo, includes etc. properties on a TNU (in an application) as well if you don't need the accordingTo.
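One way to picture this interchangeability is a pair of helper functions that move between the two shapes. This is only a sketch: the dictionary layout, the function names and the list of properties treated as relationships are assumptions, and accordingTo is deliberately left out, matching the case where it is not needed.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Assumed set of TNU properties that represent relationships between TNUs.
RELATIONSHIP_PROPERTIES = ("acceptedNameUsage", "parentNameUsage", "isCongruentTo", "includes")


def properties_to_assertions(tnu: Dict) -> List[Triple]:
    """Expand relationship properties on a TNU record into (subject, type, object) triples."""
    triples: List[Triple] = []
    for prop in RELATIONSHIP_PROPERTIES:
        value = tnu.get(prop)
        if value is None:
            continue
        targets = value if isinstance(value, list) else [value]
        triples.extend((tnu["id"], prop, target) for target in targets)
    return triples


def assertions_to_properties(tnu_id: str, triples: List[Triple]) -> Dict:
    """Fold triples whose subject is tnu_id back into list-valued properties."""
    record: Dict = {"id": tnu_id}
    for subject, relationship, obj in triples:
        if subject == tnu_id:
            record.setdefault(relationship, []).append(obj)
    return record


# Placeholder identifiers only.
tnu = {"id": "tnu-2", "acceptedNameUsage": "tnu-1", "parentNameUsage": "tnu-9"}
triples = properties_to_assertions(tnu)
print(triples)
print(assertions_to_properties("tnu-2", triples))
```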

mdoering commented 4 years ago

I am not on top of the discussion but would throw in that vernacular names surely need other associated properties, in particular a language/locale/country. They are also sometimes related to sex and life stage.

I can see the point of treating everything as usages that have some sort of label. But that's very generic, and it seems we are going down a subclassing route here.

deepreef commented 4 years ago

Agreed! Actually, scientific names should also have language (e.g., some rules in the zoological Code depend on the language of the name, such as Latin vs. Greek, or of German origin). In any case, I don't have strong feelings one way or another, except that vernacular names need to be tied to usages, rather than naked scientificName instances.

nielsklazenga commented 4 years ago

Before I opened this issue, we had a VernacularName class with the properties taxonomicName (which, it is now clear, should be taxonomicNameUsage), isPreferredName and geographicArea. I had put the language property in the TaxonomicName class.

Because of the one-to-many (or many-to-many) relationship between Scientific Name Usage and Vernacular Name Usage, if you want to deal with this as tabular data, you are going to need an association table, which you would probably call 'vernacular_names' or something like that and which might have the ids of the Scientific Name TNU and Vernacular Name TNU and the other two properties I listed. On the other hand, in JSON or XML format, where you can have repeatable values, you might just want to use TNUs as the object.
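The two shapes can be sketched side by side: an association table for the tabular case and nested TNUs for the JSON case. Table names, column names, JSON keys and the example rows below are all assumptions for illustration, using an in-memory SQLite database.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE taxonomic_name_usage (
        id TEXT PRIMARY KEY,
        name_string TEXT NOT NULL
    );
    -- Association table: one row per scientific/vernacular usage pair.
    CREATE TABLE vernacular_names (
        scientific_name_usage_id TEXT NOT NULL REFERENCES taxonomic_name_usage(id),
        vernacular_name_usage_id TEXT NOT NULL REFERENCES taxonomic_name_usage(id),
        is_preferred_name INTEGER NOT NULL DEFAULT 0,
        geographic_area TEXT,
        PRIMARY KEY (scientific_name_usage_id, vernacular_name_usage_id)
    );
    """
)
# Illustrative rows only.
conn.executemany(
    "INSERT INTO taxonomic_name_usage VALUES (?, ?)",
    [("tnu-1", "Eucalyptus regnans"), ("tnu-2", "Mountain Ash"), ("tnu-3", "Swamp Gum")],
)
conn.executemany(
    "INSERT INTO vernacular_names VALUES (?, ?, ?, ?)",
    [("tnu-1", "tnu-2", 1, "Victoria"), ("tnu-1", "tnu-3", 0, "Tasmania")],
)


def nested_usage(connection: sqlite3.Connection, usage_id: str) -> dict:
    """Build the nested form, with vernacular TNUs as repeatable values."""
    name = connection.execute(
        "SELECT name_string FROM taxonomic_name_usage WHERE id = ?", (usage_id,)
    ).fetchone()[0]
    rows = connection.execute(
        """
        SELECT v.id, v.name_string, a.is_preferred_name, a.geographic_area
        FROM vernacular_names AS a
        JOIN taxonomic_name_usage AS v ON v.id = a.vernacular_name_usage_id
        WHERE a.scientific_name_usage_id = ?
        ORDER BY v.name_string
        """,
        (usage_id,),
    ).fetchall()
    return {
        "id": usage_id,
        "nameString": name,
        "vernacularName": [
            {"id": r[0], "nameString": r[1], "isPreferredName": bool(r[2]), "geographicArea": r[3]}
            for r in rows
        ],
    }


print(json.dumps(nested_usage(conn, "tnu-1"), indent=2))
```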

So, in the standard, we can either put all the properties in the TaxonomicNameUsage class and say one can also use them in a separate VernacularName object, or we can define a VernacularName class and say that all properties in it can also be used in a TNU object. I also don't have strong feelings either way, but now tend towards defining the VernacularName (or perhaps better VernacularNameUsage) class. If we do this, we do not need a verbatimName property on the TNU.

So it would be something like this:

VernacularNameUsage

(I have the feeling that language and geographicArea are still better defined in TaxonomicName or TaxonomicNameUsage)

In an application this, of course, can be converted very easily to a Taxon Relationship Assertion (or TNU Relationship Assertion rather), but, for the standard I would like to keep it separate. That way we can also make the accordingTo required in the TaxonRelationshipAssertion class, while it is not necessary for vernacular name usages.

deepreef commented 4 years ago

I will handle the live data model differently from how you describe, but as you have noted before, we're not trying to develop the perfect data model, we're trying to define an optimal data exchange standard. As such, I think it's appropriate to maintain some cross-links as properties within classes like TaxonomicNameUsage, and make use of a TaxonRelationshipAssertion structure to handle the myriad other cross-links among TNUs.

I'm uneasy about developing a VernacularName/VernacularNameUsage class at this stage, because I think this is of secondary priority within our scope. We certainly need to accommodate vernacular names in the exchange schema, but I'm reluctant to invest too much time & energy defining the properties for that class until we've finished the other classes with more direct relevance.

Side note: Emerging from the CoL meeting was a renewed interest in creating a workflow to generate "clean" data buckets for literature and for agents (specifically agents as authors of literature). We may want to farm that out to an entirely separate group, but every time we've done that in the past it gets neglected and/or abandoned. Because Literature/Agents are of such fundamental importance to TNU-space, I think an argument could be made to include it within the scope of this effort. But another argument could be made to spin it off as a separate discussion group. I favor the former, but in any case, this will require minimally a set of separate issues (i.e., we don't want to dwell on it here). I'll leave it to someone else to create the appropriate issue, as I'm still a GitHub noob.

jar398 commented 4 years ago

On 3/15/20 10:07 PM, Richard L. Pyle wrote:

TaxonRelationshipAssertion

Aside:

These are relationships between TNUs, not taxa, so a "TaxonSomething" name is not appropriate.

There are already similar infelicities out there, such as taxonID, but propagating the confusion does not help.

nielsklazenga commented 4 years ago

@deepreef: That works for me too.

Regarding literature and agents, I agree on how essential they are to us and am happy for the TNC to take that on, but I don't want it to delay the publication of the draft standard (which we are aiming for in September). So I will open the GitHub issue and we might dedicate a TNC meeting to it, where we can see how much time it is likely to take and whether we can include it in our current work, or do it afterwards. I know that there is some work going on within TDWG on at least agents.

nielsklazenga commented 4 years ago

Thanks @jar, "TaxonRelationshipAssertions" is just a working name that comes straight out of TCS. TaxonomicNameUsageAssertions seems more appropriate, but my issue with that is that we've got other relationships between TNUs in the standard that are not in this class, so I would like to qualify it somehow. But yes, we need to think of a more appropriate name for the class. I will create a new issue for this.

deepreef commented 4 years ago

Aside: These are relationships between TNUs, not taxa, so a "TaxonSomething" name is not appropriate. There are already similar infelicities out there, such as taxonID, but propagating the confusion does not help.

I guess that depends on how you define a TNU. I personally feel that a TNU instance is (by far) the best data object to use as a representation of a taxon concept -- especially in the context of an instance of a Taxon[omic]RelationshipAssertion. I know others wish to create a separate object for "TaxonConcept", but I have yet to see any actual implementation of such (other than using a TNU instance as a representation of a taxon concept) that doesn't create as many problems as it solves. But I'm certainly keeping an open mind on this! And I've almost been persuaded once or twice.

Now, keep in mind that only a subset of TNU instances can effectively serve as representations of Taxon Concepts, so in that sense I support "TaxonomicNameUsage" as a term. But I would argue that any TNU instance that participates in a Taxon[omic]RelationshipAssertion instance is, by definition, drawn from among the subset of TNUs that represent concepts. Therefore, I think that the term TaxonRelationshipAssertion is entirely appropriate.

Of course, reasonable minds may disagree... :-)

deepreef commented 4 years ago

@nielsklazenga : I agree that the References/Agents components should come after we've locked in a decent draft of the Taxon Names/Usages/etc. standard.

A group is forming consisting of CoL-Folk, PLAZI-Folk, WoRMS-Folk, ZooBank-Folk, and other interested parties. A lot of that discussion will happen within the Bibliography of Life space (managed by PLAZI), but that will be more about workflow and content generation/cleanup. I think the standards discussion should take place within this TNC space.