plazi / community

This repo is intended to serve as a help desk for TreatmentBank users.

Missing information on the classification provided in dataset 9543 #237

Closed: DianRHR closed this issue 4 weeks ago.

DianRHR commented 1 year ago

I was trying to use the information of the article:

Bezděk, J., & Regalin, R. (2022). Identity of species-group taxa of the Western Palaearctic Clytrini (Coleoptera: Chrysomelidae) described by Maurice Pic and Louis Kocher (Version 1657327952032). Plazi.org taxonomic treatments database. https://doi.org/10.5281/zenodo.4272771

available in ChecklistBank, and found that even though the title of the article (and its focus) is the tribe Clytrini, this taxon and rank are not included in the dataset. All the genera are placed directly under Chrysomelidae.

Could you consider including all the taxonomic ranks mentioned in the article in the datasets?

The TextTree file looks like this: [image not reproduced]

jugiora commented 1 year ago

Dear Dian, we have added the tribe information to the taxon attributes, as requested. This information was missing because of the standard parsing applied to the extracted taxonomic data, but it can be added in specific cases like this one. Cheers, Julia

mdoering commented 1 year ago

Thanks @jugiora. The DwC archive does not yet contain the tribe. Does a regeneration need manual triggering?

Note also that the genus Stephenympha is wrongly classified as a plant.

jugiora commented 1 year ago

Dear Dian, the taxonomic attributes have all been fixed. The information should also be updated in the DwC archive within a few hours. All the best, Julia

mdoering commented 1 year ago

The dwca still contains plants right now:

03EC879FFFB1FFCEA875AF4BFDF21428.taxon Plantae Tracheophyta Liliopsida Poales Poaceae Stephenympha genus Stephenympha Stephenympha Stephenympha https://treatment.plazi.org/id/03EC879FFFB1FFCEA875AF4BFDF21428

@gsautter does it take longer to update?

myrmoteras commented 1 year ago

@mdoering No, it was still a plant, but that is fixed now. Maybe there is a way to filter all taxonomic names in the nomenclature section to check that they are all leps.

We might also want to set this article aside, since each treatment is at genus level but in fact includes a list of species, often with new combinations (as in Modica), as well as synonyms, which might be relevant for ChecklistBank / COL.
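A check along the lines suggested here could be scripted against the taxon rows of the DwC-A rather than the treatment markup. The sketch below is only an illustration, not part of TreatmentBank: `flag_unexpected` is a hypothetical helper, rows are assumed to be dictionaries keyed by DwC term names, and reading "leps" as the order Lepidoptera is an assumption supplied by the caller.

```python
from typing import Iterable, Mapping


def flag_unexpected(rows: Iterable[Mapping[str, str]],
                    expected: Mapping[str, str]) -> list[str]:
    """Return the taxonID of each row whose higher ranks contradict `expected`.

    Hypothetical helper: rows are dicts keyed by DwC term names
    (taxonID, kingdom, order, ...). Empty cells are not treated as conflicts.
    """
    flagged = []
    for row in rows:
        for rank, value in expected.items():
            actual = row.get(rank, "")
            if actual and actual != value:
                flagged.append(row["taxonID"])
                break  # one conflicting rank is enough to flag the row
    return flagged


# The first row mirrors the Stephenympha record quoted in this thread;
# the second row is a made-up placeholder.
rows = [
    {"taxonID": "03EC879FFFB1FFCEA875AF4BFDF21428.taxon",
     "kingdom": "Plantae", "order": "Poales", "genus": "Stephenympha"},
    {"taxonID": "placeholder.taxon",
     "kingdom": "Animalia", "order": "Lepidoptera", "genus": "PlaceholderGenus"},
]

# Assumption: "leps" means the order Lepidoptera.
print(flag_unexpected(rows, {"kingdom": "Animalia", "order": "Lepidoptera"}))
# → ['03EC879FFFB1FFCEA875AF4BFDF21428.taxon']
```

Skipping empty cells keeps the check from flagging names whose higher classification was simply never filled in; those would need a separate completeness check.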

see also https://github.com/plazi/Plazi-Communications/issues/1269

DianRHR commented 1 year ago

@jugiora Thanks for your quick answer. However, I downloaded the DwC-A from ChecklistBank again (https://www.dev.checklistbank.org/dataset/9543/download) and the tribe is still not included. My question is whether these kinds of issues are addressed manually.

flsimoes commented 1 year ago

> @jugiora Thanks for your quick answer. However, I downloaded the DwC-A from ChecklistBank again (https://www.dev.checklistbank.org/dataset/9543/download) and the tribe is still not included. My question is whether these kinds of issues are addressed manually.

Perhaps ChecklistBank hasn't yet gotten the most updated version.

What sort of issues exactly do you mean? Fixing the taxonomy? Then yes, that is done manually, as @jugiora did this time. If you are talking about the update to the DwC-A, that should happen automatically once we fix things on our end (though I think ChecklistBank only imports the datasets once a day)... @myrmoteras anything to add?

gsautter commented 1 year ago

Judging from https://www.gbif.org/occurrence/search?offset=0&limit=500&dataset_key=bfb878f3-8a74-46d3-a104-36485c32aaba , the dataset has been updated in GBIF by now ... it is hard to tell at this point how long an update takes to get from there to CLB ... @mdoering is there a synchronization schedule in place, or some sort of notification-based system? It would be great to know approximately how long such updates usually take to go through, so we know at what point we should start to worry or investigate.

mdoering commented 1 year ago

If nothing is triggered, the system checks for an update weekly by default. You could trigger a CLB import from your end each time an archive is rebuilt to make sure there is no latency. It's a simple POST call to the API; we would just need to arrange appropriate credentials.
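For illustration, a post-rebuild trigger of the kind described could look like the sketch below. The endpoint URL is a placeholder and HTTP Basic authentication is an assumption; the actual CLB import endpoint and the credentials would have to be agreed with the CLB team. Only Python's standard library is used, and `build_import_request` / `on_archive_rebuilt` are hypothetical names.

```python
import base64
import urllib.request

# Placeholder only: the real CLB import endpoint is not given in this thread.
CLB_IMPORT_URL = "https://clb-api.example.org/import/{key}"


def build_import_request(dataset_key: int, user: str,
                         password: str) -> urllib.request.Request:
    """Build (without sending) a POST asking CLB to re-import one dataset."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        CLB_IMPORT_URL.format(key=dataset_key),
        data=b"",  # empty body; the dataset key travels in the URL
        headers={"Authorization": f"Basic {token}"},  # assumed auth scheme
        method="POST",
    )


def on_archive_rebuilt(dataset_key: int, user: str, password: str) -> int:
    """Hypothetical hook to call right after a DwC-A rebuild; returns HTTP status."""
    request = build_import_request(dataset_key, user, password)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

Separating request construction from sending keeps the hook testable offline; under these assumptions, `on_archive_rebuilt` would be called from the same place that already notifies GBIF after an archive rebuild.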

gsautter commented 1 year ago

> If nothing is triggered, the system checks for an update weekly by default. You could trigger a CLB import from your end each time an archive is rebuilt to make sure there is no latency. It's a simple POST call to the API; we would just need to arrange appropriate credentials.

It's easy enough to send CLB a poke request similar to the one we send to the GBIF API when a DwC-A gets updated ... however, GBIF might pull the updated DwC-A with some latency, so there would be a non-negligible risk of CLB fetching the old version of the data from GBIF before GBIF fetches the new version from TB ... this needs some thought.

mdoering commented 1 year ago

CLB does not fetch anything from GBIF. We poll your files directly.

gsautter commented 1 year ago

I feel like this issue is related, as both concern the uplink and sending notifications to CLB: https://github.com/plazi/treatmentBank/issues/90

DianRHR commented 1 year ago

> ...so there would be a non-negligible risk of CLB fetching the old version

@flsimoes I mean both. Fixing the taxonomy: including a name and taxon rank that are only mentioned in the title but are an important part of the classification (the tribe, in this case), and that are not considered in DwC. Updating the DwC-A: the discussion in the previous comments. My question is mainly about knowing how to proceed once we find missing information in a dataset.

@gsautter I'm afraid we are talking about two different datasets. I mentioned https://www.dev.checklistbank.org/dataset/9543/about, which is also https://www.gbif.org/dataset/77c874cd-4f85-4746-8466-3ca09e2c2b8d, and I just checked both: the tribe is still not included. The one you mentioned is a different dataset: https://www.gbif.org/occurrence/search?offset=0&limit=500&dataset_key=bfb878f3-8a74-46d3-a104-36485c32aaba

gsautter commented 1 year ago

> @gsautter I'm afraid we are talking about two different datasets. I mentioned https://www.dev.checklistbank.org/dataset/9543/about, which is also https://www.gbif.org/dataset/77c874cd-4f85-4746-8466-3ca09e2c2b8d, and I just checked both: the tribe is still not included. The one you mentioned is a different dataset: https://www.gbif.org/occurrence/search?offset=0&limit=500&dataset_key=bfb878f3-8a74-46d3-a104-36485c32aaba

At the dataset level, sure, but at the level of figuring out how to get updates into CLB more quickly and how to get CLB dataset keys into TreatmentBank, both issues are about communication between the two systems. That is something we might well, and most likely should, discuss together, as it boils down to adding a CLB communication component to the TreatmentBank back-end server. I never meant to say the specific dataset issues don't need to be solved individually.

mdoering commented 1 year ago

As far as I can see, the DwC-A from Plazi still does not contain the Clytrini tribe. @gsautter I think I now know why. The classification is not provided via parentNameUsageID, but only as flat, major Linnean ranks, and tribe is not included among them:

<field index="3" term="http://rs.tdwg.org/dwc/terms/parentNameUsageID"/> <!-- blank -->
<field index="4" term="http://rs.tdwg.org/dwc/terms/originalNameUsageID"/> <!-- blank -->
<field index="5" term="http://rs.tdwg.org/dwc/terms/kingdom"/> <!-- taxon@kingdom -->
<field index="6" term="http://rs.tdwg.org/dwc/terms/phylum"/> <!-- taxon@phylum -->
<field index="7" term="http://rs.tdwg.org/dwc/terms/class"/> <!-- taxon@class -->
<field index="8" term="http://rs.tdwg.org/dwc/terms/order"/> <!-- taxon@order -->
<field index="9" term="http://rs.tdwg.org/dwc/terms/family"/> <!-- taxon@family -->
<field index="10" term="http://rs.tdwg.org/dwc/terms/genus"/> <!-- taxon@genus -->
<field index="11" term="http://rs.tdwg.org/dwc/terms/taxonRank"/> <!-- taxon@rank -->
<field index="12" term="http://rs.tdwg.org/dwc/terms/scientificName"/> <!-- reconciled taxon name with reconciled authority, with parentheses and all -->
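One way to confirm why the tribe cannot appear is to list which DwC terms the core mapping actually declares. This minimal sketch parses an abridged copy of the field list quoted above (fields 3 to 12 only, without the comments) and reports which classification ranks have no column; `mapped_terms` is a hypothetical helper, and the rank list it checks is chosen for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DWC = "http://rs.tdwg.org/dwc/terms/"

# Abridged copy of the core field mapping quoted above (fields 3 to 12 only).
CORE_FIELDS = """<core>
  <field index="3" term="http://rs.tdwg.org/dwc/terms/parentNameUsageID"/>
  <field index="4" term="http://rs.tdwg.org/dwc/terms/originalNameUsageID"/>
  <field index="5" term="http://rs.tdwg.org/dwc/terms/kingdom"/>
  <field index="6" term="http://rs.tdwg.org/dwc/terms/phylum"/>
  <field index="7" term="http://rs.tdwg.org/dwc/terms/class"/>
  <field index="8" term="http://rs.tdwg.org/dwc/terms/order"/>
  <field index="9" term="http://rs.tdwg.org/dwc/terms/family"/>
  <field index="10" term="http://rs.tdwg.org/dwc/terms/genus"/>
  <field index="11" term="http://rs.tdwg.org/dwc/terms/taxonRank"/>
  <field index="12" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
</core>"""


def mapped_terms(core_xml: str) -> dict[str, Optional[int]]:
    """Map each declared DwC term (local name) to its column index."""
    root = ET.fromstring(core_xml)
    terms: dict[str, Optional[int]] = {}
    for elem in root.iter():
        # Real meta.xml files are namespaced, so compare the local tag name only.
        if elem.tag.rsplit("}", 1)[-1] != "field":
            continue
        term = elem.get("term", "")
        if term.startswith(DWC):
            index = elem.get("index")
            # Fields declared only with a default value carry no index.
            terms[term[len(DWC):]] = int(index) if index is not None else None
    return terms


terms = mapped_terms(CORE_FIELDS)
wanted = ["kingdom", "phylum", "class", "order", "family", "tribe", "genus"]
print([rank for rank in wanted if rank not in terms])  # → ['tribe']
```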

Ideally we would use parentNameUsageID only - at least if you have a parent child relationship in your model. Otherwise there are new dwc classification terms on the way we can use including tribe, subtribe and superfamily to get at least somewhat richer trees:
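To illustrate the parentNameUsageID approach, the sketch below models each name usage as a row pointing to its parent, so a tribe can sit between family and genus without a dedicated column. The IDs and the genus name are hypothetical placeholders; only Chrysomelidae and Clytrini come from this thread, and `classification` is an illustrative helper, not part of any DwC tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NameUsage:
    taxon_id: str
    parent_id: Optional[str]  # value of dwc:parentNameUsageID
    name: str
    rank: str


def classification(usages: list[NameUsage], taxon_id: str) -> list[tuple[str, str]]:
    """Walk parentNameUsageID links upwards and return (rank, name), root first."""
    by_id = {u.taxon_id: u for u in usages}
    chain: list[tuple[str, str]] = []
    seen: set[str] = set()
    current: Optional[str] = taxon_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in parentNameUsageID at {current}")
        seen.add(current)
        usage = by_id[current]
        chain.append((usage.rank, usage.name))
        current = usage.parent_id
    return list(reversed(chain))


# Hypothetical IDs and genus; the tribe sits between family and genus.
usages = [
    NameUsage("t1", None, "Chrysomelidae", "family"),
    NameUsage("t2", "t1", "Clytrini", "tribe"),
    NameUsage("t3", "t2", "ExampleGenus", "genus"),
]
print(classification(usages, "t3"))
# → [('family', 'Chrysomelidae'), ('tribe', 'Clytrini'), ('genus', 'ExampleGenus')]
```

In the flat layout the same genus row could only carry the family and genus columns; with parent links, any intermediate rank the source mentions can be kept.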

https://github.com/tdwg/dwc/issues/45 https://github.com/tdwg/dwc/issues/46 https://github.com/tdwg/dwc/issues/65

gsautter commented 1 year ago

> @gsautter I think I now know why. The classification is not provided via parentNameUsageID, but only as flat, major Linnean ranks, and tribe is not included among them:

That's correct ... a tribe will only be there if the taxon itself is of rank tribe ... we don't generally store the intermediate ranks internally either, as there are simply too many of them, and for a long time DwC didn't really support them either.

The question that still remains open is the handling of updates.

mdoering commented 1 year ago

So that means the dwca is up to date and adding the tribe did not change anything, correct?

gsautter commented 1 year ago

Regarding the tribe, I think so ... but there also was that "Plantae" vs. "Animalia" cleanup, if I remember correctly ... has the latter come through?

mdoering commented 1 year ago

Yes, it is fixed in TB: https://treatment.plazi.org/id/03EC879FFFB1FFCEA875AF4BFDF21428 and also CLB: https://www.checklistbank.org/dataset/58039/taxon/03EC879FFFB1FFCEA875AF4BFDF21428.taxon

mdoering commented 1 year ago

Most genera in that dataset have an authorship, but a few don't: https://www.checklistbank.org/dataset/58039/names?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&limit=50&offset=0&rank=genus&sortBy=taxonomic

The authorship Hübner, 1818 is in the treatment though: https://treatment.plazi.org/id/03EC879FFF89FFC5A875AEDBFCAA162E

gsautter commented 1 year ago

> The authorship Hübner, 1818 is in the treatment though: https://treatment.plazi.org/id/03EC879FFF89FFC5A875AEDBFCAA162E

I tend to think adding authorityName and authorityYear as well should do the trick ... authority is normally the verbatim authority as given in the annotated taxon name, with any interpretation (e.g., expansion of abbreviations, or adding the document author(s) and year in the case of original descriptions) going into those two detail attributes.
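The precedence described here can be sketched as a small function: use the interpreted authorityName and authorityYear when present, otherwise fall back to the verbatim authority. This is a hypothetical illustration of the idea, not TreatmentBank's actual export code; the function name and the ", " separator between name and year are assumptions.

```python
from typing import Optional


def dwc_authorship(authority: Optional[str],
                   authority_name: Optional[str],
                   authority_year: Optional[str]) -> str:
    """Hypothetical precedence: interpreted details first, verbatim authority last."""
    if authority_name and authority_year:
        return f"{authority_name}, {authority_year}"
    if authority_name:
        return authority_name
    # Fall back to the verbatim authority as annotated (may be abbreviated or empty).
    return authority or ""


print(dwc_authorship(None, "Hübner", "1818"))  # → Hübner, 1818
```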

gsautter commented 1 year ago

Turns out adding the two detail attributes did do the trick.