plazi / treatmentBank

Repository devoted to house keeping of treatmentBank

Journal of Hymenoptera Research: Zenodo ID in stats, treatments in BLR #84

Open myrmoteras opened 1 year ago

myrmoteras commented 1 year ago

@gsautter what can we do so that the Zenodo IDs for Journal of Hymenoptera Research show up in the Plazi stats? Also, what are the rules for treatments to show up in BLR?

https://tb.plazi.org/GgServer/dioStats/stats?outputFields=doc.articleUuid+doc.doi+doc.zooBankId+doc.gbifId+doc.zenodoDepId+bib.year+bib.source&groupingFields=doc.articleUuid+doc.doi+doc.zooBankId+doc.gbifId+doc.zenodoDepId+bib.year+bib.source&FP-bib.source=%22Journal%20of%20Hymenoptera%20Research%22&format=HTML

gsautter commented 1 year ago

It looks as though the articles that have a Zenodo ID in the TaxPub, but not in our system, were imported within a few days of publication and not back-synchronized with the TaxPub version later (i.e., after the Zenodo ID was added to the latter). The ones that do have a Zenodo ID in our system were either imported a good while after original publication, or back-synchronized after I built that mechanism in late 2019. Back-synchronization doesn't happen automatically at this point, but I triggered it for the articles in question, and most of them have their Zenodo ID now.

The reason back-synchronization doesn't happen automatically is that it can apparently take quite a while for the TaxPubs to get Zenodo IDs embedded in them: the last JHR volume to have Zenodo IDs is 93 (published and first imported 2022-10-31), and they are not present in all of the volume 91 and 93 articles. The three volumes since don't have Zenodo IDs yet. I'm afraid this is too long a time span for automated scheduling to work effectively. Our scheduling component can technically handle such time frames without a problem, but I think there is too much variation to both get the Zenodo IDs in a timely fashion and be sure to actually get them, unless we do another lookup every other week until we have the Zenodo ID, and that would cause considerable effort.
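To make the cost of the "another lookup every other week" option concrete, here is a minimal Python sketch. It is not how the scheduling component works; `recheck_dates`, `back_sync`, and the `fetch_zenodo_id` callback are hypothetical names, and the Zenodo ID in the usage note is a placeholder.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, Optional, Tuple


def recheck_dates(first_import: date, interval_days: int = 14,
                  max_checks: int = 12) -> Iterator[date]:
    """Yield the dates on which a follow-up lookup would run."""
    for i in range(1, max_checks + 1):
        yield first_import + timedelta(days=i * interval_days)


def back_sync(first_import: date,
              fetch_zenodo_id: Callable[[date], Optional[str]],
              today: date,
              interval_days: int = 14,
              max_checks: int = 12) -> Tuple[Optional[str], int]:
    """Repeat the lookup every `interval_days` until an ID turns up.

    `fetch_zenodo_id` is a hypothetical callback standing in for
    "re-read the TaxPub and look for an embedded Zenodo ID".
    Returns (zenodo_id or None, number of lookups made).
    """
    lookups = 0
    for when in recheck_dates(first_import, interval_days, max_checks):
        if when > today:
            break  # later lookups have not come due yet
        lookups += 1
        zenodo_id = fetch_zenodo_id(when)
        if zenodo_id is not None:
            return zenodo_id, lookups
    return None, lookups
```

For an article first imported on 2022-10-31 whose TaxPub only gains its ID in mid-December, a simulated callback such as `lambda d: "0000000" if d >= date(2022, 12, 15) else None` needs four lookups before it succeeds, and an article that never gets an ID uses up all `max_checks` lookups.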

Another share of the articles, especially ones we imported before mid-2019, are most likely duplicates from the time when Pensoft didn't mint UUIDs yet and we had to use the URL for de-duplication. When they changed the domain name, that ended up causing a good number of duplicates. We renamed the treatments in those to treatmentDuplicate to get them out of SRS, and back-synchronization ignores those documents. The table below is sorted by DOI so you can spot the duplicates.
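Spotting such duplicates amounts to finding DOIs that appear under more than one article UUID. A minimal Python sketch, assuming the stats rows have been loaded as dictionaries keyed by the `doc.doi` and `doc.articleUuid` field names (the function name, the loading step, and the sample values are hypothetical):

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_duplicate_dois(rows: Iterable[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each DOI that occurs under several article UUIDs to those UUIDs."""
    uuids_by_doi = defaultdict(set)
    for row in rows:
        doi = row.get("doc.doi")
        if doi:  # rows without a DOI cannot be matched this way
            uuids_by_doi[doi].add(row["doc.articleUuid"])
    return {doi: sorted(uuids)
            for doi, uuids in uuids_by_doi.items()
            if len(uuids) > 1}


# Placeholder rows, not real data: the first two share a DOI.
rows = [
    {"doc.doi": "10.0000/example.1", "doc.articleUuid": "AAAA"},
    {"doc.doi": "10.0000/example.1", "doc.articleUuid": "BBBB"},
    {"doc.doi": "10.0000/example.2", "doc.articleUuid": "CCCC"},
]
duplicates = find_duplicate_dois(rows)
```

This only flags candidates; which document of a pair is the master and which ones carry treatmentDuplicate still has to be checked against the documents themselves.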

Added a few columns to your table for illustration: https://tb.plazi.org/GgServer/dioStats/stats?outputFields=doc.articleUuid+doc.doi+doc.zooBankId+doc.gbifId+doc.zenodoDepId+doc.uploadUser+doc.uploadDate+doc.updateUser+doc.updateDate+bib.pubDate+bib.year+bib.source+cont.pageCount+cont.treatCount&groupingFields=doc.articleUuid+doc.doi+doc.zooBankId+doc.gbifId+doc.zenodoDepId+doc.uploadUser+doc.uploadDate+doc.updateUser+doc.updateDate+bib.pubDate+bib.year+bib.source+cont.pageCount+cont.treatCount&orderingFields=doc.doi&FP-bib.source=%22Journal%20of%20Hymenoptera%20Research%22&format=HTML
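Both stats links in this thread follow the same pattern, so they can be assembled from a field list rather than edited by hand. A minimal Python sketch, assuming only what the two URLs show: `outputFields` and `groupingFields` carry the same `+`-joined list, and the journal filter is a quoted, percent-encoded value. The helper name `stats_url` is hypothetical, and nothing here is taken from the server's documentation.

```python
from typing import Optional, Sequence
from urllib.parse import quote

BASE = "https://tb.plazi.org/GgServer/dioStats/stats"


def stats_url(fields: Sequence[str], source: str,
              order_by: Optional[Sequence[str]] = None,
              fmt: str = "HTML") -> str:
    """Build a stats URL shaped like the ones posted in this thread."""
    field_list = "+".join(fields)
    params = [("outputFields", field_list), ("groupingFields", field_list)]
    if order_by:
        params.append(("orderingFields", "+".join(order_by)))
    # The filter value is wrapped in double quotes, then percent-encoded.
    params.append(("FP-bib.source", quote('"' + source + '"')))
    params.append(("format", fmt))
    return BASE + "?" + "&".join(f"{key}={value}" for key, value in params)


url = stats_url(
    ["doc.articleUuid", "doc.doi", "doc.zooBankId", "doc.gbifId",
     "doc.zenodoDepId", "bib.year", "bib.source"],
    "Journal of Hymenoptera Research",
)
```

Passing `order_by=["doc.doi"]` adds the `orderingFields=doc.doi` parameter used for the DOI-sorted table.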

myrmoteras commented 1 year ago

image

image

1. duplicate
2. duplicate
3. master article whose treatments are kept in TB