plazi / arcadia-project


TreatmentBank collectionCode tagging and providing unique collection code: policy #145

Open myrmoteras opened 4 years ago

myrmoteras commented 4 years ago

Issue

Currently, collectionCodes in TreatmentBank come as individual codes, as strings combining several codes, and as false positives: http://tb.plazi.org/GgServer/srsStats/stats?outputFields=matCit.collectionCode&groupingFields=matCit.collectionCode&limit=100&format=HTML

For the BLR website, and indeed for anybody who wants to find statistics or data about a particular collectionCode, individual values are needed.
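As a minimal sketch of what "individual values" could mean in practice, the snippet below splits raw collectionCode strings into single codes and counts occurrences per code, which is the shape of data a per-collection statistics page would need. It assumes (an assumption, not taken from TB data) that compound values separate codes with commas, semicolons, slashes or the word "and"; the function names and sample codes are hypothetical.

```python
import re
from collections import Counter

# Hypothetical separator pattern: assumes compound values join codes with
# commas, semicolons, slashes, or the word "and". Adjust to the real data.
SEPARATORS = re.compile(r"\s*(?:[,;/]|\band\b)\s*")


def split_collection_codes(raw: str) -> list[str]:
    """Split one raw collectionCode string into individual, trimmed codes."""
    parts = SEPARATORS.split(raw)
    # Drop empty fragments left by leading/trailing or doubled separators.
    return [p.strip() for p in parts if p.strip()]


def count_collection_codes(raw_values: list[str]) -> Counter:
    """Count how often each individual collectionCode occurs across values."""
    counts: Counter = Counter()
    for raw in raw_values:
        counts.update(split_collection_codes(raw))
    return counts


if __name__ == "__main__":
    # Hypothetical raw values mixing single and compound codes.
    sample = ["MCZ", "CASC, MCZ", "BMNH; NHMUK / MCZ", "USNM and CASC"]
    print(count_collection_codes(sample))
```

Splitting before counting means a compound value such as "CASC, MCZ" contributes to both collections instead of appearing as a separate, unique "code".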

This is also an issue for taxonomicTreatments in BLR-Zenodo, especially if we make use of the link to the GRSciColl identifier in the custom metadata that is added to the collectionCodes during the markup process.

Ideally, TB would provide these individual values.

However, TB currently contains 14,854 unique collectionCodes. To a large degree, this is due to false positives produced during the batch processing of Zootaxa.
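One possible way to separate genuine codes from false positives is to check the extracted codes against a curated reference list of known collection codes and queue the rest for review. The sketch below assumes such a list exists; the hard-coded `KNOWN_CODES` set, the function name, and the sample counts are all hypothetical and purely illustrative.

```python
from collections import Counter

# Hypothetical reference list of known collection codes; in practice this
# would be loaded from a curated registry rather than hard-coded.
KNOWN_CODES = frozenset({"BMNH", "CASC", "MCZ", "NHMUK", "USNM"})


def partition_codes(counts: Counter, known: frozenset = KNOWN_CODES) -> tuple[dict, dict]:
    """Split code counts into (recognised, suspect) dictionaries.

    Matching is case-insensitive. Suspect codes are candidate false
    positives that can be reviewed before being treated as collections.
    """
    recognised = {code: n for code, n in counts.items() if code.upper() in known}
    suspect = {code: n for code, n in counts.items() if code.upper() not in known}
    return recognised, suspect


if __name__ == "__main__":
    # Hypothetical counts, including two obvious false positives.
    extracted = Counter({"MCZ": 12, "CASC": 4, "Fig": 7, "mm": 3})
    ok, review = partition_codes(extracted)
    print("recognised:", ok)
    print("to review:", review)
```

This does not remove anything automatically; it only narrows the set of unique codes that need human attention, which fits a gradual clean-up alongside machine processing.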

To produce the 70K+ treatments, we rely on machine processing. It should not be forgotten that liberating treatments from the PDF prison is already a huge, decisive step.

What is the suggested solution here?

  1. We continue as is and communicate this fact, stating that it is up to the user to clean up the values, whilst we try to clean them up in the medium term.
  2. We make an effort now to find a solution.
  3. Other(s)?

Decision