plazi / ggxml2taxpub

Conversion of GoldenGATE XML to JATS/TaxPub at treatment level

create a test body of Candollea treatments as TaxPub level-1 #27

Open myrmoteras opened 2 years ago

myrmoteras commented 2 years ago

@gsautter please

  1. @myrmoteras select ca. 50 treatments from Candollea as an example of the Swiss collection.
  2. Create a corpus of Candollea.
  3. Talk to @flsimoes to do the QC.
  4. Push it to ?? (@tcatapano, where? In this repo?) so that @tcatapano can check it and, if no changes to the processing are needed, push it to https://github.com/plazi/ggxml2taxpub-treatments/tree/main/level1
  5. @tcatapano notify @jgboeil.

It would be helpful if this were closed by April 15, to be ready for a presentation at the Botanical Garden Geneva.

flsimoes commented 2 years ago

We currently have 272 Candollea papers on TB from 2012 onwards. Of these, 129 have already been adapted to the high level of granularity (and QC'd). The remaining papers are in progress or not started. We have already found 4 duplicates, which were promptly reported to Guido.

tcatapano commented 2 years ago

@myrmoteras As per #31: please upload sample ggxml treatments to a new subdirectory (name it "candollea") of the ggxml/ directory in this repository.

myrmoteras commented 2 years ago

Dear Michelle and Martin

We are starting to include Candollea treatments in SIB. As a test we need ca. 100 treatments, if possible ones that are relevant to CJB, because they describe specimens and have specimens from CJB, form a corpus that is relevant for ongoing research, or for another reason.

Having the treatments in SIB will allow making use of their text and data mining facilities.

Here is the link to the articles we have processed so far; the processing should be finished soon:

https://tb.plazi.org/GgServer/dioStats/stats?outputFields=doc.articleUuid+doc.doi+doc.gbifId+doc.zenodoDepId+bib.author+bib.title+bib.pubDate+bib.source+bib.volume+cont.pageCount+cont.treatCount+cont.treatCountDoi+cont.matCitCount+cont.figCountZen+cont.bibRefCount+treat.familyEpithet&groupingFields=doc.articleUuid+doc.doi+doc.gbifId+doc.zenodoDepId+bib.author+bib.title+bib.pubDate+bib.source+bib.volume+cont.pageCount+cont.treatCount+cont.treatCountDoi+cont.matCitCount+cont.figCountZen+cont.bibRefCount+treat.familyEpithet&orderingFields=bib.pubDate&FP-bib.source=Candollea&format=HTML

Thanks for a hint

Cheers Donat
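
For reference, the statistics link in the comment above is a plain query string, so it can be rebuilt for another journal or field list. The sketch below is only an illustration: the parameter names (`outputFields`, `groupingFields`, `orderingFields`, `FP-bib.source`, `format`) and the field names are copied from that link, while the function name `build_stats_url` and its arguments are hypothetical. What other values the server accepts is not stated in this thread, and the code makes no network request.

```python
from urllib.parse import urlencode

# Endpoint and field names copied from the link in the comment above.
BASE_URL = "https://tb.plazi.org/GgServer/dioStats/stats"

FIELDS = (
    "doc.articleUuid", "doc.doi", "doc.gbifId", "doc.zenodoDepId",
    "bib.author", "bib.title", "bib.pubDate", "bib.source", "bib.volume",
    "cont.pageCount", "cont.treatCount", "cont.treatCountDoi",
    "cont.matCitCount", "cont.figCountZen", "cont.bibRefCount",
    "treat.familyEpithet",
)


def build_stats_url(source, fields=FIELDS, order_by="bib.pubDate", fmt="HTML"):
    """Hypothetical helper: assemble a stats URL shaped like the one above.

    The link uses the same field list for output and grouping, so this
    sketch does the same. Fields are space-joined; urlencode() then turns
    each space into '+', which matches the form used in the link.
    """
    joined = " ".join(fields)
    params = [
        ("outputFields", joined),
        ("groupingFields", joined),
        ("orderingFields", order_by),
        ("FP-bib.source", source),  # filter on the journal name
        ("format", fmt),
    ]
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_stats_url("Candollea"))
```

Calling `build_stats_url("Candollea")` with the defaults is intended to reproduce the link as posted; passing a shorter `fields` tuple gives a narrower table, assuming the server treats the field lists independently of their length.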