plazi/arcadia-project
Guido's To Do List #192 (Open)
flsimoes opened 1 year ago
flsimoes commented 1 year ago
Current timeline
Core Split - three weeks to finish
materialsCitation: splitting, and improving manual splitting based on Extract matCit - two weeks
Fine-grained control over the batch
Font Reference
Data re-extraction
XMF
GGX
Treatments and taxon data out of tables - about 1 week
WADM connection - annostor
OCR, JATS, importing connections
Links to XMF, processing of Pensoft matCit
Granularity and QC level tagging/flagging
We first define which levels are allowed to exist
Completely automatic assessment
Tricky to communicate to GBIF, as DwCA has no way to store this level information
flsimoes commented 1 year ago
Reprocess old data to get to the current status
Clean up and tag AccessionNumbers
Add COL, BTOL, and NCBI taxon identifiers to all taxonomicNames
Clean up legacy attributes, annotation types, and typos (based on the check Guido ran during the Sprint)
Run the QC on all IMFs that have not been QC'd
Note: once data is on GBIF, the gatekeeper no longer interferes
flsimoes commented 1 year ago
Organize A2Files (Frankfurt server)
By journal and by year (within each journal's folders)
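A rough sketch of the journal/year layout described above, in case it helps when planning the move. Everything here is an assumption for illustration: the `target_path` and `plan_moves` helpers, the `A2Files` root name, and the idea that each file's journal and year are already known from some metadata source. It only computes target paths (a dry run) and does not touch the server.

```python
from pathlib import PurePosixPath


def target_path(root, journal, year, filename):
    """Return <root>/<journal>/<year>/<filename> (hypothetical layout)."""
    # Replace characters that would be awkward in a folder name.
    safe_journal = journal.strip().replace("/", "_").replace(" ", "_")
    return PurePosixPath(root) / safe_journal / str(year) / filename


def plan_moves(root, records):
    """Map each filename to its planned location.

    `records` is an iterable of (filename, journal, year) tuples; where
    journal and year come from is left open here.
    """
    return {
        name: target_path(root, journal, year, name)
        for name, journal, year in records
    }


if __name__ == "__main__":
    # Made-up example records, purely for illustration.
    example = [
        ("paper1.pdf", "Zootaxa", 2019),
        ("paper2.pdf", "Journal of Example Studies", 2020),
    ]
    for name, dest in plan_moves("A2Files", example).items():
        print(f"{name} -> {dest}")
```

Reviewing the printed plan before moving anything would make it easier to spot files with a missing or inconsistent journal name.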