plazi / GoldenGATE-Imagine

A GUI Tool For Freeing Text and Data from PDF Documents

convert table into minitreatments: concept #35

Open myrmoteras opened 1 year ago

myrmoteras commented 1 year ago

Example 1 https://tb.plazi.org/GgServer/tables/DF3752A9F34EFFE1178A4FC66A43FF24?format=XML in FFD8CB4FF353FFFC170B4F6B687FFFC0

Dijkstra, Klaas-Douwe B. & Kalkman, Vincent J. & Dow, Rory A. | Redefining the damselfly families: a comprehensive molecular phylogeny of Zygoptera (Odonata) | 2014

image does not show up directly in GGI

alternatively Example 2

image has the advantage that it shows up in GGI


What we need:

gsautter commented 1 year ago
  • each row shows up as an individual treatment. Should we ask BDJ how they format tables in this respect? Do they have similar tables?

Actually, if multiple subsequent rows have the same taxon name, I fancy bundling the associated materials data into one treatment with multiple materials citations ... but that sure is a detail.
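The bundling idea could be sketched roughly like this in Python. The row layout (dicts with `taxon` and `specimen` keys), the sample values, and the function name are hypothetical and only there for illustration. Only directly consecutive rows with the same taxon name are merged.

```python
from itertools import groupby

def bundle_rows(rows):
    """Merge consecutive rows that share a taxon name into one treatment.

    Each resulting treatment carries one materials citation per source row.
    """
    treatments = []
    # groupby only groups adjacent items, which matches "subsequent rows"
    for taxon, group in groupby(rows, key=lambda r: r["taxon"]):
        treatments.append({
            "taxon": taxon,
            "materialsCitations": [r["specimen"] for r in group],
        })
    return treatments

# Hypothetical table rows (placeholder names, not taken from the example tables)
rows = [
    {"taxon": "Aus bus", "specimen": "S1"},
    {"taxon": "Aus bus", "specimen": "S2"},
    {"taxon": "Cus dus", "specimen": "S3"},
]
bundled = bundle_rows(rows)
```

Rows with the same name that are separated by a different taxon stay in separate treatments here; whether that is the desired behavior is one of the details still to be decided.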

myrmoteras commented 1 year ago

Table examples

myrmoteras commented 1 year ago

Table example

https://tb.plazi.org/GgServer/summary/2F6EFFA8FF9A1571EE5EFF83FFFDFFCA including synonyms

myrmoteras commented 1 year ago

Define the type values that can occur in tables, as in <taxonRecordFieldHeader type="matCit.collectionCode">Repository</taxonRecordFieldHeader>.

https://tb.plazi.org/GgServer/tables/DF3752A9F34EFFE1178A4FC66A43FF24?format=XML
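To illustrate how typed column headers could drive the conversion of rows into records, here is a minimal Python sketch. The surrounding table markup (`table`, `tr`, `th`, `td`), the `taxonomicName` type value, and the cell contents are assumptions for illustration; only the `taxonRecordFieldHeader` element and the `matCit.collectionCode` type come from this thread, and the XML actually served at the URL above may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical table fragment; the real export may look different.
SAMPLE = """
<table>
  <tr>
    <th><taxonRecordFieldHeader type="taxonomicName">Species</taxonRecordFieldHeader></th>
    <th><taxonRecordFieldHeader type="matCit.collectionCode">Repository</taxonRecordFieldHeader></th>
  </tr>
  <tr><td>Aus bus</td><td>RMNH</td></tr>
</table>
"""

def column_types(table):
    """Return the field type of each column, or None for untyped columns."""
    header_row = table.find("tr")
    types = []
    for cell in header_row:
        header = cell.find("taxonRecordFieldHeader")
        types.append(header.get("type") if header is not None else None)
    return types

def rows_as_records(table):
    """Turn each data row into a dict keyed by the column field types."""
    types = column_types(table)
    records = []
    for tr in table.findall("tr")[1:]:  # skip the header row
        values = ["".join(td.itertext()).strip() for td in tr]
        records.append({t: v for t, v in zip(types, values) if t is not None})
    return records

table = ET.fromstring(SAMPLE)
records = rows_as_records(table)
```

Untyped columns are simply skipped in this sketch, so a table only yields record fields for the columns that have been marked up.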

myrmoteras commented 1 year ago

add taxonRecordFieldHeader to the table

import into SRS

the gadget extracts the taxonRecordTreatments in SRS, which have the type "record", parallel to stubs and treatments

a record treatment then is also exported to GBIF

needs additional tables to make sure the bandwidth of possibilities is covered.

gsautter commented 1 year ago

needs additional tables to make sure the bandwidth of possibilities is covered.

In particular, we need to collect a sufficient number of examples before we can think about automating (a) recognition of tables whose rows are "record" treatments and (b) assigning the specific field type to columns.

Also, we'll need to establish proper QC for table structuring before we can automate this, as otherwise broken records might end up going to GBIF, etc.
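As a rough idea of what such a QC step might check, here is a small Python sketch. The function name, the choice of `taxonomicName` as the required field type, and the list-of-lists row format are assumptions for illustration, not an existing GoldenGATE check.

```python
def qc_table(header_types, rows, required=("taxonomicName",)):
    """Collect structural problems that should block export of a table.

    header_types: field type per column (None for untyped columns)
    rows: list of rows, each a list of cell strings
    """
    problems = []
    for req in required:
        if req not in header_types:
            problems.append(f"missing required column type: {req}")
    for i, row in enumerate(rows, start=1):
        # A cell count mismatch usually means the table grid is broken
        if len(row) != len(header_types):
            problems.append(
                f"row {i}: expected {len(header_types)} cells, found {len(row)}"
            )
            continue
        for field_type, value in zip(header_types, row):
            if field_type in required and not value.strip():
                problems.append(f"row {i}: empty required field {field_type}")
    return problems

types = ["taxonomicName", "matCit.collectionCode"]
ok = qc_table(types, [["Aus bus", "RMNH"]])
broken = qc_table(types, [["Aus bus"], ["", "RMNH"]])
```

A table would only be passed on if the returned list is empty; anything else would be flagged for manual correction first.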

gsautter commented 1 year ago

UUIDs of underlying table elements loop through to tables now (see https://tb.plazi.org/GgServer/tables/DF3752A9F34EFFE1178A4FC66A43FF24?format=XML), an essential building block for keeping UUIDs of treatments and materials citations stable across multiple generations, even though the exact UUID generation procedure is yet to be defined.
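Since the generation procedure is still open, the following Python sketch shows only one possible approach, not the one that will be used: deriving name-based (version 5) UUIDs from the UUID of the underlying table element, so that regenerating the same row always yields the same IDs. The function name and the role strings are hypothetical, and the table UUID from the link above serves as a stand-in for a row element UUID.

```python
import uuid

def derived_uuid(element_uuid_hex, role):
    """Derive a stable UUID for a role ("treatment", "materialsCitation", ...)
    from the UUID of the table element it is generated from."""
    base = uuid.UUID(hex=element_uuid_hex)
    # uuid5 is deterministic: same namespace and name give the same result
    return uuid.uuid5(base, role)

# Stand-in for a row element UUID (this is the table UUID from the thread)
element = "DF3752A9F34EFFE1178A4FC66A43FF24"
treatment_id = derived_uuid(element, "treatment")
mat_cit_id = derived_uuid(element, "materialsCitation")
```

As long as the table element UUIDs stay stable, the derived IDs would stay stable across regenerations as well.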

myrmoteras commented 1 year ago

Additional tables:

file name article UUID table UUID taxonomic name accession code specimen code verbatim specimen linked to phylogeny reference multiple specimens of the same taxon
RBZ-2021-0007 X X X X
studiesInMycology.99.1.100117-100117 X X X X X X X