plazi / GoldenGATE-Imagine

A GUI Tool For Freeing Text and Data from PDF Documents

merging two separated materialsCitation FA4FE02FFFDEFFEAFFB7FF9C4151197E #12

Open myrmoteras opened 3 years ago

myrmoteras commented 3 years ago

@gsautter In this article, the type locality and the holotype are separated by another paragraph and subSubSection. Is there now a way to merge the two? I remember you mentioned something like that.

FA4FE02FFFDEFFEAFFB7FF9C4151197E

gsautter commented 3 years ago

I did mention something like that, indeed ... for treatments that are spread out over multiple parts (be it because of intervening treatments of child taxa or because there is additional data in an appendix) ...

While a similar approach should be generally feasible for individual materials citations, things are a good bit more complicated in the latter case: as opposed to treatment annotations, which are groups of whole paragraphs, materials citations exist below the paragraph level, and thus connecting multiple annotations across paragraph boundaries would (at the very least) mean two things:

I don't think either effect is desirable, as faithful representation of the treatments is essential for wider acceptance of both TreatmentBank and BLR as reliable sources of primary published text and data.

gsautter commented 3 years ago

On the other hand, I do see the need for normalizing occurrence data into individual records ... the best option I see here right now is the following:

The whole approach is yet to be implemented, so there is no use in marking the type locality at this point ... this is merely an idea of how we could tackle this type of issue ... and how to automate it is yet another question that most likely requires another good deal of thought once we settle on a general approach.