plazi / GoldenGATE-Imagine

A GUI Tool For Freeing Text and Data from PDF Documents

file does not open: too few words per page #19

Open myrmoteras opened 3 years ago

myrmoteras commented 3 years ago

@gsautter here is an article that does not open. I think this is the same problem we had before with the Contributions. It is also part of the eBioDiv project. Attachment: GgImagine.20210817-1216.out.zip

The file is too large to attach to this issue, so it is available here: https://www.e-periodica.ch/cntmng?pid=cnh-001%3A2009%3A0%3A%3A1623

Revision of the genus Zodarion Walckenaer, 1833, part III. South East Europe and Turkey (Araneae: Zodariidae)
Author(s): Bosmans, Robert
Object type: Article
Journal: Contributions to Natural History : Scientific Papers from the Natural History Museum Bern
Volume (year): - (2009)
Issue: 12/1
Persistent link: http://doi.org/10.5169/seals-786968

gsautter commented 3 years ago

Looking at the PDF and marking some words, I tend to think this one is scanned, as the words get marked somewhat irregularly. Have you tried opening it as a scan with embedded OCR?

myrmoteras commented 3 years ago

In fact, it is scanned. We might be able to get the original, though.