plazi / GoldenGATE-Imagine

A GUI Tool For Freeing Text and Data from PDF Documents

Article does not decode properly: missing first pages BD00FFCBFFA1FFC1231CC55FFF90FFB9 jAsia-PacificBiodiversity.13.325-330 #10

myrmoteras opened 3 years ago

myrmoteras commented 3 years ago

This article, decoded using glyphs only, does not decode properly: the first pages are missing.

BD00FFCBFFA1FFC1231CC55FFF90FFB9

jAsia-PacificBiodiversity.13.325-330.pdf

gsautter commented 3 years ago

I tried both "Render Glyphs Only" and "Decode Unmapped", and this PDF decodes perfectly fine. I'm not sure which options you used, but the usual ones for born-digital PDFs both seem to work, each coming up with 6 pages.

myrmoteras commented 3 years ago

I used Render Glyphs, and not Decode, and both attempts failed.

Did you upload the file?

gsautter commented 3 years ago

Do you happen to still have any log files? It's hard to tell what might have happened otherwise.

I've now uploaded the IMF with all the pages.

myrmoteras commented 3 years ago

I can reproduce the missing pages. Here is the log: GgImagine.20201227-1354.out.zip

myrmoteras commented 3 years ago

Can you please release the IMF? I can't open it, as it is locked by the admin.

gsautter commented 3 years ago

It's released now, not sure what went wrong yesterday.

gsautter commented 3 years ago

The only plausible reason for the two missing pages is that a template got assigned that indicates 2 cover pages. All 6 pages are generated properly; then the PDF decoder selects a template, sorts out any cover pages that template indicates, and moves on to analyze the content of the remaining pages. I'm not sure which template could possibly have gotten in the way here, though.
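To make the failure mode concrete, here is a minimal Python sketch of the page-trimming order described above. The `Template` class, its `cover_pages` field, and `pages_to_analyze` are hypothetical illustrations, not GoldenGATE-Imagine's actual API; they only model the sequence: all pages are generated first, and cover pages are dropped afterwards based on whichever template was selected.

```python
from dataclasses import dataclass


@dataclass
class Template:
    # Hypothetical stand-in for a journal template; only the field that
    # matters here (number of leading cover pages to discard) is modeled.
    name: str
    cover_pages: int = 0


def pages_to_analyze(pages, template):
    """Return the pages left for content analysis after dropping the
    leading cover pages declared by the selected template."""
    # Trimming happens after all pages are generated, so a wrongly
    # assigned template silently removes real content pages.
    return pages[template.cover_pages:]


# Hypothetical 6-page article, matching the page count reported above.
pages = [f"page {i}" for i in range(1, 7)]
right = Template("no-cover-template", cover_pages=0)
wrong = Template("two-cover-template", cover_pages=2)

print(len(pages_to_analyze(pages, right)))  # 6
print(pages_to_analyze(pages, wrong)[0])    # page 3
```

With the wrong template, the decoder still produces all 6 pages, but only pages 3 to 6 survive into content analysis, which matches the symptom of "missing first pages".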

A bit of digging points to Geodiversitas.2018-.journal_article as the culprit; its anchors apparently need some refinement.
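Why loose anchors cause this can be illustrated with a minimal Python sketch, assuming (hypothetically) that a template is selected when all of its anchor patterns match the text of the document's first page. The template names, anchor patterns, and sample text below are made up for illustration; they do not reflect the actual Geodiversitas.2018-.journal_article template or GoldenGATE-Imagine's template format.

```python
import re


def template_matches(anchors, first_page_text):
    """Hypothetical rule: a template applies only if every one of its
    anchor patterns is found in the first page's text."""
    return all(re.search(a, first_page_text) for a in anchors)


def select_template(templates, first_page_text):
    """Return the name of the first template whose anchors all match,
    or None if no template applies."""
    for name, anchors in templates.items():
        if template_matches(anchors, first_page_text):
            return name
    return None


# Made-up first-page text of an article from some unrelated journal.
text = "Another Journal, vol. 13, pp. 325-330. doi:10.0000/example"

# Generic anchors (a DOI and a volume marker) match almost any article.
loose = {"loose-template": [r"doi", r"vol\."]}
# Journal-specific anchors only match the intended journal.
refined = {"refined-template": [r"^Target Journal", r"doi"]}

print(select_template(loose, text))    # loose-template
print(select_template(refined, text))  # None
```

The design point is that anchors should be specific enough to identify one journal: a template whose anchors match generic features can claim unrelated articles and then apply its own settings, such as a cover-page count, to them.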