plazi / GoldenGATE-Imagine

A GUI Tool For Freeing Text and Data from PDF Documents

GGI display issue: page images not displayed properly #31

Closed myrmoteras closed 1 year ago

myrmoteras commented 1 year ago

In this case of a BHL file, some pages are obscured, although the underlying text is OK. Is this something that can easily be fixed? If not, then an alternative is to run the files through ABBYY and also resolve some other issues. I will also ask Martin at BHL whether they have re-OCRed the journals by now.

image

Edolius Rangoonensis. Ed. ater viridi splendens; rectricum ex- ternarum scapis longissimis, vexillis late spatulatis ad apicis mar- ginem exteriorem prcsditis. Long. tot. (rectricibus externis exclusis) 1 2 unc.; rostri, \\; ala, 6; caudce, 54; tarsi, 1 Rostrum pedesc^e nigri. Hab. apud Rangoon. Distinguishable from Ed. Malabaricus, to which it is nearly allied, by its shorter beak, and by the total absence from its forehead of the fine curled plumes which decorate that bird; the wing is also somewhat shorter. Edolius Crishna. Ed. velutino-ater viridi metallic^ (preesertim ad alas) splendens; gutturis plumis sublanceolatis, viridibus; capite pilis longissimis pluribus ornato; rectricum externarum vexillis spiraliter intortis. Long. tot. (rectricibus externis exclusis) 12 unc.; rostri, H; al<e, 7; caudce, 6; tarsi, 1 Crishna Crow, Lath., Hist. Hab. in Nepalid. The bill of this species is more cultratcd and lengthened than is usual in the geuus. The outer feathers of the tail, which are spi-

procZSocLondon.36.5-8.pdf

procZSocLondon.36.5-8.pdf.imdir.zip

myrmoteras commented 1 year ago

one more image

revuesuissedezoo28schw.449-451.pdf

revuesuissedezoo28schw.449-451.pdf.imdir.zip

gsautter commented 1 year ago

Looks like the page image comes up inverted ... digging in.

gsautter commented 1 year ago

Adding inversion awareness for JBIG2 encoded mask images did the trick ... tiny issue with the adjustment of words to the page image left to investigate.
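To illustrate what "inversion awareness" for a 1-bit mask can look like, here is a minimal sketch in Python. It is not the GGI code: the function names and the `one_means_paint` flag are hypothetical, and the sketch simply assumes the decoder can tell the renderer which bit value marks the pixels to paint.

```python
from typing import List

Bitmap = List[List[int]]  # rows of 0/1 pixel values


def normalize_mask(bits: Bitmap, one_means_paint: bool) -> Bitmap:
    """Return a copy of the mask in which 1 always means 'paint this pixel'.

    one_means_paint is a hypothetical flag describing the polarity of the
    decoded data; when it is False, every bit is flipped.
    """
    if one_means_paint:
        return [row[:] for row in bits]
    return [[1 - b for b in row] for row in bits]


def render(mask: Bitmap, ink: str = "#", paper: str = ".") -> List[str]:
    """Render a normalized mask as text rows, for eyeballing the result."""
    return ["".join(ink if b else paper for b in row) for row in mask]


# Decoded data whose polarity is the opposite of what the renderer expects.
decoded = [[0, 1, 1],
           [0, 0, 1]]
fixed = normalize_mask(decoded, one_means_paint=False)
print("\n".join(render(fixed)))
```

Rendering the mask without the normalization step would paint exactly the pixels that should stay clear, which matches the symptom of a page image that comes up inverted.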

myrmoteras commented 1 year ago

very good - let me know when this is done and the new build is up - wanna play with this.. thanks

gsautter commented 1 year ago

Finally figured out and fixed the OCR adjustment problem as well and put out an update.

What was the issue? Well, it turns out that none of the original test documents for the OCR adjustment had the embedded OCR words of a single line so vertically misplaced that they were not added left to right, but as two lines ... which had the line expand leftwards after words were already associated with it, pulling the (line-relative) boundaries of those words out of place ... added compensation for that now.
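The effect described above can be sketched with a toy model in Python. This is an assumption-laden illustration, not the GGI data model: the `Line` class, its fields, and `add_word` are hypothetical. Word boundaries are stored relative to the left edge of the line, so when a late-arriving word extends the line leftwards, the offsets of the words already attached have to be shifted by the same amount.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Line:
    left: int   # absolute x of the left edge of the line
    right: int  # absolute x of the right edge of the line
    # word boundaries as (left, right) offsets relative to self.left
    words: List[Tuple[int, int]] = field(default_factory=list)

    def add_word(self, abs_left: int, abs_right: int) -> None:
        if abs_left < self.left:
            # The line grows leftwards: compensate by shifting the
            # line-relative offsets of all words added so far.
            shift = self.left - abs_left
            self.words = [(l + shift, r + shift) for l, r in self.words]
            self.left = abs_left
        self.right = max(self.right, abs_right)
        self.words.append((abs_left - self.left, abs_right - self.left))

    def absolute_words(self) -> List[Tuple[int, int]]:
        """Convert the stored offsets back to absolute coordinates."""
        return [(self.left + l, self.left + r) for l, r in self.words]


line = Line(left=100, right=200)
line.add_word(100, 140)
line.add_word(150, 200)
line.add_word(60, 90)  # arrives late and lies left of the current edge
print(line.absolute_words())
```

Without the shift inside `add_word`, the first two words would keep their old offsets against the new left edge and end up 40 units too far to the left.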