plazi / GoldenGATE-Imagine

A GUI Tool For Freeing Text and Data from PDF Documents

males / female symbols do not decipher properly #34

Open myrmoteras opened 1 year ago

myrmoteras commented 1 year ago

In the Annals of the Natal Museum, the male and female symbols do not display properly in the new version of GGI. londt_1982b.pdf

image as text cJ

This is an issue with the OCR of the original PDFs (cJ). @gsautter do you see a way to fix this on the GGI side in an automated way?

gsautter commented 1 year ago

This is indeed due to errors in the OCR that comes with the PDF, accurate as it is otherwise. There is no real way of fixing this in an automated way (that would basically require a really good and well-trained OCR engine), but QC should be able to capture OCR conflicts now, and the cluster-based correction facilities enable users to fix this with far less effort than it would take to find and fix all the individual instances of the symbol.
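To illustrate the idea of correcting all instances of one misread at once, here is a minimal Python sketch that runs outside GGI on plain extracted text. It is not how GGI's cluster-based correction is implemented. The names `OCR_FIXES`, `count_suspects` and `apply_fixes` are hypothetical, and the mapping of "cJ" to the male sign is an assumption based on the example in this issue.

```python
import re
from collections import Counter

# Hypothetical table of OCR misreads and the symbol each one should become.
# "cJ" is the misread reported in this issue; mapping it to the male sign
# (U+2642) is an assumption, not something confirmed in the thread.
OCR_FIXES = {"cJ": "\u2642"}


def _standalone(token):
    """Build a pattern matching the token only when it is not part of a longer word."""
    return re.compile(r"(?<!\w)" + re.escape(token) + r"(?!\w)")


def count_suspects(text, fixes=OCR_FIXES):
    """Count standalone occurrences of each known misread, for review before fixing."""
    counts = Counter()
    for token in fixes:
        counts[token] = len(_standalone(token).findall(text))
    return counts


def apply_fixes(text, fixes=OCR_FIXES):
    """Replace every standalone occurrence of each misread with its symbol."""
    for token, symbol in fixes.items():
        # A function replacement avoids any escape handling in the symbol string.
        text = _standalone(token).sub(lambda m, s=symbol: s, text)
    return text


if __name__ == "__main__":
    sample = "Holotype cJ, Natal. Paratypes 3 cJ 2 females. McJunkin"
    print(count_suspects(sample))
    print(apply_fixes(sample))
```

The word-boundary check keeps the replacement from touching names that merely contain the letters, such as "McJunkin" in the sample. The counts are meant to be reviewed by a person before any fix is applied, since a token like "cJ" could occasionally be legitimate text.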