ulb-sachsen-anhalt / ulb-groundtruth-eval-odem-ger

OCR Groundtruth ULB VD18 German Fraktur - OCR-D Phase III
https://ulb-sachsen-anhalt.github.io/ulb-groundtruth-eval-odem-ger/
Creative Commons Attribution Share Alike 4.0 International

Fix two issues with textlines in PAGE XML files #2

Closed: stweil closed this 2 months ago

stweil commented 2 months ago

Both issues were found while extracting line ground truth (GT) for training Tesseract.

While fixing those issues manually, I noticed more severe issues in both files:

In the worst case, these might be systematic errors that occur frequently. Which tools were used to produce the faulty output?
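For anyone reproducing the line GT extraction mentioned above, here is a minimal Python sketch. It assumes the PAGE 2019-07-15 namespace and uses only the first line-level `TextEquiv/Unicode` of each `TextLine`. The function name `extract_line_gt` and its two checks (missing or empty `Coords`, empty text) are hypothetical illustrations. They are not the two issues fixed in this PR, and they are not the tooling used to find them.

```python
import xml.etree.ElementTree as ET

# Assumption: PAGE 2019-07-15 schema; other PAGE versions use a different date in the URI.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
NS = {"pc": PAGE_NS}


def extract_line_gt(page_xml: str) -> tuple[dict[str, str], list[str]]:
    """Hypothetical helper: return ({line_id: text}, [ids of suspicious lines])."""
    root = ET.fromstring(page_xml)
    texts: dict[str, str] = {}
    problems: list[str] = []
    for line in root.iter(f"{{{PAGE_NS}}}TextLine"):
        line_id = line.get("id", "<no id>")
        coords = line.find("pc:Coords", NS)
        # Direct children only, so Word/Glyph-level TextEquiv elements are ignored;
        # if several line-level TextEquiv exist, only the first one is used.
        unicode_el = line.find("pc:TextEquiv/pc:Unicode", NS)
        text = (unicode_el.text or "").strip() if unicode_el is not None else ""
        # Flag lines that cannot yield a usable image/text pair for training.
        if coords is None or not coords.get("points") or not text:
            problems.append(line_id)
            continue
        texts[line_id] = text
    return texts, problems
```

Each entry in `texts` could then be written to a `<line_id>.gt.txt` file next to the matching cropped line image, which is the file layout tesstrain expects. The ids in `problems` point to lines worth inspecting by hand before training.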

M3ssman commented 2 months ago

Thank you very much!