ulb-sachsen-anhalt / ulb-groundtruth-eval-odem-ger

OCR Groundtruth ULB VD18 German Fraktur - OCR-D Phase III
https://ulb-sachsen-anhalt.github.io/ulb-groundtruth-eval-odem-ger/
Creative Commons Attribution Share Alike 4.0 International

Fix textual inconsistencies between line and words #5

Closed M3ssman closed 1 month ago

M3ssman commented 1 month ago

Description

As already discovered by @stweil (see "Fix two issues with textlines in PAGE XML files" and "Find and fix systematic transcription and data issues"), the text at line level differs from the text of the corresponding words.

Currently, about 280 of the 1,000 pages in this data set are affected.

The problem is largely related to attempts to correct Transkribus output that involved geometrical operations such as merging or splitting lines, and the affected pages are subject to re-correction.
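
To illustrate the kind of inconsistency meant here, the following is a minimal sketch of how affected lines could be detected. It is not the tooling used for this data set. It assumes standard PAGE XML nesting (`TextLine` containing `Word` elements, each with `TextEquiv/Unicode`), that words are joined with single spaces, and that whitespace differences should be ignored. The function names and the `SAMPLE` fragment are hypothetical, and the fragment omits mandatory elements such as `Coords` for brevity.

```python
import xml.etree.ElementTree as ET


def _direct_text(element, ns):
    """Return the text of the element's own TextEquiv/Unicode child, or ''."""
    unicode_el = element.find(f"{ns}TextEquiv/{ns}Unicode")
    if unicode_el is None or unicode_el.text is None:
        return ""
    return unicode_el.text


def find_line_word_mismatches(page_xml):
    """Yield (line_id, line_text, joined_word_text) for inconsistent lines."""
    root = ET.fromstring(page_xml)
    # Reuse whatever namespace the document declares, since PAGE versions differ.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    for line in root.iter(f"{ns}TextLine"):
        words = line.findall(f"{ns}Word")
        if not words:
            continue  # no word level, so nothing to compare against
        line_text = _direct_text(line, ns)
        # Assumption: words are separated by single spaces on the line level.
        word_text = " ".join(_direct_text(w, ns) for w in words)
        # Normalize whitespace on both sides before comparing.
        if " ".join(line_text.split()) != " ".join(word_text.split()):
            yield line.get("id"), line_text, word_text


# Hypothetical, simplified fragment: line l2 lost a word during editing.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page>
    <TextRegion id="r1">
      <TextLine id="l1">
        <Word id="w1"><TextEquiv><Unicode>Von</Unicode></TextEquiv></Word>
        <Word id="w2"><TextEquiv><Unicode>der</Unicode></TextEquiv></Word>
        <TextEquiv><Unicode>Von der</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="l2">
        <Word id="w3"><TextEquiv><Unicode>Natur</Unicode></TextEquiv></Word>
        <TextEquiv><Unicode>Natur und</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

for line_id, line_text, word_text in find_line_word_mismatches(SAMPLE):
    print(f"{line_id}: line={line_text!r} words={word_text!r}")
# prints: l2: line='Natur und' words='Natur'
```

A check like this only reports where line and word text diverge. Deciding which level holds the correct transcription still requires looking at the page image.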